* [PATCH v1 00/10] baseband/acc200
From: Nicolas Chautru @ 2022-07-08 0:01 UTC
To: dev, thomas, gakhil, hemant.agrawal, trix
Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru

This series targets 22.11 and includes the PMD for the integrated
accelerator on Intel Xeon SPR-EEC.
It depends on a parallel series, still in flight, which extends the bbdev API:
https://patches.dpdk.org/project/dpdk/list/?series=23894

I will be offline for a few weeks for the summer break, but Hernan will
cover for me during that time if required.

Thanks
Nic

Nicolas Chautru (10):
  baseband/acc200: introduce PMD for ACC200
  baseband/acc200: add HW register definitions
  baseband/acc200: add info get function
  baseband/acc200: add queue configuration
  baseband/acc200: add LDPC processing functions
  baseband/acc200: add LTE processing functions
  baseband/acc200: add support for FFT operations
  baseband/acc200: support interrupt
  baseband/acc200: add device status and vf2pf comms
  baseband/acc200: add PF configure companion function

 MAINTAINERS                              |    3 +
 app/test-bbdev/meson.build               |    3 +
 app/test-bbdev/test_bbdev_perf.c         |   76 +
 doc/guides/bbdevs/acc200.rst             |  244 ++
 doc/guides/bbdevs/index.rst              |    1 +
 drivers/baseband/acc200/acc200_pf_enum.h |  468 +++
 drivers/baseband/acc200/acc200_pmd.h     |  690 ++++
 drivers/baseband/acc200/acc200_vf_enum.h |   89 +
 drivers/baseband/acc200/meson.build      |    8 +
 drivers/baseband/acc200/rte_acc200_cfg.h |  115 +
 drivers/baseband/acc200/rte_acc200_pmd.c | 5403 ++++++++++++++++++++++++++++++
 drivers/baseband/acc200/version.map      |   10 +
 drivers/baseband/meson.build             |    1 +
 13 files changed, 7111 insertions(+)
 create mode 100644 doc/guides/bbdevs/acc200.rst
 create mode 100644 drivers/baseband/acc200/acc200_pf_enum.h
 create mode 100644 drivers/baseband/acc200/acc200_pmd.h
 create mode 100644 drivers/baseband/acc200/acc200_vf_enum.h
 create mode 100644 drivers/baseband/acc200/meson.build
 create mode 100644 drivers/baseband/acc200/rte_acc200_cfg.h
 create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c
 create mode 100644 drivers/baseband/acc200/version.map

--
1.8.3.1
* [PATCH v1 01/10] baseband/acc200: introduce PMD for ACC200
From: Nicolas Chautru @ 2022-07-08 0:01 UTC
To: dev, thomas, gakhil, hemant.agrawal, trix
Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru

This patch introduces stubs for the device driver for the ACC200
integrated vRAN accelerator on SPR-EEC.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 MAINTAINERS                              |   3 +
 doc/guides/bbdevs/acc200.rst             | 244 +++++++++++++++++++++++++++++++
 doc/guides/bbdevs/index.rst              |   1 +
 drivers/baseband/acc200/acc200_pmd.h     |  38 +++++
 drivers/baseband/acc200/meson.build      |   6 +
 drivers/baseband/acc200/rte_acc200_pmd.c | 179 +++++++++++++++++++++++
 drivers/baseband/acc200/version.map      |   3 +
 drivers/baseband/meson.build             |   1 +
 8 files changed, 475 insertions(+)
 create mode 100644 doc/guides/bbdevs/acc200.rst
 create mode 100644 drivers/baseband/acc200/acc200_pmd.h
 create mode 100644 drivers/baseband/acc200/meson.build
 create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c
 create mode 100644 drivers/baseband/acc200/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 1652e08..73284a1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1337,6 +1337,9 @@ F: doc/guides/bbdevs/features/fpga_5gnr_fec.ini
 F: drivers/baseband/acc100/
 F: doc/guides/bbdevs/acc100.rst
 F: doc/guides/bbdevs/features/acc100.ini
+F: drivers/baseband/acc200/
+F: doc/guides/bbdevs/acc200.rst
+F: doc/guides/bbdevs/features/acc200.ini
 
 Null baseband
 M: Nicolas Chautru <nicolas.chautru@intel.com>
diff
--git a/doc/guides/bbdevs/acc200.rst b/doc/guides/bbdevs/acc200.rst
new file mode 100644
index 0000000..3a4dd55
--- /dev/null
+++ b/doc/guides/bbdevs/acc200.rst
@@ -0,0 +1,244 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2022 Intel Corporation
+
+Intel(R) ACC200 vRAN Dedicated Accelerator Poll Mode Driver
+===========================================================
+
+The Intel® vRAN Dedicated Accelerator ACC200 peripheral enables cost-effective 4G
+and 5G next-generation virtualized Radio Access Network (vRAN) solutions integrated on
+Sapphire Rapids EEC Intel(R) 7 based Xeon(R) multi-core server processors.
+
+Features
+--------
+
+The ACC200 includes a 5G Low Density Parity Check (LDPC) encoder/decoder, rate match/dematch,
+Hybrid Automatic Repeat Request (HARQ) with access to DDR memory for buffer management, a 4G
+Turbo encoder/decoder, a Fast Fourier Transform (FFT) block providing DFT/iDFT processing offload
+for the 5G Sounding Reference Signal (SRS), a Queue Manager (QMGR), and a DMA subsystem.
+There is no dedicated on-card memory for HARQ; coherent memory on the CPU side is used instead.
+
+These correspond to the following features exposed by the PMD:
+
+- LDPC Encode in the Downlink (5GNR)
+- LDPC Decode in the Uplink (5GNR)
+- Turbo Encode in the Downlink (4G)
+- Turbo Decode in the Uplink (4G)
+- FFT processing
+- SR-IOV with 16 VFs per PF
+- Maximum of 256 queues per VF
+- MSI
+
+ACC200 PMD supports the following BBDEV capabilities:
+
+* For the LDPC encode operation:
+   - ``RTE_BBDEV_LDPC_CRC_24B_ATTACH`` : set to attach CRC24B to CB(s)
+   - ``RTE_BBDEV_LDPC_RATE_MATCH`` : if set then do not do Rate Match bypass
+   - ``RTE_BBDEV_LDPC_INTERLEAVER_BYPASS`` : if set then bypass interleaver
+
+* For the LDPC decode operation:
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK`` : check CRC24B from CB(s)
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP`` : drops CRC24B bits appended while decoding
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK`` : check CRC24A from CB(s)
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK`` : check CRC16 from CB(s)
+   - ``RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE`` : provides an input for HARQ combining
+   - ``RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE`` : provides an output for HARQ combining
+   - ``RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE`` : enable early termination on a successful decode
+   - ``RTE_BBDEV_LDPC_DEC_SCATTER_GATHER`` : supports scatter-gather for input/output data
+   - ``RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION`` : supports compression of the HARQ input/output
+   - ``RTE_BBDEV_LDPC_LLR_COMPRESSION`` : supports LLR input compression
+
+* For the turbo encode operation:
+   - ``RTE_BBDEV_TURBO_CRC_24B_ATTACH`` : set to attach CRC24B to CB(s)
+   - ``RTE_BBDEV_TURBO_RATE_MATCH`` : if set then do not do Rate Match bypass
+   - ``RTE_BBDEV_TURBO_ENC_INTERRUPTS`` : set for encoder dequeue interrupts
+   - ``RTE_BBDEV_TURBO_RV_INDEX_BYPASS`` : set to bypass RV index
+   - ``RTE_BBDEV_TURBO_ENC_SCATTER_GATHER`` : supports scatter-gather for input/output data
+
+* For the turbo decode operation:
+   - ``RTE_BBDEV_TURBO_CRC_TYPE_24B`` : check CRC24B from CB(s)
+   - ``RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE`` :
perform subblock de-interleave + - ``RTE_BBDEV_TURBO_DEC_INTERRUPTS`` : set for decoder dequeue interrupts + - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN`` : set if negative LLR input is supported + - ``RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP`` : keep CRC24B bits appended while decoding + - ``RTE_BBDEV_TURBO_DEC_CRC_24B_DROP`` : option to drop the code block CRC after decoding + - ``RTE_BBDEV_TURBO_EARLY_TERMINATION`` : set early termination feature + - ``RTE_BBDEV_TURBO_DEC_SCATTER_GATHER`` : supports scatter-gather for input/output data + - ``RTE_BBDEV_TURBO_HALF_ITERATION_EVEN`` : set half iteration granularity + - ``RTE_BBDEV_TURBO_SOFT_OUTPUT`` : set the APP LLR soft output + - ``RTE_BBDEV_TURBO_EQUALIZER`` : set the turbo equalizer feature + - ``RTE_BBDEV_TURBO_SOFT_OUT_SATURATE`` : set the soft output saturation + - ``RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH`` : set to run an extra odd iteration after CRC match + - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT`` : set if negative APP LLR output supported + - ``RTE_BBDEV_TURBO_MAP_DEC`` : supports flexible parallel MAP engine decoding + +Installation +------------ + +Section 3 of the DPDK manual provides instructions on installing and compiling DPDK. + +DPDK requires hugepages to be configured as detailed in section 2 of the DPDK manual. +The bbdev test application has been tested with a configuration 40 x 1GB hugepages. The +hugepage configuration of a server may be examined using: + +.. code-block:: console + + grep Huge* /proc/meminfo + + +Initialization +-------------- + +When the device first powers up, its PCI Physical Functions (PF) can be listed through these +commands for ACC200: + +.. code-block:: console + + sudo lspci -vd8086:57c0 + +The physical and virtual functions are compatible with Linux UIO drivers: +``vfio`` and ``igb_uio``. However, in order to work the 5G/4G +FEC device first needs to be bound to one of these linux drivers through DPDK. 
+
+
+Bind PF UIO driver(s)
+~~~~~~~~~~~~~~~~~~~~~
+
+Install the DPDK igb_uio driver, bind it with the PF PCI device ID and use
+``lspci`` to confirm the PF device is in use by the ``igb_uio`` DPDK UIO driver.
+
+The igb_uio driver may be bound to the PF PCI device using one of two methods for ACC200:
+
+
+1. PCI functions (physical or virtual, depending on the use case) can be bound to
+the UIO driver by repeating this command for every function.
+
+.. code-block:: console
+
+  cd <dpdk-top-level-directory>
+  insmod ./build/kmod/igb_uio.ko
+  echo "8086 57c0" > /sys/bus/pci/drivers/igb_uio/new_id
+  lspci -vd8086:57c0
+
+
+2. Another way to bind the PF to the DPDK UIO driver is by using the ``dpdk-devbind.py`` tool.
+
+.. code-block:: console
+
+  cd <dpdk-top-level-directory>
+  ./usertools/dpdk-devbind.py -b igb_uio 0000:f7:00.0
+
+where the PCI device ID (example: 0000:f7:00.0) is obtained using ``lspci -vd8086:57c0``.
+
+
+In a similar way, the PF may be bound to ``vfio-pci`` like any other PCIe device.
+
+
+Enable Virtual Functions
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Now, it should be visible in the printouts that the PCI PF is under igb_uio control
+"``Kernel driver in use: igb_uio``"
+
+To show the number of available VFs on the device, read the ``sriov_totalvfs`` file.
+
+.. code-block:: console
+
+  cat /sys/bus/pci/devices/0000\:<b>\:<d>.<f>/sriov_totalvfs
+
+  where 0000\:<b>\:<d>.<f> is the PCI device ID
+
+
+To enable VFs via igb_uio, echo the number of virtual functions to enable
+into the ``max_vfs`` file.
+
+.. code-block:: console
+
+  echo <num-of-vfs> > /sys/bus/pci/devices/0000\:<b>\:<d>.<f>/max_vfs
+
+
+Afterwards, all VFs must be bound to the appropriate UIO drivers as required, in the
+same way as was done with the physical function previously.
+
+Enabling SR-IOV via the vfio driver is pretty much the same, except that the file
+name is different:
+
+..
code-block:: console + + echo <num-of-vfs> > /sys/bus/pci/devices/0000\:<b>\:<d>.<f>/sriov_numvfs + + +Configure the VFs through PF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The PCI virtual functions must be configured before working or getting assigned +to VMs/Containers. The configuration involves allocating the number of hardware +queues, priorities, load balance, bandwidth and other settings necessary for the +device to perform FEC functions. + +This configuration needs to be executed at least once after reboot or PCI FLR and can +be achieved by using the functions ``rte_acc200_configure()``, +which sets up the parameters defined in the compatible ``acc200_conf`` structure. + +Test Application +---------------- + +BBDEV provides a test application, ``test-bbdev.py`` and range of test data for testing +the functionality of the device, depending on the device's +capabilities. The test application is located under app->test-bbdev folder and has the +following options: + +.. code-block:: console + + "-p", "--testapp-path": specifies path to the bbdev test app. + "-e", "--eal-params" : EAL arguments which are passed to the test app. + "-t", "--timeout" : Timeout in seconds (default=300). + "-c", "--test-cases" : Defines test cases to run. Run all if not specified. + "-v", "--test-vector" : Test vector path (default=dpdk_path+/app/test-bbdev/test_vectors/bbdev_null.data). + "-n", "--num-ops" : Number of operations to process on device (default=32). + "-b", "--burst-size" : Operations enqueue/dequeue burst size (default=32). + "-s", "--snr" : SNR in dB used when generating LLRs for bler tests. + "-s", "--iter_max" : Number of iterations for LDPC decoder. + "-l", "--num-lcores" : Number of lcores to run (default=16). + "-i", "--init-device" : Initialise PF device with default values. + + +To execute the test application tool using simple decode or encode data, +type one of the following: + +.. 
code-block:: console + + ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_dec_default.data + ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_enc_default.data + + +The test application ``test-bbdev.py``, supports the ability to configure the PF device with +a default set of values, if the "-i" or "- -init-device" option is included. The default values +are defined in test_bbdev_perf.c. + + +Test Vectors +~~~~~~~~~~~~ + +In addition to the simple LDPC decoder and LDPC encoder tests, bbdev also provides +a range of additional tests under the test_vectors folder, which may be useful. The results +of these tests will depend on the device capabilities which may cause some +testcases to be skipped, but no failure should be reported. + + +Alternate Baseband Device configuration tool +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +On top of the embedded configuration feature supported in test-bbdev using "- -init-device" +option mentioned above, there is also a tool available to perform that device configuration +using a companion application. +The ``pf_bb_config`` application notably enables then to run bbdev-test from the VF +and not only limited to the PF as captured above. + +See for more details: https://github.com/intel/pf-bb-config + +Specifically for the BBDEV ACC200 PMD, the command below can be used: + +.. 
code-block:: console + + ./pf_bb_config ACC200 -c ./acc200/acc200_config_vf_5g.cfg + ./test-bbdev.py -e="-c 0xff0 -a${VF_PCI_ADDR}" -c validation -n 64 -b 64 -l 1 -v ./ldpc_dec_default.data diff --git a/doc/guides/bbdevs/index.rst b/doc/guides/bbdevs/index.rst index cedd706..4e9dea8 100644 --- a/doc/guides/bbdevs/index.rst +++ b/doc/guides/bbdevs/index.rst @@ -14,4 +14,5 @@ Baseband Device Drivers fpga_lte_fec fpga_5gnr_fec acc100 + acc200 la12xx diff --git a/drivers/baseband/acc200/acc200_pmd.h b/drivers/baseband/acc200/acc200_pmd.h new file mode 100644 index 0000000..a22ca67 --- /dev/null +++ b/drivers/baseband/acc200/acc200_pmd.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#ifndef _RTE_ACC200_PMD_H_ +#define _RTE_ACC200_PMD_H_ + +/* Helper macro for logging */ +#define rte_bbdev_log(level, fmt, ...) \ + rte_log(RTE_LOG_ ## level, acc200_logtype, fmt "\n", \ + ##__VA_ARGS__) + +#ifdef RTE_LIBRTE_BBDEV_DEBUG +#define rte_bbdev_log_debug(fmt, ...) \ + rte_bbdev_log(DEBUG, "acc200_pmd: " fmt, \ + ##__VA_ARGS__) +#else +#define rte_bbdev_log_debug(fmt, ...) 
+#endif + +/* ACC200 PF and VF driver names */ +#define ACC200PF_DRIVER_NAME intel_acc200_pf +#define ACC200VF_DRIVER_NAME intel_acc200_vf + +/* ACC200 PCI vendor & device IDs */ +#define RTE_ACC200_VENDOR_ID (0x8086) +#define RTE_ACC200_PF_DEVICE_ID (0x57C0) +#define RTE_ACC200_VF_DEVICE_ID (0x57C1) + +/* Private data structure for each ACC200 device */ +struct acc200_device { + void *mmio_base; /**< Base address of MMIO registers (BAR0) */ + uint32_t ddr_size; /* Size in kB */ + bool pf_device; /**< True if this is a PF ACC200 device */ + bool configured; /**< True if this ACC200 device is configured */ +}; + +#endif /* _RTE_ACC200_PMD_H_ */ diff --git a/drivers/baseband/acc200/meson.build b/drivers/baseband/acc200/meson.build new file mode 100644 index 0000000..7b47bc6 --- /dev/null +++ b/drivers/baseband/acc200/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2021 Intel Corporation + +deps += ['bbdev', 'bus_vdev', 'ring', 'pci', 'bus_pci'] + +sources = files('rte_acc200_pmd.c') diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c new file mode 100644 index 0000000..4103e48 --- /dev/null +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -0,0 +1,179 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#include <unistd.h> + +#include <rte_common.h> +#include <rte_log.h> +#include <rte_dev.h> +#include <rte_malloc.h> +#include <rte_mempool.h> +#include <rte_byteorder.h> +#include <rte_errno.h> +#include <rte_branch_prediction.h> +#include <rte_hexdump.h> +#include <rte_pci.h> +#include <rte_bus_pci.h> +#ifdef RTE_BBDEV_OFFLOAD_COST +#include <rte_cycles.h> +#endif + +#include <rte_bbdev.h> +#include <rte_bbdev_pmd.h> +#include "acc200_pmd.h" + +#ifdef RTE_LIBRTE_BBDEV_DEBUG +RTE_LOG_REGISTER_DEFAULT(acc200_logtype, DEBUG); +#else +RTE_LOG_REGISTER_DEFAULT(acc200_logtype, NOTICE); +#endif + +/* Free memory used for software rings */ +static int 
+acc200_dev_close(struct rte_bbdev *dev) +{ + RTE_SET_USED(dev); + return 0; +} + + +static const struct rte_bbdev_ops acc200_bbdev_ops = { + .close = acc200_dev_close, +}; + +/* ACC200 PCI PF address map */ +static struct rte_pci_id pci_id_acc200_pf_map[] = { + { + RTE_PCI_DEVICE(RTE_ACC200_VENDOR_ID, RTE_ACC200_PF_DEVICE_ID) + }, + {.device_id = 0}, +}; + +/* ACC200 PCI VF address map */ +static struct rte_pci_id pci_id_acc200_vf_map[] = { + { + RTE_PCI_DEVICE(RTE_ACC200_VENDOR_ID, RTE_ACC200_VF_DEVICE_ID) + }, + {.device_id = 0}, +}; + +/* Initialization Function */ +static void +acc200_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) +{ + struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device); + + dev->dev_ops = &acc200_bbdev_ops; + + ((struct acc200_device *) dev->data->dev_private)->pf_device = + !strcmp(drv->driver.name, + RTE_STR(ACC200PF_DRIVER_NAME)); + ((struct acc200_device *) dev->data->dev_private)->mmio_base = + pci_dev->mem_resource[0].addr; + + rte_bbdev_log_debug("Init device %s [%s] @ vaddr %p paddr %#"PRIx64"", + drv->driver.name, dev->data->name, + (void *)pci_dev->mem_resource[0].addr, + pci_dev->mem_resource[0].phys_addr); +} + +static int acc200_pci_probe(struct rte_pci_driver *pci_drv, + struct rte_pci_device *pci_dev) +{ + struct rte_bbdev *bbdev = NULL; + char dev_name[RTE_BBDEV_NAME_MAX_LEN]; + + if (pci_dev == NULL) { + rte_bbdev_log(ERR, "NULL PCI device"); + return -EINVAL; + } + + rte_pci_device_name(&pci_dev->addr, dev_name, sizeof(dev_name)); + + /* Allocate memory to be used privately by drivers */ + bbdev = rte_bbdev_allocate(pci_dev->device.name); + if (bbdev == NULL) + return -ENODEV; + + /* allocate device private memory */ + bbdev->data->dev_private = rte_zmalloc_socket(dev_name, + sizeof(struct acc200_device), RTE_CACHE_LINE_SIZE, + pci_dev->device.numa_node); + + if (bbdev->data->dev_private == NULL) { + rte_bbdev_log(CRIT, + "Allocate of %zu bytes for device \"%s\" failed", + sizeof(struct 
acc200_device), dev_name); + rte_bbdev_release(bbdev); + return -ENOMEM; + } + + /* Fill HW specific part of device structure */ + bbdev->device = &pci_dev->device; + bbdev->intr_handle = pci_dev->intr_handle; + bbdev->data->socket_id = pci_dev->device.numa_node; + + /* Invoke ACC200 device initialization function */ + acc200_bbdev_init(bbdev, pci_drv); + + rte_bbdev_log_debug("Initialised bbdev %s (id = %u)", + dev_name, bbdev->data->dev_id); + return 0; +} + +static int acc200_pci_remove(struct rte_pci_device *pci_dev) +{ + struct rte_bbdev *bbdev; + int ret; + uint8_t dev_id; + + if (pci_dev == NULL) + return -EINVAL; + + /* Find device */ + bbdev = rte_bbdev_get_named_dev(pci_dev->device.name); + if (bbdev == NULL) { + rte_bbdev_log(CRIT, + "Couldn't find HW dev \"%s\" to uninitialise it", + pci_dev->device.name); + return -ENODEV; + } + dev_id = bbdev->data->dev_id; + + /* free device private memory before close */ + rte_free(bbdev->data->dev_private); + + /* Close device */ + ret = rte_bbdev_close(dev_id); + if (ret < 0) + rte_bbdev_log(ERR, + "Device %i failed to close during uninit: %i", + dev_id, ret); + + /* release bbdev from library */ + rte_bbdev_release(bbdev); + + rte_bbdev_log_debug("Destroyed bbdev = %u", dev_id); + + return 0; +} + +static struct rte_pci_driver acc200_pci_pf_driver = { + .probe = acc200_pci_probe, + .remove = acc200_pci_remove, + .id_table = pci_id_acc200_pf_map, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING +}; + +static struct rte_pci_driver acc200_pci_vf_driver = { + .probe = acc200_pci_probe, + .remove = acc200_pci_remove, + .id_table = pci_id_acc200_vf_map, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING +}; + +RTE_PMD_REGISTER_PCI(ACC200PF_DRIVER_NAME, acc200_pci_pf_driver); +RTE_PMD_REGISTER_PCI_TABLE(ACC200PF_DRIVER_NAME, pci_id_acc200_pf_map); +RTE_PMD_REGISTER_PCI(ACC200VF_DRIVER_NAME, acc200_pci_vf_driver); +RTE_PMD_REGISTER_PCI_TABLE(ACC200VF_DRIVER_NAME, pci_id_acc200_vf_map); diff --git a/drivers/baseband/acc200/version.map 
b/drivers/baseband/acc200/version.map
new file mode 100644
index 0000000..c2e0723
--- /dev/null
+++ b/drivers/baseband/acc200/version.map
@@ -0,0 +1,3 @@
+DPDK_22 {
+	local: *;
+};
diff --git a/drivers/baseband/meson.build b/drivers/baseband/meson.build
index 686e98b..343f83a 100644
--- a/drivers/baseband/meson.build
+++ b/drivers/baseband/meson.build
@@ -7,6 +7,7 @@ endif
 
 drivers = [
         'acc100',
+        'acc200',
         'fpga_5gnr_fec',
         'fpga_lte_fec',
         'la12xx',
--
1.8.3.1
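For readers following the probe path in the patch above, here is a minimal standalone sketch of two idioms it relies on: comparing the driver name against a stringified, unquoted macro to tell PF from VF, and walking a PCI ID table that ends in a zeroed sentinel entry. It does not use DPDK headers. `SKETCH_STR`, `struct sketch_pci_id`, `sketch_pf_map`, `sketch_is_pf` and `sketch_id_match` are hypothetical names introduced only for illustration; the driver names and the 0x8086/0x57C0 IDs are taken from acc200_pmd.h in the patch. The matching loop is an assumption about how such a table is typically consumed, not a copy of the PCI bus code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for RTE_STR() and the unquoted driver-name macros. */
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x) SKETCH_STR_(x)
#define ACC200PF_DRIVER_NAME intel_acc200_pf
#define ACC200VF_DRIVER_NAME intel_acc200_vf

/* Hypothetical, reduced version of a PCI ID table entry. */
struct sketch_pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Table terminated by an all-zero entry, like pci_id_acc200_pf_map. */
static const struct sketch_pci_id sketch_pf_map[] = {
	{ 0x8086, 0x57C0 },
	{ 0, 0 },
};

/* True when the name equals the stringified PF driver name. */
static bool
sketch_is_pf(const char *driver_name)
{
	return strcmp(driver_name, SKETCH_STR(ACC200PF_DRIVER_NAME)) == 0;
}

/* Walk the table until the zeroed sentinel is reached. */
static bool
sketch_id_match(const struct sketch_pci_id *map, uint16_t vendor,
		uint16_t device)
{
	for (; map->vendor_id != 0; map++)
		if (map->vendor_id == vendor && map->device_id == device)
			return true;
	return false;
}
```

The two-level `SKETCH_STR` macro matters because the driver names are defined without quotes: a single-level `#x` would stringify the macro name itself rather than its expansion, so the comparison in `acc200_bbdev_init()` would presumably never match.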
* [PATCH v2 00/11] baseband/acc200
From: Nic Chautru @ 2022-09-12 1:08 UTC
To: dev, thomas, gakhil, hemant.agrawal
Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nic Chautru

v2: Now includes a code refactoring so that common structures and code are
shared with the parallel ACC1XX series PMD and can be reused moving forward.
v1: This is targeting 22.11 and includes the PMD for the new series of
integrated accelerator on Intel Xeon SPR-EEC.
There is a dependency on a parallel patch series, still in flight, which
extends the bbdev API
https://patches.dpdk.org/project/dpdk/list/?series=23894
and is required to apply this series.

Nic Chautru (1):
  baseband/acc100: refactory to segregate common code

Nicolas Chautru (10):
  baseband/acc200: introduce PMD for ACC200
  baseband/acc200: add HW register definitions
  baseband/acc200: add info get function
  baseband/acc200: add queue configuration
  baseband/acc200: add LDPC processing functions
  baseband/acc200: add LTE processing functions
  baseband/acc200: add support for FFT operations
  baseband/acc200: support interrupt
  baseband/acc200: add device status and vf2pf comms
  baseband/acc200: add PF configure companion function

 MAINTAINERS                                  |    3 +
 app/test-bbdev/meson.build                   |    3 +
 app/test-bbdev/test_bbdev_perf.c             |   82 +-
 doc/guides/bbdevs/acc200.rst                 |  244 ++
 doc/guides/bbdevs/index.rst                  |    1 +
 drivers/baseband/acc100/acc100_pf_enum.h     |  939 ------
 drivers/baseband/acc100/acc100_pmd.h         |  449 +--
 drivers/baseband/acc100/acc101_pmd.h         |   10 -
 drivers/baseband/acc100/acc_common.h         | 1388 +++++++++
 drivers/baseband/acc100/rte_acc100_cfg.h     |   70 +-
 drivers/baseband/acc100/rte_acc100_pmd.c     | 1856 ++++--------
 drivers/baseband/acc100/rte_acc_common_cfg.h |  101 +
 drivers/baseband/acc200/acc200_pf_enum.h     |  108 +
 drivers/baseband/acc200/acc200_pmd.h         |  196 ++
 drivers/baseband/acc200/acc200_vf_enum.h     |   83 +
 drivers/baseband/acc200/meson.build          |    8 +
 drivers/baseband/acc200/rte_acc200_cfg.h     |   48 +
 drivers/baseband/acc200/rte_acc200_pmd.c     | 4195 ++++++++++++++++++++++++++
 drivers/baseband/acc200/version.map          |   10 +
 drivers/baseband/meson.build                 |    1 +
 20 files changed, 7045 insertions(+), 2750 deletions(-)
 create mode 100644 doc/guides/bbdevs/acc200.rst
 create mode 100644 drivers/baseband/acc100/acc_common.h
 create mode 100644 drivers/baseband/acc100/rte_acc_common_cfg.h
 create mode 100644 drivers/baseband/acc200/acc200_pf_enum.h
 create mode 100644 drivers/baseband/acc200/acc200_pmd.h
 create mode 100644 drivers/baseband/acc200/acc200_vf_enum.h
 create mode 100644 drivers/baseband/acc200/meson.build
 create mode 100644 drivers/baseband/acc200/rte_acc200_cfg.h
 create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c
 create mode 100644 drivers/baseband/acc200/version.map

--
1.8.3.1
* [PATCH v2 01/11] baseband/acc100: refactory to segregate common code
From: Nic Chautru @ 2022-09-12 1:08 UTC
To: dev, thomas, gakhil, hemant.agrawal
Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nic Chautru

Refactor all shareable common code so that it can be used by future PMDs
(including ACC200 in this series, and taking into account the following
PMDs in the roadmap) by gathering the shared structures and inline methods.
Clean up the enum files to remove unused register definitions.
No functional change.

Signed-off-by: Nic Chautru <nicolas.chautru@intel.com>
---
 app/test-bbdev/test_bbdev_perf.c             |    6 +-
 drivers/baseband/acc100/acc100_pf_enum.h     |  939 -------------
 drivers/baseband/acc100/acc100_pmd.h         |  449 +------
 drivers/baseband/acc100/acc101_pmd.h         |   10 -
 drivers/baseband/acc100/acc_common.h         | 1388 +++++++++++++++++++
 drivers/baseband/acc100/rte_acc100_cfg.h     |   70 +-
 drivers/baseband/acc100/rte_acc100_pmd.c     | 1856 ++++++++------------------
 drivers/baseband/acc100/rte_acc_common_cfg.h |  101 ++
 8 files changed, 2069 insertions(+), 2750 deletions(-)
 create mode 100644 drivers/baseband/acc100/acc_common.h
 create mode 100644 drivers/baseband/acc100/rte_acc_common_cfg.h

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 5d07670..af9ceca 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -708,18 +708,18 @@ typedef int (test_case_function)(struct active_device *ad,
 #ifdef RTE_BASEBAND_ACC100
 	if ((get_init_device() == true) &&
 			(!strcmp(info->drv.driver_name, ACC100PF_DRIVER_NAME))) {
-		struct rte_acc100_conf conf;
+		struct rte_acc_conf conf;
 		unsigned int i;
printf("Configure ACC100/ACC101 FEC Driver %s with default values\n", info->drv.driver_name); /* clear default configuration before initialization */ - memset(&conf, 0, sizeof(struct rte_acc100_conf)); + memset(&conf, 0, sizeof(struct rte_acc_conf)); /* Always set in PF mode for built-in configuration */ conf.pf_mode_en = true; - for (i = 0; i < RTE_ACC100_NUM_VFS; ++i) { + for (i = 0; i < RTE_ACC_NUM_VFS; ++i) { conf.arb_dl_4g[i].gbr_threshold1 = ACC100_QOS_GBR; conf.arb_dl_4g[i].gbr_threshold1 = ACC100_QOS_GBR; conf.arb_dl_4g[i].round_robin_weight = ACC100_QMGR_RR; diff --git a/drivers/baseband/acc100/acc100_pf_enum.h b/drivers/baseband/acc100/acc100_pf_enum.h index 2fba667..f4e5002 100644 --- a/drivers/baseband/acc100/acc100_pf_enum.h +++ b/drivers/baseband/acc100/acc100_pf_enum.h @@ -14,32 +14,6 @@ enum { HWPfQmgrEgressQueuesTemplate = 0x0007FE00, HWPfQmgrIngressAq = 0x00080000, - HWPfQmgrArbQAvail = 0x00A00010, - HWPfQmgrArbQBlock = 0x00A00014, - HWPfQmgrAqueueDropNotifEn = 0x00A00024, - HWPfQmgrAqueueDisableNotifEn = 0x00A00028, - HWPfQmgrSoftReset = 0x00A00038, - HWPfQmgrInitStatus = 0x00A0003C, - HWPfQmgrAramWatchdogCount = 0x00A00040, - HWPfQmgrAramWatchdogCounterEn = 0x00A00044, - HWPfQmgrAxiWatchdogCount = 0x00A00048, - HWPfQmgrAxiWatchdogCounterEn = 0x00A0004C, - HWPfQmgrProcessWatchdogCount = 0x00A00050, - HWPfQmgrProcessWatchdogCounterEn = 0x00A00054, - HWPfQmgrProcessUl4GWatchdogCounter = 0x00A00058, - HWPfQmgrProcessDl4GWatchdogCounter = 0x00A0005C, - HWPfQmgrProcessUl5GWatchdogCounter = 0x00A00060, - HWPfQmgrProcessDl5GWatchdogCounter = 0x00A00064, - HWPfQmgrProcessMldWatchdogCounter = 0x00A00068, - HWPfQmgrMsiOverflowUpperVf = 0x00A00070, - HWPfQmgrMsiOverflowLowerVf = 0x00A00074, - HWPfQmgrMsiWatchdogOverflow = 0x00A00078, - HWPfQmgrMsiOverflowEnable = 0x00A0007C, - HWPfQmgrDebugAqPointerMemGrp = 0x00A00100, - HWPfQmgrDebugOutputArbQFifoGrp = 0x00A00140, - HWPfQmgrDebugMsiFifoGrp = 0x00A00180, - HWPfQmgrDebugAxiWdTimeoutMsiFifo = 0x00A001C0, - 
HWPfQmgrDebugProcessWdTimeoutMsiFifo = 0x00A001C4, HWPfQmgrDepthLog2Grp = 0x00A00200, HWPfQmgrTholdGrp = 0x00A00300, HWPfQmgrGrpTmplateReg0Indx = 0x00A00600, @@ -48,85 +22,23 @@ enum { HWPfQmgrGrpTmplateReg3Indx = 0x00A00780, HWPfQmgrGrpTmplateReg4Indx = 0x00A00800, HWPfQmgrVfBaseAddr = 0x00A01000, - HWPfQmgrUl4GWeightRrVf = 0x00A02000, - HWPfQmgrDl4GWeightRrVf = 0x00A02100, - HWPfQmgrUl5GWeightRrVf = 0x00A02200, - HWPfQmgrDl5GWeightRrVf = 0x00A02300, - HWPfQmgrMldWeightRrVf = 0x00A02400, HWPfQmgrArbQDepthGrp = 0x00A02F00, HWPfQmgrGrpFunction0 = 0x00A02F40, - HWPfQmgrGrpFunction1 = 0x00A02F44, HWPfQmgrGrpPriority = 0x00A02F48, - HWPfQmgrWeightSync = 0x00A03000, HWPfQmgrAqEnableVf = 0x00A10000, - HWPfQmgrAqResetVf = 0x00A20000, HWPfQmgrRingSizeVf = 0x00A20004, HWPfQmgrGrpDepthLog20Vf = 0x00A20008, HWPfQmgrGrpDepthLog21Vf = 0x00A2000C, - HWPfQmgrGrpFunction0Vf = 0x00A20010, - HWPfQmgrGrpFunction1Vf = 0x00A20014, HWPfDmaConfig0Reg = 0x00B80000, HWPfDmaConfig1Reg = 0x00B80004, HWPfDmaQmgrAddrReg = 0x00B80008, - HWPfDmaSoftResetReg = 0x00B8000C, HWPfDmaAxcacheReg = 0x00B80010, - HWPfDmaVersionReg = 0x00B80014, - HWPfDmaFrameThreshold = 0x00B80018, - HWPfDmaTimestampLo = 0x00B8001C, - HWPfDmaTimestampHi = 0x00B80020, - HWPfDmaAxiStatus = 0x00B80028, HWPfDmaAxiControl = 0x00B8002C, - HWPfDmaNoQmgr = 0x00B80030, - HWPfDmaQosScale = 0x00B80034, HWPfDmaQmanen = 0x00B80040, - HWPfDmaQmgrQosBase = 0x00B80060, - HWPfDmaFecClkGatingEnable = 0x00B80080, - HWPfDmaPmEnable = 0x00B80084, - HWPfDmaQosEnable = 0x00B80088, - HWPfDmaHarqWeightedRrFrameThreshold = 0x00B800B0, - HWPfDmaDataSmallWeightedRrFrameThresh = 0x00B800B4, - HWPfDmaDataLargeWeightedRrFrameThresh = 0x00B800B8, - HWPfDmaInboundCbMaxSize = 0x00B800BC, HWPfDmaInboundDrainDataSize = 0x00B800C0, HWPfDmaVfDdrBaseRw = 0x00B80400, - HWPfDmaCmplTmOutCnt = 0x00B80800, - HWPfDmaProcTmOutCnt = 0x00B80804, - HWPfDmaStatusRrespBresp = 0x00B80810, - HWPfDmaCfgRrespBresp = 0x00B80814, - HWPfDmaStatusMemParErr = 0x00B80818, - 
HWPfDmaCfgMemParErrEn = 0x00B8081C, - HWPfDmaStatusDmaHwErr = 0x00B80820, - HWPfDmaCfgDmaHwErrEn = 0x00B80824, - HWPfDmaStatusFecCoreErr = 0x00B80828, - HWPfDmaCfgFecCoreErrEn = 0x00B8082C, - HWPfDmaStatusFcwDescrErr = 0x00B80830, - HWPfDmaCfgFcwDescrErrEn = 0x00B80834, - HWPfDmaStatusBlockTransmit = 0x00B80838, - HWPfDmaBlockOnErrEn = 0x00B8083C, - HWPfDmaStatusFlushDma = 0x00B80840, - HWPfDmaFlushDmaOnErrEn = 0x00B80844, - HWPfDmaStatusSdoneFifoFull = 0x00B80848, - HWPfDmaStatusDescriptorErrLoVf = 0x00B8084C, - HWPfDmaStatusDescriptorErrHiVf = 0x00B80850, - HWPfDmaStatusFcwErrLoVf = 0x00B80854, - HWPfDmaStatusFcwErrHiVf = 0x00B80858, - HWPfDmaStatusDataErrLoVf = 0x00B8085C, - HWPfDmaStatusDataErrHiVf = 0x00B80860, - HWPfDmaCfgMsiEnSoftwareErr = 0x00B80864, HWPfDmaDescriptorSignatuture = 0x00B80868, - HWPfDmaFcwSignature = 0x00B8086C, HWPfDmaErrorDetectionEn = 0x00B80870, - HWPfDmaErrCntrlFifoDebug = 0x00B8087C, - HWPfDmaStatusToutData = 0x00B80880, - HWPfDmaStatusToutDesc = 0x00B80884, - HWPfDmaStatusToutUnexpData = 0x00B80888, - HWPfDmaStatusToutUnexpDesc = 0x00B8088C, - HWPfDmaStatusToutProcess = 0x00B80890, - HWPfDmaConfigCtoutOutDataEn = 0x00B808A0, - HWPfDmaConfigCtoutOutDescrEn = 0x00B808A4, - HWPfDmaConfigUnexpComplDataEn = 0x00B808A8, - HWPfDmaConfigUnexpComplDescrEn = 0x00B808AC, - HWPfDmaConfigPtoutOutEn = 0x00B808B0, HWPfDmaFec5GulDescBaseLoRegVf = 0x00B88020, HWPfDmaFec5GulDescBaseHiRegVf = 0x00B88024, HWPfDmaFec5GulRespPtrLoRegVf = 0x00B88028, @@ -143,414 +55,34 @@ enum { HWPfDmaFec4GdlDescBaseHiRegVf = 0x00B88084, HWPfDmaFec4GdlRespPtrLoRegVf = 0x00B88088, HWPfDmaFec4GdlRespPtrHiRegVf = 0x00B8808C, - HWPfDmaVfDdrBaseRangeRo = 0x00B880A0, - HWPfQosmonACntrlReg = 0x00B90000, HWPfQosmonAEvalOverflow0 = 0x00B90008, - HWPfQosmonAEvalOverflow1 = 0x00B9000C, - HWPfQosmonADivTerm = 0x00B90010, - HWPfQosmonATickTerm = 0x00B90014, - HWPfQosmonAEvalTerm = 0x00B90018, - HWPfQosmonAAveTerm = 0x00B9001C, - HWPfQosmonAForceEccErr = 0x00B90020, - 
HWPfQosmonAEccErrDetect = 0x00B90024, - HWPfQosmonAIterationConfig0Low = 0x00B90060, - HWPfQosmonAIterationConfig0High = 0x00B90064, - HWPfQosmonAIterationConfig1Low = 0x00B90068, - HWPfQosmonAIterationConfig1High = 0x00B9006C, - HWPfQosmonAIterationConfig2Low = 0x00B90070, - HWPfQosmonAIterationConfig2High = 0x00B90074, - HWPfQosmonAIterationConfig3Low = 0x00B90078, - HWPfQosmonAIterationConfig3High = 0x00B9007C, - HWPfQosmonAEvalMemAddr = 0x00B90080, - HWPfQosmonAEvalMemData = 0x00B90084, - HWPfQosmonAXaction = 0x00B900C0, - HWPfQosmonARemThres1Vf = 0x00B90400, - HWPfQosmonAThres2Vf = 0x00B90404, - HWPfQosmonAWeiFracVf = 0x00B90408, - HWPfQosmonARrWeiVf = 0x00B9040C, HWPfPermonACntrlRegVf = 0x00B98000, - HWPfPermonACountVf = 0x00B98008, - HWPfPermonAKCntLoVf = 0x00B98010, - HWPfPermonAKCntHiVf = 0x00B98014, - HWPfPermonADeltaCntLoVf = 0x00B98020, - HWPfPermonADeltaCntHiVf = 0x00B98024, - HWPfPermonAVersionReg = 0x00B9C000, - HWPfPermonACbControlFec = 0x00B9C0F0, - HWPfPermonADltTimerLoFec = 0x00B9C0F4, - HWPfPermonADltTimerHiFec = 0x00B9C0F8, - HWPfPermonACbCountFec = 0x00B9C100, - HWPfPermonAAccExecTimerLoFec = 0x00B9C104, - HWPfPermonAAccExecTimerHiFec = 0x00B9C108, - HWPfPermonAExecTimerMinFec = 0x00B9C200, - HWPfPermonAExecTimerMaxFec = 0x00B9C204, - HWPfPermonAControlBusMon = 0x00B9C400, - HWPfPermonAConfigBusMon = 0x00B9C404, - HWPfPermonASkipCountBusMon = 0x00B9C408, - HWPfPermonAMinLatBusMon = 0x00B9C40C, - HWPfPermonAMaxLatBusMon = 0x00B9C500, - HWPfPermonATotalLatLowBusMon = 0x00B9C504, - HWPfPermonATotalLatUpperBusMon = 0x00B9C508, - HWPfPermonATotalReqCntBusMon = 0x00B9C50C, - HWPfQosmonBCntrlReg = 0x00BA0000, HWPfQosmonBEvalOverflow0 = 0x00BA0008, - HWPfQosmonBEvalOverflow1 = 0x00BA000C, - HWPfQosmonBDivTerm = 0x00BA0010, - HWPfQosmonBTickTerm = 0x00BA0014, - HWPfQosmonBEvalTerm = 0x00BA0018, - HWPfQosmonBAveTerm = 0x00BA001C, - HWPfQosmonBForceEccErr = 0x00BA0020, - HWPfQosmonBEccErrDetect = 0x00BA0024, - HWPfQosmonBIterationConfig0Low = 0x00BA0060, 
- HWPfQosmonBIterationConfig0High = 0x00BA0064, - HWPfQosmonBIterationConfig1Low = 0x00BA0068, - HWPfQosmonBIterationConfig1High = 0x00BA006C, - HWPfQosmonBIterationConfig2Low = 0x00BA0070, - HWPfQosmonBIterationConfig2High = 0x00BA0074, - HWPfQosmonBIterationConfig3Low = 0x00BA0078, - HWPfQosmonBIterationConfig3High = 0x00BA007C, - HWPfQosmonBEvalMemAddr = 0x00BA0080, - HWPfQosmonBEvalMemData = 0x00BA0084, - HWPfQosmonBXaction = 0x00BA00C0, - HWPfQosmonBRemThres1Vf = 0x00BA0400, - HWPfQosmonBThres2Vf = 0x00BA0404, - HWPfQosmonBWeiFracVf = 0x00BA0408, - HWPfQosmonBRrWeiVf = 0x00BA040C, HWPfPermonBCntrlRegVf = 0x00BA8000, - HWPfPermonBCountVf = 0x00BA8008, - HWPfPermonBKCntLoVf = 0x00BA8010, - HWPfPermonBKCntHiVf = 0x00BA8014, - HWPfPermonBDeltaCntLoVf = 0x00BA8020, - HWPfPermonBDeltaCntHiVf = 0x00BA8024, - HWPfPermonBVersionReg = 0x00BAC000, - HWPfPermonBCbControlFec = 0x00BAC0F0, - HWPfPermonBDltTimerLoFec = 0x00BAC0F4, - HWPfPermonBDltTimerHiFec = 0x00BAC0F8, - HWPfPermonBCbCountFec = 0x00BAC100, - HWPfPermonBAccExecTimerLoFec = 0x00BAC104, - HWPfPermonBAccExecTimerHiFec = 0x00BAC108, - HWPfPermonBExecTimerMinFec = 0x00BAC200, - HWPfPermonBExecTimerMaxFec = 0x00BAC204, - HWPfPermonBControlBusMon = 0x00BAC400, - HWPfPermonBConfigBusMon = 0x00BAC404, - HWPfPermonBSkipCountBusMon = 0x00BAC408, - HWPfPermonBMinLatBusMon = 0x00BAC40C, - HWPfPermonBMaxLatBusMon = 0x00BAC500, - HWPfPermonBTotalLatLowBusMon = 0x00BAC504, - HWPfPermonBTotalLatUpperBusMon = 0x00BAC508, - HWPfPermonBTotalReqCntBusMon = 0x00BAC50C, - HwPfFabI2MArbCntrlReg = 0x00BB0000, HWPfFabricMode = 0x00BB1000, - HwPfFabI2MGrp0DebugReg = 0x00BBF000, - HwPfFabI2MGrp1DebugReg = 0x00BBF004, - HwPfFabI2MGrp2DebugReg = 0x00BBF008, - HwPfFabI2MGrp3DebugReg = 0x00BBF00C, - HwPfFabI2MBuf0DebugReg = 0x00BBF010, - HwPfFabI2MBuf1DebugReg = 0x00BBF014, - HwPfFabI2MBuf2DebugReg = 0x00BBF018, - HwPfFabI2MBuf3DebugReg = 0x00BBF01C, - HwPfFabM2IBuf0Grp0DebugReg = 0x00BBF020, - HwPfFabM2IBuf1Grp0DebugReg = 0x00BBF024, - 
HwPfFabM2IBuf0Grp1DebugReg = 0x00BBF028, - HwPfFabM2IBuf1Grp1DebugReg = 0x00BBF02C, - HwPfFabM2IBuf0Grp2DebugReg = 0x00BBF030, - HwPfFabM2IBuf1Grp2DebugReg = 0x00BBF034, - HwPfFabM2IBuf0Grp3DebugReg = 0x00BBF038, - HwPfFabM2IBuf1Grp3DebugReg = 0x00BBF03C, HWPfFecUl5gCntrlReg = 0x00BC0000, - HWPfFecUl5gI2MThreshReg = 0x00BC0004, - HWPfFecUl5gVersionReg = 0x00BC0100, - HWPfFecUl5gFcwStatusReg = 0x00BC0104, - HWPfFecUl5gWarnReg = 0x00BC0108, HwPfFecUl5gIbDebugReg = 0x00BC0200, - HwPfFecUl5gObLlrDebugReg = 0x00BC0204, - HwPfFecUl5gObHarqDebugReg = 0x00BC0208, - HwPfFecUl5g1CntrlReg = 0x00BC1000, - HwPfFecUl5g1I2MThreshReg = 0x00BC1004, - HwPfFecUl5g1VersionReg = 0x00BC1100, - HwPfFecUl5g1FcwStatusReg = 0x00BC1104, - HwPfFecUl5g1WarnReg = 0x00BC1108, - HwPfFecUl5g1IbDebugReg = 0x00BC1200, - HwPfFecUl5g1ObLlrDebugReg = 0x00BC1204, - HwPfFecUl5g1ObHarqDebugReg = 0x00BC1208, - HwPfFecUl5g2CntrlReg = 0x00BC2000, - HwPfFecUl5g2I2MThreshReg = 0x00BC2004, - HwPfFecUl5g2VersionReg = 0x00BC2100, - HwPfFecUl5g2FcwStatusReg = 0x00BC2104, - HwPfFecUl5g2WarnReg = 0x00BC2108, - HwPfFecUl5g2IbDebugReg = 0x00BC2200, - HwPfFecUl5g2ObLlrDebugReg = 0x00BC2204, - HwPfFecUl5g2ObHarqDebugReg = 0x00BC2208, - HwPfFecUl5g3CntrlReg = 0x00BC3000, - HwPfFecUl5g3I2MThreshReg = 0x00BC3004, - HwPfFecUl5g3VersionReg = 0x00BC3100, - HwPfFecUl5g3FcwStatusReg = 0x00BC3104, - HwPfFecUl5g3WarnReg = 0x00BC3108, - HwPfFecUl5g3IbDebugReg = 0x00BC3200, - HwPfFecUl5g3ObLlrDebugReg = 0x00BC3204, - HwPfFecUl5g3ObHarqDebugReg = 0x00BC3208, - HwPfFecUl5g4CntrlReg = 0x00BC4000, - HwPfFecUl5g4I2MThreshReg = 0x00BC4004, - HwPfFecUl5g4VersionReg = 0x00BC4100, - HwPfFecUl5g4FcwStatusReg = 0x00BC4104, - HwPfFecUl5g4WarnReg = 0x00BC4108, - HwPfFecUl5g4IbDebugReg = 0x00BC4200, - HwPfFecUl5g4ObLlrDebugReg = 0x00BC4204, - HwPfFecUl5g4ObHarqDebugReg = 0x00BC4208, - HwPfFecUl5g5CntrlReg = 0x00BC5000, - HwPfFecUl5g5I2MThreshReg = 0x00BC5004, - HwPfFecUl5g5VersionReg = 0x00BC5100, - HwPfFecUl5g5FcwStatusReg = 0x00BC5104, - 
HwPfFecUl5g5WarnReg = 0x00BC5108, - HwPfFecUl5g5IbDebugReg = 0x00BC5200, - HwPfFecUl5g5ObLlrDebugReg = 0x00BC5204, - HwPfFecUl5g5ObHarqDebugReg = 0x00BC5208, - HwPfFecUl5g6CntrlReg = 0x00BC6000, - HwPfFecUl5g6I2MThreshReg = 0x00BC6004, - HwPfFecUl5g6VersionReg = 0x00BC6100, - HwPfFecUl5g6FcwStatusReg = 0x00BC6104, - HwPfFecUl5g6WarnReg = 0x00BC6108, - HwPfFecUl5g6IbDebugReg = 0x00BC6200, - HwPfFecUl5g6ObLlrDebugReg = 0x00BC6204, - HwPfFecUl5g6ObHarqDebugReg = 0x00BC6208, - HwPfFecUl5g7CntrlReg = 0x00BC7000, - HwPfFecUl5g7I2MThreshReg = 0x00BC7004, - HwPfFecUl5g7VersionReg = 0x00BC7100, - HwPfFecUl5g7FcwStatusReg = 0x00BC7104, - HwPfFecUl5g7WarnReg = 0x00BC7108, - HwPfFecUl5g7IbDebugReg = 0x00BC7200, - HwPfFecUl5g7ObLlrDebugReg = 0x00BC7204, - HwPfFecUl5g7ObHarqDebugReg = 0x00BC7208, - HwPfFecUl5g8CntrlReg = 0x00BC8000, - HwPfFecUl5g8I2MThreshReg = 0x00BC8004, - HwPfFecUl5g8VersionReg = 0x00BC8100, - HwPfFecUl5g8FcwStatusReg = 0x00BC8104, - HwPfFecUl5g8WarnReg = 0x00BC8108, - HwPfFecUl5g8IbDebugReg = 0x00BC8200, - HwPfFecUl5g8ObLlrDebugReg = 0x00BC8204, - HwPfFecUl5g8ObHarqDebugReg = 0x00BC8208, - HWPfFecDl5gCntrlReg = 0x00BCF000, - HWPfFecDl5gI2MThreshReg = 0x00BCF004, - HWPfFecDl5gVersionReg = 0x00BCF100, - HWPfFecDl5gFcwStatusReg = 0x00BCF104, - HWPfFecDl5gWarnReg = 0x00BCF108, - HWPfFecUlVersionReg = 0x00BD0000, - HWPfFecUlControlReg = 0x00BD0004, - HWPfFecUlStatusReg = 0x00BD0008, - HWPfFecDlVersionReg = 0x00BDF000, - HWPfFecDlClusterConfigReg = 0x00BDF004, - HWPfFecDlBurstThres = 0x00BDF00C, - HWPfFecDlClusterStatusReg0 = 0x00BDF040, - HWPfFecDlClusterStatusReg1 = 0x00BDF044, - HWPfFecDlClusterStatusReg2 = 0x00BDF048, - HWPfFecDlClusterStatusReg3 = 0x00BDF04C, - HWPfFecDlClusterStatusReg4 = 0x00BDF050, - HWPfFecDlClusterStatusReg5 = 0x00BDF054, - HWPfChaFabPllPllrst = 0x00C40000, - HWPfChaFabPllClk0 = 0x00C40004, - HWPfChaFabPllClk1 = 0x00C40008, - HWPfChaFabPllBwadj = 0x00C4000C, - HWPfChaFabPllLbw = 0x00C40010, - HWPfChaFabPllResetq = 0x00C40014, - 
HWPfChaFabPllPhshft0 = 0x00C40018, - HWPfChaFabPllPhshft1 = 0x00C4001C, - HWPfChaFabPllDivq0 = 0x00C40020, - HWPfChaFabPllDivq1 = 0x00C40024, - HWPfChaFabPllDivq2 = 0x00C40028, - HWPfChaFabPllDivq3 = 0x00C4002C, - HWPfChaFabPllDivq4 = 0x00C40030, - HWPfChaFabPllDivq5 = 0x00C40034, - HWPfChaFabPllDivq6 = 0x00C40038, - HWPfChaFabPllDivq7 = 0x00C4003C, - HWPfChaDl5gPllPllrst = 0x00C40080, - HWPfChaDl5gPllClk0 = 0x00C40084, - HWPfChaDl5gPllClk1 = 0x00C40088, - HWPfChaDl5gPllBwadj = 0x00C4008C, - HWPfChaDl5gPllLbw = 0x00C40090, - HWPfChaDl5gPllResetq = 0x00C40094, HWPfChaDl5gPllPhshft0 = 0x00C40098, - HWPfChaDl5gPllPhshft1 = 0x00C4009C, - HWPfChaDl5gPllDivq0 = 0x00C400A0, - HWPfChaDl5gPllDivq1 = 0x00C400A4, - HWPfChaDl5gPllDivq2 = 0x00C400A8, - HWPfChaDl5gPllDivq3 = 0x00C400AC, - HWPfChaDl5gPllDivq4 = 0x00C400B0, - HWPfChaDl5gPllDivq5 = 0x00C400B4, - HWPfChaDl5gPllDivq6 = 0x00C400B8, - HWPfChaDl5gPllDivq7 = 0x00C400BC, - HWPfChaDl4gPllPllrst = 0x00C40100, - HWPfChaDl4gPllClk0 = 0x00C40104, - HWPfChaDl4gPllClk1 = 0x00C40108, - HWPfChaDl4gPllBwadj = 0x00C4010C, - HWPfChaDl4gPllLbw = 0x00C40110, - HWPfChaDl4gPllResetq = 0x00C40114, - HWPfChaDl4gPllPhshft0 = 0x00C40118, - HWPfChaDl4gPllPhshft1 = 0x00C4011C, - HWPfChaDl4gPllDivq0 = 0x00C40120, - HWPfChaDl4gPllDivq1 = 0x00C40124, - HWPfChaDl4gPllDivq2 = 0x00C40128, - HWPfChaDl4gPllDivq3 = 0x00C4012C, - HWPfChaDl4gPllDivq4 = 0x00C40130, - HWPfChaDl4gPllDivq5 = 0x00C40134, - HWPfChaDl4gPllDivq6 = 0x00C40138, - HWPfChaDl4gPllDivq7 = 0x00C4013C, - HWPfChaUl5gPllPllrst = 0x00C40180, - HWPfChaUl5gPllClk0 = 0x00C40184, - HWPfChaUl5gPllClk1 = 0x00C40188, - HWPfChaUl5gPllBwadj = 0x00C4018C, - HWPfChaUl5gPllLbw = 0x00C40190, - HWPfChaUl5gPllResetq = 0x00C40194, - HWPfChaUl5gPllPhshft0 = 0x00C40198, - HWPfChaUl5gPllPhshft1 = 0x00C4019C, - HWPfChaUl5gPllDivq0 = 0x00C401A0, - HWPfChaUl5gPllDivq1 = 0x00C401A4, - HWPfChaUl5gPllDivq2 = 0x00C401A8, - HWPfChaUl5gPllDivq3 = 0x00C401AC, - HWPfChaUl5gPllDivq4 = 0x00C401B0, - HWPfChaUl5gPllDivq5 = 
0x00C401B4, - HWPfChaUl5gPllDivq6 = 0x00C401B8, - HWPfChaUl5gPllDivq7 = 0x00C401BC, - HWPfChaUl4gPllPllrst = 0x00C40200, - HWPfChaUl4gPllClk0 = 0x00C40204, - HWPfChaUl4gPllClk1 = 0x00C40208, - HWPfChaUl4gPllBwadj = 0x00C4020C, - HWPfChaUl4gPllLbw = 0x00C40210, - HWPfChaUl4gPllResetq = 0x00C40214, - HWPfChaUl4gPllPhshft0 = 0x00C40218, - HWPfChaUl4gPllPhshft1 = 0x00C4021C, - HWPfChaUl4gPllDivq0 = 0x00C40220, - HWPfChaUl4gPllDivq1 = 0x00C40224, - HWPfChaUl4gPllDivq2 = 0x00C40228, - HWPfChaUl4gPllDivq3 = 0x00C4022C, - HWPfChaUl4gPllDivq4 = 0x00C40230, - HWPfChaUl4gPllDivq5 = 0x00C40234, - HWPfChaUl4gPllDivq6 = 0x00C40238, - HWPfChaUl4gPllDivq7 = 0x00C4023C, - HWPfChaDdrPllPllrst = 0x00C40280, - HWPfChaDdrPllClk0 = 0x00C40284, - HWPfChaDdrPllClk1 = 0x00C40288, - HWPfChaDdrPllBwadj = 0x00C4028C, - HWPfChaDdrPllLbw = 0x00C40290, - HWPfChaDdrPllResetq = 0x00C40294, - HWPfChaDdrPllPhshft0 = 0x00C40298, - HWPfChaDdrPllPhshft1 = 0x00C4029C, - HWPfChaDdrPllDivq0 = 0x00C402A0, - HWPfChaDdrPllDivq1 = 0x00C402A4, - HWPfChaDdrPllDivq2 = 0x00C402A8, - HWPfChaDdrPllDivq3 = 0x00C402AC, - HWPfChaDdrPllDivq4 = 0x00C402B0, - HWPfChaDdrPllDivq5 = 0x00C402B4, - HWPfChaDdrPllDivq6 = 0x00C402B8, - HWPfChaDdrPllDivq7 = 0x00C402BC, - HWPfChaErrStatus = 0x00C40400, - HWPfChaErrMask = 0x00C40404, - HWPfChaDebugPcieMsiFifo = 0x00C40410, - HWPfChaDebugDdrMsiFifo = 0x00C40414, - HWPfChaDebugMiscMsiFifo = 0x00C40418, - HWPfChaPwmSet = 0x00C40420, - HWPfChaDdrRstStatus = 0x00C40430, HWPfChaDdrStDoneStatus = 0x00C40434, HWPfChaDdrWbRstCfg = 0x00C40438, HWPfChaDdrApbRstCfg = 0x00C4043C, HWPfChaDdrPhyRstCfg = 0x00C40440, HWPfChaDdrCpuRstCfg = 0x00C40444, HWPfChaDdrSifRstCfg = 0x00C40448, - HWPfChaPadcfgPcomp0 = 0x00C41000, - HWPfChaPadcfgNcomp0 = 0x00C41004, - HWPfChaPadcfgOdt0 = 0x00C41008, - HWPfChaPadcfgProtect0 = 0x00C4100C, - HWPfChaPreemphasisProtect0 = 0x00C41010, - HWPfChaPreemphasisCompen0 = 0x00C41040, - HWPfChaPreemphasisOdten0 = 0x00C41044, - HWPfChaPadcfgPcomp1 = 0x00C41100, - 
HWPfChaPadcfgNcomp1 = 0x00C41104, - HWPfChaPadcfgOdt1 = 0x00C41108, - HWPfChaPadcfgProtect1 = 0x00C4110C, - HWPfChaPreemphasisProtect1 = 0x00C41110, - HWPfChaPreemphasisCompen1 = 0x00C41140, - HWPfChaPreemphasisOdten1 = 0x00C41144, - HWPfChaPadcfgPcomp2 = 0x00C41200, - HWPfChaPadcfgNcomp2 = 0x00C41204, - HWPfChaPadcfgOdt2 = 0x00C41208, - HWPfChaPadcfgProtect2 = 0x00C4120C, - HWPfChaPreemphasisProtect2 = 0x00C41210, - HWPfChaPreemphasisCompen2 = 0x00C41240, - HWPfChaPreemphasisOdten4 = 0x00C41444, - HWPfChaPreemphasisOdten2 = 0x00C41244, - HWPfChaPadcfgPcomp3 = 0x00C41300, - HWPfChaPadcfgNcomp3 = 0x00C41304, - HWPfChaPadcfgOdt3 = 0x00C41308, - HWPfChaPadcfgProtect3 = 0x00C4130C, - HWPfChaPreemphasisProtect3 = 0x00C41310, - HWPfChaPreemphasisCompen3 = 0x00C41340, - HWPfChaPreemphasisOdten3 = 0x00C41344, - HWPfChaPadcfgPcomp4 = 0x00C41400, - HWPfChaPadcfgNcomp4 = 0x00C41404, - HWPfChaPadcfgOdt4 = 0x00C41408, - HWPfChaPadcfgProtect4 = 0x00C4140C, - HWPfChaPreemphasisProtect4 = 0x00C41410, - HWPfChaPreemphasisCompen4 = 0x00C41440, - HWPfHiVfToPfDbellVf = 0x00C80000, - HWPfHiPfToVfDbellVf = 0x00C80008, - HWPfHiInfoRingBaseLoVf = 0x00C80010, - HWPfHiInfoRingBaseHiVf = 0x00C80014, - HWPfHiInfoRingPointerVf = 0x00C80018, - HWPfHiInfoRingIntWrEnVf = 0x00C80020, - HWPfHiInfoRingPf2VfWrEnVf = 0x00C80024, - HWPfHiMsixVectorMapperVf = 0x00C80060, - HWPfHiModuleVersionReg = 0x00C84000, - HWPfHiIosf2axiErrLogReg = 0x00C84004, - HWPfHiHardResetReg = 0x00C84008, HWPfHi5GHardResetReg = 0x00C8400C, HWPfHiInfoRingBaseLoRegPf = 0x00C84010, HWPfHiInfoRingBaseHiRegPf = 0x00C84014, HWPfHiInfoRingPointerRegPf = 0x00C84018, HWPfHiInfoRingIntWrEnRegPf = 0x00C84020, HWPfHiInfoRingVf2pfLoWrEnReg = 0x00C84024, - HWPfHiInfoRingVf2pfHiWrEnReg = 0x00C84028, - HWPfHiLogParityErrStatusReg = 0x00C8402C, - HWPfHiLogDataParityErrorVfStatusLo = 0x00C84030, - HWPfHiLogDataParityErrorVfStatusHi = 0x00C84034, HWPfHiBlockTransmitOnErrorEn = 0x00C84038, HWPfHiCfgMsiIntWrEnRegPf = 0x00C84040, 
HWPfHiCfgMsiVf2pfLoWrEnReg = 0x00C84044, - HWPfHiCfgMsiVf2pfHighWrEnReg = 0x00C84048, - HWPfHiMsixVectorMapperPf = 0x00C84060, - HWPfHiApbWrWaitTime = 0x00C84100, - HWPfHiXCounterMaxValue = 0x00C84104, HWPfHiPfMode = 0x00C84108, HWPfHiClkGateHystReg = 0x00C8410C, - HWPfHiSnoopBitsReg = 0x00C84110, HWPfHiMsiDropEnableReg = 0x00C84114, - HWPfHiMsiStatReg = 0x00C84120, - HWPfHiFifoOflStatReg = 0x00C84124, - HWPfHiHiDebugReg = 0x00C841F4, - HWPfHiDebugMemSnoopMsiFifo = 0x00C841F8, - HWPfHiDebugMemSnoopInputFifo = 0x00C841FC, - HWPfHiMsixMappingConfig = 0x00C84200, - HWPfHiJunkReg = 0x00C8FF00, - HWPfDdrUmmcVer = 0x00D00000, - HWPfDdrUmmcCap = 0x00D00010, HWPfDdrUmmcCtrl = 0x00D00020, - HWPfDdrMpcPe = 0x00D00080, - HWPfDdrMpcPpri3 = 0x00D00090, - HWPfDdrMpcPpri2 = 0x00D000A0, - HWPfDdrMpcPpri1 = 0x00D000B0, - HWPfDdrMpcPpri0 = 0x00D000C0, - HWPfDdrMpcPrwgrpCtrl = 0x00D000D0, - HWPfDdrMpcPbw7 = 0x00D000E0, - HWPfDdrMpcPbw6 = 0x00D000F0, - HWPfDdrMpcPbw5 = 0x00D00100, - HWPfDdrMpcPbw4 = 0x00D00110, - HWPfDdrMpcPbw3 = 0x00D00120, - HWPfDdrMpcPbw2 = 0x00D00130, - HWPfDdrMpcPbw1 = 0x00D00140, - HWPfDdrMpcPbw0 = 0x00D00150, - HWPfDdrMemoryInit = 0x00D00200, - HWPfDdrMemoryInitDone = 0x00D00210, HWPfDdrMemInitPhyTrng0 = 0x00D00240, - HWPfDdrMemInitPhyTrng1 = 0x00D00250, - HWPfDdrMemInitPhyTrng2 = 0x00D00260, - HWPfDdrMemInitPhyTrng3 = 0x00D00270, HWPfDdrBcDram = 0x00D003C0, HWPfDdrBcAddrMap = 0x00D003D0, HWPfDdrBcRef = 0x00D003E0, @@ -565,502 +97,31 @@ enum { HWPfDdrBcTim8 = 0x00D00480, HWPfDdrBcTim9 = 0x00D00490, HWPfDdrBcTim10 = 0x00D004A0, - HWPfDdrBcTim12 = 0x00D004C0, HWPfDdrDfiInit = 0x00D004D0, - HWPfDdrDfiInitComplete = 0x00D004E0, HWPfDdrDfiTim0 = 0x00D004F0, HWPfDdrDfiTim1 = 0x00D00500, HWPfDdrDfiPhyUpdEn = 0x00D00530, - HWPfDdrMemStatus = 0x00D00540, - HWPfDdrUmmcErrStatus = 0x00D00550, - HWPfDdrUmmcIntStatus = 0x00D00560, HWPfDdrUmmcIntEn = 0x00D00570, HWPfDdrPhyRdLatency = 0x00D48400, HWPfDdrPhyRdLatencyDbi = 0x00D48410, HWPfDdrPhyWrLatency = 0x00D48420, 
HWPfDdrPhyTrngType = 0x00D48430, - HWPfDdrPhyMrsTiming2 = 0x00D48440, - HWPfDdrPhyMrsTiming0 = 0x00D48450, - HWPfDdrPhyMrsTiming1 = 0x00D48460, - HWPfDdrPhyDramTmrd = 0x00D48470, - HWPfDdrPhyDramTmod = 0x00D48480, - HWPfDdrPhyDramTwpre = 0x00D48490, - HWPfDdrPhyDramTrfc = 0x00D484A0, - HWPfDdrPhyDramTrwtp = 0x00D484B0, HWPfDdrPhyMr01Dimm = 0x00D484C0, HWPfDdrPhyMr01DimmDbi = 0x00D484D0, HWPfDdrPhyMr23Dimm = 0x00D484E0, HWPfDdrPhyMr45Dimm = 0x00D484F0, HWPfDdrPhyMr67Dimm = 0x00D48500, HWPfDdrPhyWrlvlWwRdlvlRr = 0x00D48510, - HWPfDdrPhyOdtEn = 0x00D48520, - HWPfDdrPhyFastTrng = 0x00D48530, - HWPfDdrPhyDynTrngGap = 0x00D48540, - HWPfDdrPhyDynRcalGap = 0x00D48550, HWPfDdrPhyIdletimeout = 0x00D48560, - HWPfDdrPhyRstCkeGap = 0x00D48570, - HWPfDdrPhyCkeMrsGap = 0x00D48580, - HWPfDdrPhyMemVrefMidVal = 0x00D48590, - HWPfDdrPhyVrefStep = 0x00D485A0, - HWPfDdrPhyVrefThreshold = 0x00D485B0, - HWPfDdrPhyPhyVrefMidVal = 0x00D485C0, HWPfDdrPhyDqsCountMax = 0x00D485D0, HWPfDdrPhyDqsCountNum = 0x00D485E0, - HWPfDdrPhyDramRow = 0x00D485F0, - HWPfDdrPhyDramCol = 0x00D48600, - HWPfDdrPhyDramBgBa = 0x00D48610, - HWPfDdrPhyDynamicUpdreqrel = 0x00D48620, - HWPfDdrPhyVrefLimits = 0x00D48630, - HWPfDdrPhyIdtmTcStatus = 0x00D6C020, HWPfDdrPhyIdtmFwVersion = 0x00D6C410, - HWPfDdrPhyRdlvlGateInitDelay = 0x00D70000, - HWPfDdrPhyRdenSmplabc = 0x00D70008, - HWPfDdrPhyVrefNibble0 = 0x00D7000C, - HWPfDdrPhyVrefNibble1 = 0x00D70010, - HWPfDdrPhyRdlvlGateDqsSmpl0 = 0x00D70014, - HWPfDdrPhyRdlvlGateDqsSmpl1 = 0x00D70018, - HWPfDdrPhyRdlvlGateDqsSmpl2 = 0x00D7001C, HWPfDdrPhyDqsCount = 0x00D70020, - HWPfDdrPhyWrlvlRdlvlGateStatus = 0x00D70024, - HWPfDdrPhyErrorFlags = 0x00D70028, - HWPfDdrPhyPowerDown = 0x00D70030, - HWPfDdrPhyPrbsSeedByte0 = 0x00D70034, - HWPfDdrPhyPrbsSeedByte1 = 0x00D70038, - HWPfDdrPhyPcompDq = 0x00D70040, - HWPfDdrPhyNcompDq = 0x00D70044, - HWPfDdrPhyPcompDqs = 0x00D70048, - HWPfDdrPhyNcompDqs = 0x00D7004C, - HWPfDdrPhyPcompCmd = 0x00D70050, - HWPfDdrPhyNcompCmd = 0x00D70054, - 
HWPfDdrPhyPcompCk = 0x00D70058, - HWPfDdrPhyNcompCk = 0x00D7005C, - HWPfDdrPhyRcalOdtDq = 0x00D70060, - HWPfDdrPhyRcalOdtDqs = 0x00D70064, - HWPfDdrPhyRcalMask1 = 0x00D70068, - HWPfDdrPhyRcalMask2 = 0x00D7006C, - HWPfDdrPhyRcalCtrl = 0x00D70070, - HWPfDdrPhyRcalCnt = 0x00D70074, - HWPfDdrPhyRcalOverride = 0x00D70078, - HWPfDdrPhyRcalGateen = 0x00D7007C, - HWPfDdrPhyCtrl = 0x00D70080, - HWPfDdrPhyWrlvlAlg = 0x00D70084, - HWPfDdrPhyRcalVreftTxcmdOdt = 0x00D70088, - HWPfDdrPhyRdlvlGateParam = 0x00D7008C, - HWPfDdrPhyRdlvlGateParam2 = 0x00D70090, - HWPfDdrPhyRcalVreftTxdata = 0x00D70094, - HWPfDdrPhyCmdIntDelay = 0x00D700A4, - HWPfDdrPhyAlertN = 0x00D700A8, - HWPfDdrPhyTrngReqWpre2tck = 0x00D700AC, - HWPfDdrPhyCmdPhaseSel = 0x00D700B4, - HWPfDdrPhyCmdDcdl = 0x00D700B8, - HWPfDdrPhyCkDcdl = 0x00D700BC, - HWPfDdrPhySwTrngCtrl1 = 0x00D700C0, - HWPfDdrPhySwTrngCtrl2 = 0x00D700C4, - HWPfDdrPhyRcalPcompRden = 0x00D700C8, - HWPfDdrPhyRcalNcompRden = 0x00D700CC, - HWPfDdrPhyRcalCompen = 0x00D700D0, - HWPfDdrPhySwTrngRdqs = 0x00D700D4, - HWPfDdrPhySwTrngWdqs = 0x00D700D8, - HWPfDdrPhySwTrngRdena = 0x00D700DC, - HWPfDdrPhySwTrngRdenb = 0x00D700E0, - HWPfDdrPhySwTrngRdenc = 0x00D700E4, - HWPfDdrPhySwTrngWdq = 0x00D700E8, - HWPfDdrPhySwTrngRdq = 0x00D700EC, - HWPfDdrPhyPcfgHmValue = 0x00D700F0, - HWPfDdrPhyPcfgTimerValue = 0x00D700F4, - HWPfDdrPhyPcfgSoftwareTraining = 0x00D700F8, - HWPfDdrPhyPcfgMcStatus = 0x00D700FC, - HWPfDdrPhyWrlvlPhRank0 = 0x00D70100, - HWPfDdrPhyRdenPhRank0 = 0x00D70104, - HWPfDdrPhyRdenIntRank0 = 0x00D70108, - HWPfDdrPhyRdqsDcdlRank0 = 0x00D7010C, - HWPfDdrPhyRdqsShadowDcdlRank0 = 0x00D70110, - HWPfDdrPhyWdqsDcdlRank0 = 0x00D70114, - HWPfDdrPhyWdmDcdlShadowRank0 = 0x00D70118, - HWPfDdrPhyWdmDcdlRank0 = 0x00D7011C, - HWPfDdrPhyDbiDcdlRank0 = 0x00D70120, - HWPfDdrPhyRdenDcdlaRank0 = 0x00D70124, - HWPfDdrPhyDbiDcdlShadowRank0 = 0x00D70128, - HWPfDdrPhyRdenDcdlbRank0 = 0x00D7012C, - HWPfDdrPhyWdqsShadowDcdlRank0 = 0x00D70130, - HWPfDdrPhyRdenDcdlcRank0 = 
0x00D70134, - HWPfDdrPhyRdenShadowDcdlaRank0 = 0x00D70138, - HWPfDdrPhyWrlvlIntRank0 = 0x00D7013C, - HWPfDdrPhyRdqDcdlBit0Rank0 = 0x00D70200, - HWPfDdrPhyRdqDcdlShadowBit0Rank0 = 0x00D70204, - HWPfDdrPhyWdqDcdlBit0Rank0 = 0x00D70208, - HWPfDdrPhyWdqDcdlShadowBit0Rank0 = 0x00D7020C, - HWPfDdrPhyRdqDcdlBit1Rank0 = 0x00D70240, - HWPfDdrPhyRdqDcdlShadowBit1Rank0 = 0x00D70244, - HWPfDdrPhyWdqDcdlBit1Rank0 = 0x00D70248, - HWPfDdrPhyWdqDcdlShadowBit1Rank0 = 0x00D7024C, - HWPfDdrPhyRdqDcdlBit2Rank0 = 0x00D70280, - HWPfDdrPhyRdqDcdlShadowBit2Rank0 = 0x00D70284, - HWPfDdrPhyWdqDcdlBit2Rank0 = 0x00D70288, - HWPfDdrPhyWdqDcdlShadowBit2Rank0 = 0x00D7028C, - HWPfDdrPhyRdqDcdlBit3Rank0 = 0x00D702C0, - HWPfDdrPhyRdqDcdlShadowBit3Rank0 = 0x00D702C4, - HWPfDdrPhyWdqDcdlBit3Rank0 = 0x00D702C8, - HWPfDdrPhyWdqDcdlShadowBit3Rank0 = 0x00D702CC, - HWPfDdrPhyRdqDcdlBit4Rank0 = 0x00D70300, - HWPfDdrPhyRdqDcdlShadowBit4Rank0 = 0x00D70304, - HWPfDdrPhyWdqDcdlBit4Rank0 = 0x00D70308, - HWPfDdrPhyWdqDcdlShadowBit4Rank0 = 0x00D7030C, - HWPfDdrPhyRdqDcdlBit5Rank0 = 0x00D70340, - HWPfDdrPhyRdqDcdlShadowBit5Rank0 = 0x00D70344, - HWPfDdrPhyWdqDcdlBit5Rank0 = 0x00D70348, - HWPfDdrPhyWdqDcdlShadowBit5Rank0 = 0x00D7034C, - HWPfDdrPhyRdqDcdlBit6Rank0 = 0x00D70380, - HWPfDdrPhyRdqDcdlShadowBit6Rank0 = 0x00D70384, - HWPfDdrPhyWdqDcdlBit6Rank0 = 0x00D70388, - HWPfDdrPhyWdqDcdlShadowBit6Rank0 = 0x00D7038C, - HWPfDdrPhyRdqDcdlBit7Rank0 = 0x00D703C0, - HWPfDdrPhyRdqDcdlShadowBit7Rank0 = 0x00D703C4, - HWPfDdrPhyWdqDcdlBit7Rank0 = 0x00D703C8, - HWPfDdrPhyWdqDcdlShadowBit7Rank0 = 0x00D703CC, - HWPfDdrPhyIdtmStatus = 0x00D740D0, - HWPfDdrPhyIdtmError = 0x00D74110, - HWPfDdrPhyIdtmDebug = 0x00D74120, - HWPfDdrPhyIdtmDebugInt = 0x00D74130, - HwPfPcieLnAsicCfgovr = 0x00D80000, - HwPfPcieLnAclkmixer = 0x00D80004, - HwPfPcieLnTxrampfreq = 0x00D80008, - HwPfPcieLnLanetest = 0x00D8000C, - HwPfPcieLnDcctrl = 0x00D80010, - HwPfPcieLnDccmeas = 0x00D80014, - HwPfPcieLnDccovrAclk = 0x00D80018, - HwPfPcieLnDccovrTxa = 
0x00D8001C, - HwPfPcieLnDccovrTxk = 0x00D80020, - HwPfPcieLnDccovrDclk = 0x00D80024, - HwPfPcieLnDccovrEclk = 0x00D80028, - HwPfPcieLnDcctrimAclk = 0x00D8002C, - HwPfPcieLnDcctrimTx = 0x00D80030, - HwPfPcieLnDcctrimDclk = 0x00D80034, - HwPfPcieLnDcctrimEclk = 0x00D80038, - HwPfPcieLnQuadCtrl = 0x00D8003C, - HwPfPcieLnQuadCorrIndex = 0x00D80040, - HwPfPcieLnQuadCorrStatus = 0x00D80044, - HwPfPcieLnAsicRxovr1 = 0x00D80048, - HwPfPcieLnAsicRxovr2 = 0x00D8004C, - HwPfPcieLnAsicEqinfovr = 0x00D80050, - HwPfPcieLnRxcsr = 0x00D80054, - HwPfPcieLnRxfectrl = 0x00D80058, - HwPfPcieLnRxtest = 0x00D8005C, - HwPfPcieLnEscount = 0x00D80060, - HwPfPcieLnCdrctrl = 0x00D80064, - HwPfPcieLnCdrctrl2 = 0x00D80068, - HwPfPcieLnCdrcfg0Ctrl0 = 0x00D8006C, - HwPfPcieLnCdrcfg0Ctrl1 = 0x00D80070, - HwPfPcieLnCdrcfg0Ctrl2 = 0x00D80074, - HwPfPcieLnCdrcfg1Ctrl0 = 0x00D80078, - HwPfPcieLnCdrcfg1Ctrl1 = 0x00D8007C, - HwPfPcieLnCdrcfg1Ctrl2 = 0x00D80080, - HwPfPcieLnCdrcfg2Ctrl0 = 0x00D80084, - HwPfPcieLnCdrcfg2Ctrl1 = 0x00D80088, - HwPfPcieLnCdrcfg2Ctrl2 = 0x00D8008C, - HwPfPcieLnCdrcfg3Ctrl0 = 0x00D80090, - HwPfPcieLnCdrcfg3Ctrl1 = 0x00D80094, - HwPfPcieLnCdrcfg3Ctrl2 = 0x00D80098, - HwPfPcieLnCdrphase = 0x00D8009C, - HwPfPcieLnCdrfreq = 0x00D800A0, - HwPfPcieLnCdrstatusPhase = 0x00D800A4, - HwPfPcieLnCdrstatusFreq = 0x00D800A8, - HwPfPcieLnCdroffset = 0x00D800AC, - HwPfPcieLnRxvosctl = 0x00D800B0, - HwPfPcieLnRxvosctl2 = 0x00D800B4, - HwPfPcieLnRxlosctl = 0x00D800B8, - HwPfPcieLnRxlos = 0x00D800BC, - HwPfPcieLnRxlosvval = 0x00D800C0, - HwPfPcieLnRxvosd0 = 0x00D800C4, - HwPfPcieLnRxvosd1 = 0x00D800C8, - HwPfPcieLnRxvosep0 = 0x00D800CC, - HwPfPcieLnRxvosep1 = 0x00D800D0, - HwPfPcieLnRxvosen0 = 0x00D800D4, - HwPfPcieLnRxvosen1 = 0x00D800D8, - HwPfPcieLnRxvosafe = 0x00D800DC, - HwPfPcieLnRxvosa0 = 0x00D800E0, - HwPfPcieLnRxvosa0Out = 0x00D800E4, - HwPfPcieLnRxvosa1 = 0x00D800E8, - HwPfPcieLnRxvosa1Out = 0x00D800EC, - HwPfPcieLnRxmisc = 0x00D800F0, - HwPfPcieLnRxbeacon = 0x00D800F4, - 
HwPfPcieLnRxdssout = 0x00D800F8, - HwPfPcieLnRxdssout2 = 0x00D800FC, - HwPfPcieLnAlphapctrl = 0x00D80100, - HwPfPcieLnAlphanctrl = 0x00D80104, HwPfPcieLnAdaptctrl = 0x00D80108, - HwPfPcieLnAdaptctrl1 = 0x00D8010C, - HwPfPcieLnAdaptstatus = 0x00D80110, - HwPfPcieLnAdaptvga1 = 0x00D80114, - HwPfPcieLnAdaptvga2 = 0x00D80118, - HwPfPcieLnAdaptvga3 = 0x00D8011C, - HwPfPcieLnAdaptvga4 = 0x00D80120, - HwPfPcieLnAdaptboost1 = 0x00D80124, - HwPfPcieLnAdaptboost2 = 0x00D80128, - HwPfPcieLnAdaptboost3 = 0x00D8012C, - HwPfPcieLnAdaptboost4 = 0x00D80130, - HwPfPcieLnAdaptsslms1 = 0x00D80134, - HwPfPcieLnAdaptsslms2 = 0x00D80138, - HwPfPcieLnAdaptvgaStatus = 0x00D8013C, - HwPfPcieLnAdaptboostStatus = 0x00D80140, - HwPfPcieLnAdaptsslmsStatus1 = 0x00D80144, - HwPfPcieLnAdaptsslmsStatus2 = 0x00D80148, - HwPfPcieLnAfectrl1 = 0x00D8014C, - HwPfPcieLnAfectrl2 = 0x00D80150, - HwPfPcieLnAfectrl3 = 0x00D80154, - HwPfPcieLnAfedefault1 = 0x00D80158, - HwPfPcieLnAfedefault2 = 0x00D8015C, - HwPfPcieLnDfectrl1 = 0x00D80160, - HwPfPcieLnDfectrl2 = 0x00D80164, - HwPfPcieLnDfectrl3 = 0x00D80168, - HwPfPcieLnDfectrl4 = 0x00D8016C, - HwPfPcieLnDfectrl5 = 0x00D80170, - HwPfPcieLnDfectrl6 = 0x00D80174, - HwPfPcieLnAfestatus1 = 0x00D80178, - HwPfPcieLnAfestatus2 = 0x00D8017C, - HwPfPcieLnDfestatus1 = 0x00D80180, - HwPfPcieLnDfestatus2 = 0x00D80184, - HwPfPcieLnDfestatus3 = 0x00D80188, - HwPfPcieLnDfestatus4 = 0x00D8018C, - HwPfPcieLnDfestatus5 = 0x00D80190, - HwPfPcieLnAlphastatus = 0x00D80194, - HwPfPcieLnFomctrl1 = 0x00D80198, - HwPfPcieLnFomctrl2 = 0x00D8019C, - HwPfPcieLnFomctrl3 = 0x00D801A0, - HwPfPcieLnAclkcalStatus = 0x00D801A4, - HwPfPcieLnOffscorrStatus = 0x00D801A8, - HwPfPcieLnEyewidthStatus = 0x00D801AC, - HwPfPcieLnEyeheightStatus = 0x00D801B0, - HwPfPcieLnAsicTxovr1 = 0x00D801B4, - HwPfPcieLnAsicTxovr2 = 0x00D801B8, - HwPfPcieLnAsicTxovr3 = 0x00D801BC, - HwPfPcieLnTxbiasadjOvr = 0x00D801C0, - HwPfPcieLnTxcsr = 0x00D801C4, - HwPfPcieLnTxtest = 0x00D801C8, - HwPfPcieLnTxtestword = 
0x00D801CC, - HwPfPcieLnTxtestwordHigh = 0x00D801D0, - HwPfPcieLnTxdrive = 0x00D801D4, - HwPfPcieLnMtcsLn = 0x00D801D8, - HwPfPcieLnStatsumLn = 0x00D801DC, - HwPfPcieLnRcbusScratch = 0x00D801E0, - HwPfPcieLnRcbusMinorrev = 0x00D801F0, - HwPfPcieLnRcbusMajorrev = 0x00D801F4, - HwPfPcieLnRcbusBlocktype = 0x00D801F8, - HwPfPcieSupPllcsr = 0x00D80800, - HwPfPcieSupPlldiv = 0x00D80804, - HwPfPcieSupPllcal = 0x00D80808, - HwPfPcieSupPllcalsts = 0x00D8080C, - HwPfPcieSupPllmeas = 0x00D80810, - HwPfPcieSupPlldactrim = 0x00D80814, - HwPfPcieSupPllbiastrim = 0x00D80818, - HwPfPcieSupPllbwtrim = 0x00D8081C, - HwPfPcieSupPllcaldly = 0x00D80820, - HwPfPcieSupRefclkonpclkctrl = 0x00D80824, - HwPfPcieSupPclkdelay = 0x00D80828, - HwPfPcieSupPhyconfig = 0x00D8082C, - HwPfPcieSupRcalIntf = 0x00D80830, - HwPfPcieSupAuxcsr = 0x00D80834, - HwPfPcieSupVref = 0x00D80838, - HwPfPcieSupLinkmode = 0x00D8083C, - HwPfPcieSupRrefcalctl = 0x00D80840, - HwPfPcieSupRrefcal = 0x00D80844, - HwPfPcieSupRrefcaldly = 0x00D80848, - HwPfPcieSupTximpcalctl = 0x00D8084C, - HwPfPcieSupTximpcal = 0x00D80850, - HwPfPcieSupTximpoffset = 0x00D80854, - HwPfPcieSupTximpcaldly = 0x00D80858, - HwPfPcieSupRximpcalctl = 0x00D8085C, - HwPfPcieSupRximpcal = 0x00D80860, - HwPfPcieSupRximpoffset = 0x00D80864, - HwPfPcieSupRximpcaldly = 0x00D80868, - HwPfPcieSupFence = 0x00D8086C, - HwPfPcieSupMtcs = 0x00D80870, - HwPfPcieSupStatsum = 0x00D809B8, - HwPfPciePcsDpStatus0 = 0x00D81000, - HwPfPciePcsDpControl0 = 0x00D81004, - HwPfPciePcsPmaStatusLane0 = 0x00D81008, - HwPfPciePcsPipeStatusLane0 = 0x00D8100C, - HwPfPciePcsTxdeemph0Lane0 = 0x00D81010, - HwPfPciePcsTxdeemph1Lane0 = 0x00D81014, - HwPfPciePcsInternalStatusLane0 = 0x00D81018, - HwPfPciePcsDpStatus1 = 0x00D8101C, - HwPfPciePcsDpControl1 = 0x00D81020, - HwPfPciePcsPmaStatusLane1 = 0x00D81024, - HwPfPciePcsPipeStatusLane1 = 0x00D81028, - HwPfPciePcsTxdeemph0Lane1 = 0x00D8102C, - HwPfPciePcsTxdeemph1Lane1 = 0x00D81030, - HwPfPciePcsInternalStatusLane1 = 0x00D81034, - 
HwPfPciePcsDpStatus2 = 0x00D81038, - HwPfPciePcsDpControl2 = 0x00D8103C, - HwPfPciePcsPmaStatusLane2 = 0x00D81040, - HwPfPciePcsPipeStatusLane2 = 0x00D81044, - HwPfPciePcsTxdeemph0Lane2 = 0x00D81048, - HwPfPciePcsTxdeemph1Lane2 = 0x00D8104C, - HwPfPciePcsInternalStatusLane2 = 0x00D81050, - HwPfPciePcsDpStatus3 = 0x00D81054, - HwPfPciePcsDpControl3 = 0x00D81058, - HwPfPciePcsPmaStatusLane3 = 0x00D8105C, - HwPfPciePcsPipeStatusLane3 = 0x00D81060, - HwPfPciePcsTxdeemph0Lane3 = 0x00D81064, - HwPfPciePcsTxdeemph1Lane3 = 0x00D81068, - HwPfPciePcsInternalStatusLane3 = 0x00D8106C, - HwPfPciePcsEbStatus0 = 0x00D81070, - HwPfPciePcsEbStatus1 = 0x00D81074, - HwPfPciePcsEbStatus2 = 0x00D81078, - HwPfPciePcsEbStatus3 = 0x00D8107C, - HwPfPciePcsPllSettingPcieG1 = 0x00D81088, - HwPfPciePcsPllSettingPcieG2 = 0x00D8108C, - HwPfPciePcsPllSettingPcieG3 = 0x00D81090, - HwPfPciePcsControl = 0x00D81094, HwPfPciePcsEqControl = 0x00D81098, - HwPfPciePcsEqTimer = 0x00D8109C, - HwPfPciePcsEqErrStatus = 0x00D810A0, - HwPfPciePcsEqErrCount = 0x00D810A4, - HwPfPciePcsStatus = 0x00D810A8, - HwPfPciePcsMiscRegister = 0x00D810AC, - HwPfPciePcsObsControl = 0x00D810B0, - HwPfPciePcsPrbsCount0 = 0x00D81200, - HwPfPciePcsBistControl0 = 0x00D81204, - HwPfPciePcsBistStaticWord00 = 0x00D81208, - HwPfPciePcsBistStaticWord10 = 0x00D8120C, - HwPfPciePcsBistStaticWord20 = 0x00D81210, - HwPfPciePcsBistStaticWord30 = 0x00D81214, - HwPfPciePcsPrbsCount1 = 0x00D81220, - HwPfPciePcsBistControl1 = 0x00D81224, - HwPfPciePcsBistStaticWord01 = 0x00D81228, - HwPfPciePcsBistStaticWord11 = 0x00D8122C, - HwPfPciePcsBistStaticWord21 = 0x00D81230, - HwPfPciePcsBistStaticWord31 = 0x00D81234, - HwPfPciePcsPrbsCount2 = 0x00D81240, - HwPfPciePcsBistControl2 = 0x00D81244, - HwPfPciePcsBistStaticWord02 = 0x00D81248, - HwPfPciePcsBistStaticWord12 = 0x00D8124C, - HwPfPciePcsBistStaticWord22 = 0x00D81250, - HwPfPciePcsBistStaticWord32 = 0x00D81254, - HwPfPciePcsPrbsCount3 = 0x00D81260, - HwPfPciePcsBistControl3 = 0x00D81264, - 
HwPfPciePcsBistStaticWord03 = 0x00D81268, - HwPfPciePcsBistStaticWord13 = 0x00D8126C, - HwPfPciePcsBistStaticWord23 = 0x00D81270, - HwPfPciePcsBistStaticWord33 = 0x00D81274, - HwPfPcieGpexLtssmStateCntrl = 0x00D90400, - HwPfPcieGpexLtssmStateStatus = 0x00D90404, - HwPfPcieGpexSkipFreqTimer = 0x00D90408, - HwPfPcieGpexLaneSelect = 0x00D9040C, - HwPfPcieGpexLaneDeskew = 0x00D90410, - HwPfPcieGpexRxErrorStatus = 0x00D90414, - HwPfPcieGpexLaneNumControl = 0x00D90418, - HwPfPcieGpexNFstControl = 0x00D9041C, - HwPfPcieGpexLinkStatus = 0x00D90420, - HwPfPcieGpexAckReplayTimeout = 0x00D90438, - HwPfPcieGpexSeqNumberStatus = 0x00D9043C, - HwPfPcieGpexCoreClkRatio = 0x00D90440, - HwPfPcieGpexDllTholdControl = 0x00D90448, - HwPfPcieGpexPmTimer = 0x00D90450, - HwPfPcieGpexPmeTimeout = 0x00D90454, - HwPfPcieGpexAspmL1Timer = 0x00D90458, - HwPfPcieGpexAspmReqTimer = 0x00D9045C, - HwPfPcieGpexAspmL1Dis = 0x00D90460, - HwPfPcieGpexAdvisoryErrorControl = 0x00D90468, - HwPfPcieGpexId = 0x00D90470, - HwPfPcieGpexClasscode = 0x00D90474, - HwPfPcieGpexSubsystemId = 0x00D90478, - HwPfPcieGpexDeviceCapabilities = 0x00D9047C, - HwPfPcieGpexLinkCapabilities = 0x00D90480, - HwPfPcieGpexFunctionNumber = 0x00D90484, - HwPfPcieGpexPmCapabilities = 0x00D90488, - HwPfPcieGpexFunctionSelect = 0x00D9048C, - HwPfPcieGpexErrorCounter = 0x00D904AC, - HwPfPcieGpexConfigReady = 0x00D904B0, - HwPfPcieGpexFcUpdateTimeout = 0x00D904B8, - HwPfPcieGpexFcUpdateTimer = 0x00D904BC, - HwPfPcieGpexVcBufferLoad = 0x00D904C8, - HwPfPcieGpexVcBufferSizeThold = 0x00D904CC, - HwPfPcieGpexVcBufferSelect = 0x00D904D0, - HwPfPcieGpexBarEnable = 0x00D904D4, - HwPfPcieGpexBarDwordLower = 0x00D904D8, - HwPfPcieGpexBarDwordUpper = 0x00D904DC, - HwPfPcieGpexBarSelect = 0x00D904E0, - HwPfPcieGpexCreditCounterSelect = 0x00D904E4, - HwPfPcieGpexCreditCounterStatus = 0x00D904E8, - HwPfPcieGpexTlpHeaderSelect = 0x00D904EC, - HwPfPcieGpexTlpHeaderDword0 = 0x00D904F0, - HwPfPcieGpexTlpHeaderDword1 = 0x00D904F4, - 
HwPfPcieGpexTlpHeaderDword2 = 0x00D904F8, - HwPfPcieGpexTlpHeaderDword3 = 0x00D904FC, - HwPfPcieGpexRelaxOrderControl = 0x00D90500, - HwPfPcieGpexBarPrefetch = 0x00D90504, - HwPfPcieGpexFcCheckControl = 0x00D90508, - HwPfPcieGpexFcUpdateTimerTraffic = 0x00D90518, - HwPfPcieGpexPhyControl0 = 0x00D9053C, - HwPfPcieGpexPhyControl1 = 0x00D90544, - HwPfPcieGpexPhyControl2 = 0x00D9054C, - HwPfPcieGpexUserControl0 = 0x00D9055C, - HwPfPcieGpexUncorrErrorStatus = 0x00D905F0, - HwPfPcieGpexRxCplError = 0x00D90620, - HwPfPcieGpexRxCplErrorDword0 = 0x00D90624, - HwPfPcieGpexRxCplErrorDword1 = 0x00D90628, - HwPfPcieGpexRxCplErrorDword2 = 0x00D9062C, - HwPfPcieGpexPabSwResetEn = 0x00D90630, - HwPfPcieGpexGen3Control0 = 0x00D90634, - HwPfPcieGpexGen3Control1 = 0x00D90638, - HwPfPcieGpexGen3Control2 = 0x00D9063C, - HwPfPcieGpexGen2ControlCsr = 0x00D90640, - HwPfPcieGpexTotalVfInitialVf0 = 0x00D90644, - HwPfPcieGpexTotalVfInitialVf1 = 0x00D90648, - HwPfPcieGpexSriovLinkDevId0 = 0x00D90684, - HwPfPcieGpexSriovLinkDevId1 = 0x00D90688, - HwPfPcieGpexSriovPageSize0 = 0x00D906C4, - HwPfPcieGpexSriovPageSize1 = 0x00D906C8, - HwPfPcieGpexIdVersion = 0x00D906FC, - HwPfPcieGpexSriovVfOffsetStride0 = 0x00D90704, - HwPfPcieGpexSriovVfOffsetStride1 = 0x00D90708, - HwPfPcieGpexGen3DeskewControl = 0x00D907B4, - HwPfPcieGpexGen3EqControl = 0x00D907B8, - HwPfPcieGpexBridgeVersion = 0x00D90800, - HwPfPcieGpexBridgeCapability = 0x00D90804, HwPfPcieGpexBridgeControl = 0x00D90808, - HwPfPcieGpexBridgeStatus = 0x00D9080C, - HwPfPcieGpexEngineActivityStatus = 0x00D9081C, - HwPfPcieGpexEngineResetControl = 0x00D90820, HwPfPcieGpexAxiPioControl = 0x00D90840, - HwPfPcieGpexAxiPioStatus = 0x00D90844, - HwPfPcieGpexAmbaSlaveCmdStatus = 0x00D90848, - HwPfPcieGpexPexPioControl = 0x00D908C0, - HwPfPcieGpexPexPioStatus = 0x00D908C4, - HwPfPcieGpexAmbaMasterStatus = 0x00D908C8, - HwPfPcieGpexCsrSlaveCmdStatus = 0x00D90920, - HwPfPcieGpexMailboxAxiControl = 0x00D90A50, - HwPfPcieGpexMailboxAxiData = 0x00D90A54, - 
HwPfPcieGpexMailboxPexControl = 0x00D90A90, - HwPfPcieGpexMailboxPexData = 0x00D90A94, - HwPfPcieGpexPexInterruptEnable = 0x00D90AD0, - HwPfPcieGpexPexInterruptStatus = 0x00D90AD4, - HwPfPcieGpexPexInterruptAxiPioVector = 0x00D90AD8, - HwPfPcieGpexPexInterruptPexPioVector = 0x00D90AE0, - HwPfPcieGpexPexInterruptMiscVector = 0x00D90AF8, - HwPfPcieGpexAmbaInterruptPioEnable = 0x00D90B00, - HwPfPcieGpexAmbaInterruptMiscEnable = 0x00D90B0C, - HwPfPcieGpexAmbaInterruptPioStatus = 0x00D90B10, - HwPfPcieGpexAmbaInterruptMiscStatus = 0x00D90B1C, - HwPfPcieGpexPexPmControl = 0x00D90B80, - HwPfPcieGpexSlotMisc = 0x00D90B88, - HwPfPcieGpexAxiAddrMappingControl = 0x00D90BA0, - HwPfPcieGpexAxiAddrMappingWindowAxiBase = 0x00D90BA4, - HwPfPcieGpexAxiAddrMappingWindowPexBaseLow = 0x00D90BA8, HwPfPcieGpexAxiAddrMappingWindowPexBaseHigh = 0x00D90BAC, - HwPfPcieGpexPexBarAddrFunc0Bar0 = 0x00D91BA0, - HwPfPcieGpexPexBarAddrFunc0Bar1 = 0x00D91BA4, - HwPfPcieGpexAxiAddrMappingPcieHdrParam = 0x00D95BA0, - HwPfPcieGpexExtAxiAddrMappingAxiBase = 0x00D980A0, - HwPfPcieGpexPexExtBarAddrFunc0Bar0 = 0x00D984A0, - HwPfPcieGpexPexExtBarAddrFunc0Bar1 = 0x00D984A4, - HwPfPcieGpexAmbaInterruptFlrEnable = 0x00D9B960, - HwPfPcieGpexAmbaInterruptFlrStatus = 0x00D9B9A0, - HwPfPcieGpexExtAxiAddrMappingSize = 0x00D9BAF0, - HwPfPcieGpexPexPioAwcacheControl = 0x00D9C300, - HwPfPcieGpexPexPioArcacheControl = 0x00D9C304, - HwPfPcieGpexPabObSizeControlVc0 = 0x00D9C310 }; /* TIP PF Interrupt numbers */ diff --git a/drivers/baseband/acc100/acc100_pmd.h b/drivers/baseband/acc100/acc100_pmd.h index 0c9810c..b325948 100644 --- a/drivers/baseband/acc100/acc100_pmd.h +++ b/drivers/baseband/acc100/acc100_pmd.h @@ -8,6 +8,7 @@ #include "acc100_pf_enum.h" #include "acc100_vf_enum.h" #include "rte_acc100_cfg.h" +#include "acc_common.h" /* Helper macro for logging */ #define rte_bbdev_log(level, fmt, ...) 
\ @@ -34,64 +35,18 @@ #define ACC100_PF_DEVICE_ID (0x0d5c) #define ACC100_VF_DEVICE_ID (0x0d5d) -/* Values used in filling in descriptors */ -#define ACC100_DMA_DESC_TYPE 2 -#define ACC100_DMA_CODE_BLK_MODE 0 -#define ACC100_DMA_BLKID_FCW 1 -#define ACC100_DMA_BLKID_IN 2 -#define ACC100_DMA_BLKID_OUT_ENC 1 -#define ACC100_DMA_BLKID_OUT_HARD 1 -#define ACC100_DMA_BLKID_OUT_SOFT 2 -#define ACC100_DMA_BLKID_OUT_HARQ 3 -#define ACC100_DMA_BLKID_IN_HARQ 3 - -/* Values used in filling in decode FCWs */ -#define ACC100_FCW_TD_VER 1 -#define ACC100_FCW_TD_EXT_COLD_REG_EN 1 -#define ACC100_FCW_TD_AUTOMAP 0x0f -#define ACC100_FCW_TD_RVIDX_0 2 -#define ACC100_FCW_TD_RVIDX_1 26 -#define ACC100_FCW_TD_RVIDX_2 50 -#define ACC100_FCW_TD_RVIDX_3 74 - /* Values used in writing to the registers */ #define ACC100_REG_IRQ_EN_ALL 0x1FF83FF /* Enable all interrupts */ -/* ACC100 Specific Dimensioning */ -#define ACC100_SIZE_64MBYTE (64*1024*1024) -/* Number of elements in an Info Ring */ -#define ACC100_INFO_RING_NUM_ENTRIES 1024 -/* Number of elements in HARQ layout memory */ -#define ACC100_HARQ_LAYOUT (64*1024*1024) -/* Assume offset for HARQ in memory */ -#define ACC100_HARQ_OFFSET (32*1024) -#define ACC100_HARQ_OFFSET_SHIFT 15 -#define ACC100_HARQ_OFFSET_MASK 0x7ffffff -/* Mask used to calculate an index in an Info Ring array (not a byte offset) */ -#define ACC100_INFO_RING_MASK (ACC100_INFO_RING_NUM_ENTRIES-1) /* Number of Virtual Functions ACC100 supports */ #define ACC100_NUM_VFS 16 #define ACC100_NUM_QGRPS 8 -#define ACC100_NUM_QGRPS_PER_WORD 8 #define ACC100_NUM_AQS 16 -#define MAX_ENQ_BATCH_SIZE 255 -/* All ACC100 Registers alignment are 32bits = 4B */ -#define ACC100_BYTES_IN_WORD 4 -#define ACC100_MAX_E_MBUF 64000 #define ACC100_GRP_ID_SHIFT 10 /* Queue Index Hierarchy */ #define ACC100_VF_ID_SHIFT 4 /* Queue Index Hierarchy */ -#define ACC100_VF_OFFSET_QOS 16 /* offset in Memory specific to QoS Mon */ -#define ACC100_TMPL_PRI_0 0x03020100 -#define ACC100_TMPL_PRI_1 
0x07060504 -#define ACC100_TMPL_PRI_2 0x0b0a0908 -#define ACC100_TMPL_PRI_3 0x0f0e0d0c -#define ACC100_QUEUE_ENABLE 0x80000000 /* Bit to mark Queue as Enabled */ #define ACC100_WORDS_IN_ARAM_SIZE (128 * 1024 / 4) -#define ACC100_FDONE 0x80000000 -#define ACC100_SDONE 0x40000000 -#define ACC100_NUM_TMPL 32 /* Mapping of signals for the available engines */ #define ACC100_SIG_UL_5G 0 #define ACC100_SIG_UL_5G_LAST 7 @@ -102,50 +57,10 @@ #define ACC100_SIG_DL_4G 27 #define ACC100_SIG_DL_4G_LAST 31 #define ACC100_NUM_ACCS 5 -#define ACC100_ACCMAP_0 0 -#define ACC100_ACCMAP_1 2 -#define ACC100_ACCMAP_2 1 -#define ACC100_ACCMAP_3 3 -#define ACC100_ACCMAP_4 4 -#define ACC100_PF_VAL 2 - -/* max number of iterations to allocate memory block for all rings */ -#define ACC100_SW_RING_MEM_ALLOC_ATTEMPTS 5 -#define ACC100_MAX_QUEUE_DEPTH 1024 -#define ACC100_DMA_MAX_NUM_POINTERS 14 -#define ACC100_DMA_MAX_NUM_POINTERS_IN 7 -#define ACC100_DMA_DESC_PADDING 8 -#define ACC100_FCW_PADDING 12 -#define ACC100_DESC_FCW_OFFSET 192 -#define ACC100_DESC_SIZE 256 -#define ACC100_DESC_OFFSET (ACC100_DESC_SIZE / 64) -#define ACC100_FCW_TE_BLEN 32 -#define ACC100_FCW_TD_BLEN 24 -#define ACC100_FCW_LE_BLEN 32 -#define ACC100_FCW_LD_BLEN 36 -#define ACC100_5GUL_SIZE_0 16 -#define ACC100_5GUL_SIZE_1 40 -#define ACC100_5GUL_OFFSET_0 36 -#define ACC100_FCW_VER 2 -#define ACC100_MUX_5GDL_DESC 6 -#define ACC100_CMP_ENC_SIZE 20 -#define ACC100_CMP_DEC_SIZE 24 -#define ACC100_ENC_OFFSET (32) -#define ACC100_DEC_OFFSET (80) #define ACC100_EXT_MEM /* Default option with memory external to CPU */ #define ACC100_HARQ_OFFSET_THRESHOLD 1024 -/* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */ -#define ACC100_N_ZC_1 66 /* N = 66 Zc for BG 1 */ -#define ACC100_N_ZC_2 50 /* N = 50 Zc for BG 2 */ -#define ACC100_K0_1_1 17 /* K0 fraction numerator for rv 1 and BG 1 */ -#define ACC100_K0_1_2 13 /* K0 fraction numerator for rv 1 and BG 2 */ -#define ACC100_K0_2_1 33 /* K0 fraction numerator for rv 
2 and BG 1 */ -#define ACC100_K0_2_2 25 /* K0 fraction numerator for rv 2 and BG 2 */ -#define ACC100_K0_3_1 56 /* K0 fraction numerator for rv 3 and BG 1 */ -#define ACC100_K0_3_2 43 /* K0 fraction numerator for rv 3 and BG 2 */ - /* ACC100 Configuration */ #define ACC100_DDR_ECC_ENABLE #define ACC100_CFG_DMA_ERROR 0x3D7 @@ -159,12 +74,10 @@ #define ACC100_PCIE_QUAD_OFFSET 0x2000 #define ACC100_PCS_EQ 0x6007 #define ACC100_ADAPT 0x8400 -#define ACC100_ENGINE_OFFSET 0x1000 #define ACC100_RESET_HI 0x20100 #define ACC100_RESET_LO 0x20000 #define ACC100_RESET_HARD 0x1FF #define ACC100_ENGINES_MAX 9 -#define ACC100_LONG_WAIT 1000 #define ACC100_GPEX_AXIMAP_NUM 17 #define ACC100_CLOCK_GATING_EN 0x30000 #define ACC100_FABRIC_MODE 0xB @@ -173,292 +86,8 @@ */ #define ACC100_HARQ_DDR (512 * 1) #define ACC100_PRQ_DDR_VER 0x10092020 -#define ACC100_MS_IN_US (1000) #define ACC100_DDR_TRAINING_MAX (5000) -/* ACC100 DMA Descriptor triplet */ -struct acc100_dma_triplet { - uint64_t address; - uint32_t blen:20, - res0:4, - last:1, - dma_ext:1, - res1:2, - blkid:4; -} __rte_packed; - -/* ACC100 DMA Response Descriptor */ -union acc100_dma_rsp_desc { - uint32_t val; - struct { - uint32_t crc_status:1, - synd_ok:1, - dma_err:1, - neg_stop:1, - fcw_err:1, - output_err:1, - input_err:1, - timestampEn:1, - iterCountFrac:8, - iter_cnt:8, - rsrvd3:6, - sdone:1, - fdone:1; - uint32_t add_info_0; - uint32_t add_info_1; - }; -}; - - -/* ACC100 Queue Manager Enqueue PCI Register */ -union acc100_enqueue_reg_fmt { - uint32_t val; - struct { - uint32_t num_elem:8, - addr_offset:3, - rsrvd:1, - req_elem_addr:20; - }; -}; - -/* FEC 4G Uplink Frame Control Word */ -struct __rte_packed acc100_fcw_td { - uint8_t fcw_ver:4, - num_maps:4; /* Unused */ - uint8_t filler:6, /* Unused */ - rsrvd0:1, - bypass_sb_deint:1; - uint16_t k_pos; - uint16_t k_neg; /* Unused */ - uint8_t c_neg; /* Unused */ - uint8_t c; /* Unused */ - uint32_t ea; /* Unused */ - uint32_t eb; /* Unused */ - uint8_t cab; /* Unused */ 
- uint8_t k0_start_col; /* Unused */ - uint8_t rsrvd1; - uint8_t code_block_mode:1, /* Unused */ - turbo_crc_type:1, - rsrvd2:3, - bypass_teq:1, /* Unused */ - soft_output_en:1, /* Unused */ - ext_td_cold_reg_en:1; - union { /* External Cold register */ - uint32_t ext_td_cold_reg; - struct { - uint32_t min_iter:4, /* Unused */ - max_iter:4, - ext_scale:5, /* Unused */ - rsrvd3:3, - early_stop_en:1, /* Unused */ - sw_soft_out_dis:1, /* Unused */ - sw_et_cont:1, /* Unused */ - sw_soft_out_saturation:1, /* Unused */ - half_iter_on:1, /* Unused */ - raw_decoder_input_on:1, /* Unused */ - rsrvd4:10; - }; - }; -}; - -/* FEC 5GNR Uplink Frame Control Word */ -struct __rte_packed acc100_fcw_ld { - uint32_t FCWversion:4, - qm:4, - nfiller:11, - BG:1, - Zc:9, - res0:1, - synd_precoder:1, - synd_post:1; - uint32_t ncb:16, - k0:16; - uint32_t rm_e:24, - hcin_en:1, - hcout_en:1, - crc_select:1, - bypass_dec:1, - bypass_intlv:1, - so_en:1, - so_bypass_rm:1, - so_bypass_intlv:1; - uint32_t hcin_offset:16, - hcin_size0:16; - uint32_t hcin_size1:16, - hcin_decomp_mode:3, - llr_pack_mode:1, - hcout_comp_mode:3, - res2:1, - dec_convllr:4, - hcout_convllr:4; - uint32_t itmax:7, - itstop:1, - so_it:7, - res3:1, - hcout_offset:16; - uint32_t hcout_size0:16, - hcout_size1:16; - uint32_t gain_i:8, - gain_h:8, - negstop_th:16; - uint32_t negstop_it:7, - negstop_en:1, - res4:24; -}; - -/* FEC 4G Downlink Frame Control Word */ -struct __rte_packed acc100_fcw_te { - uint16_t k_neg; - uint16_t k_pos; - uint8_t c_neg; - uint8_t c; - uint8_t filler; - uint8_t cab; - uint32_t ea:17, - rsrvd0:15; - uint32_t eb:17, - rsrvd1:15; - uint16_t ncb_neg; - uint16_t ncb_pos; - uint8_t rv_idx0:2, - rsrvd2:2, - rv_idx1:2, - rsrvd3:2; - uint8_t bypass_rv_idx0:1, - bypass_rv_idx1:1, - bypass_rm:1, - rsrvd4:5; - uint8_t rsrvd5:1, - rsrvd6:3, - code_block_crc:1, - rsrvd7:3; - uint8_t code_block_mode:1, - rsrvd8:7; - uint64_t rsrvd9; -}; - -/* FEC 5GNR Downlink Frame Control Word */ -struct __rte_packed 
acc100_fcw_le { - uint32_t FCWversion:4, - qm:4, - nfiller:11, - BG:1, - Zc:9, - res0:3; - uint32_t ncb:16, - k0:16; - uint32_t rm_e:24, - res1:2, - crc_select:1, - res2:1, - bypass_intlv:1, - res3:3; - uint32_t res4_a:12, - mcb_count:3, - res4_b:17; - uint32_t res5; - uint32_t res6; - uint32_t res7; - uint32_t res8; -}; - -/* ACC100 DMA Request Descriptor */ -struct __rte_packed acc100_dma_req_desc { - union { - struct{ - uint32_t type:4, - rsrvd0:26, - sdone:1, - fdone:1; - uint32_t rsrvd1; - uint32_t rsrvd2; - uint32_t pass_param:8, - sdone_enable:1, - irq_enable:1, - timeStampEn:1, - res0:5, - numCBs:4, - res1:4, - m2dlen:4, - d2mlen:4; - }; - struct{ - uint32_t word0; - uint32_t word1; - uint32_t word2; - uint32_t word3; - }; - }; - struct acc100_dma_triplet data_ptrs[ACC100_DMA_MAX_NUM_POINTERS]; - - /* Virtual addresses used to retrieve SW context info */ - union { - void *op_addr; - uint64_t pad1; /* pad to 64 bits */ - }; - /* - * Stores additional information needed for driver processing: - * - last_desc_in_batch - flag used to mark last descriptor (CB) - * in batch - * - cbs_in_tb - stores information about total number of Code Blocks - * in currently processed Transport Block - */ - union { - struct { - union { - struct acc100_fcw_ld fcw_ld; - struct acc100_fcw_td fcw_td; - struct acc100_fcw_le fcw_le; - struct acc100_fcw_te fcw_te; - uint32_t pad2[ACC100_FCW_PADDING]; - }; - uint32_t last_desc_in_batch :8, - cbs_in_tb:8, - pad4 : 16; - }; - uint64_t pad3[ACC100_DMA_DESC_PADDING]; /* pad to 64 bits */ - }; -}; - -/* ACC100 DMA Descriptor */ -union acc100_dma_desc { - struct acc100_dma_req_desc req; - union acc100_dma_rsp_desc rsp; - uint64_t atom_hdr; -}; - - -/* Union describing Info Ring entry */ -union acc100_harq_layout_data { - uint32_t val; - struct { - uint16_t offset; - uint16_t size0; - }; -} __rte_packed; - - -/* Union describing Info Ring entry */ -union acc100_info_ring_data { - uint32_t val; - struct { - union { - uint16_t detailed_info; - 
struct { - uint16_t aq_id: 4; - uint16_t qg_id: 4; - uint16_t vf_id: 6; - uint16_t reserved: 2; - }; - }; - uint16_t int_nb: 7; - uint16_t msi_0: 1; - uint16_t vf2pf: 6; - uint16_t loop: 1; - uint16_t valid: 1; - }; -} __rte_packed; - struct acc100_registry_addr { unsigned int dma_ring_dl5g_hi; unsigned int dma_ring_dl5g_lo; @@ -545,80 +174,4 @@ struct acc100_registry_addr { .ddr_range = HWVfDmaDdrBaseRangeRoVf, }; -/* Structure associated with each queue. */ -struct __rte_cache_aligned acc100_queue { - union acc100_dma_desc *ring_addr; /* Virtual address of sw ring */ - rte_iova_t ring_addr_iova; /* IOVA address of software ring */ - uint32_t sw_ring_head; /* software ring head */ - uint32_t sw_ring_tail; /* software ring tail */ - /* software ring size (descriptors, not bytes) */ - uint32_t sw_ring_depth; - /* mask used to wrap enqueued descriptors on the sw ring */ - uint32_t sw_ring_wrap_mask; - /* MMIO register used to enqueue descriptors */ - void *mmio_reg_enqueue; - uint8_t vf_id; /* VF ID (max = 63) */ - uint8_t qgrp_id; /* Queue Group ID */ - uint16_t aq_id; /* Atomic Queue ID */ - uint16_t aq_depth; /* Depth of atomic queue */ - uint32_t aq_enqueued; /* Count how many "batches" have been enqueued */ - uint32_t aq_dequeued; /* Count how many "batches" have been dequeued */ - uint32_t irq_enable; /* Enable ops dequeue interrupts if set to 1 */ - struct rte_mempool *fcw_mempool; /* FCW mempool */ - enum rte_bbdev_op_type op_type; /* Type of this Queue: TE or TD */ - /* Internal Buffers for loopback input */ - uint8_t *lb_in; - uint8_t *lb_out; - rte_iova_t lb_in_addr_iova; - rte_iova_t lb_out_addr_iova; - struct acc100_device *d; -}; - -typedef void (*acc10x_fcw_ld_fill_fun_t)(struct rte_bbdev_dec_op *op, - struct acc100_fcw_ld *fcw, - union acc100_harq_layout_data *harq_layout); - -/* Private data structure for each ACC100 device */ -struct acc100_device { - void *mmio_base; /**< Base address of MMIO registers (BAR0) */ - void *sw_rings_base; /* Base addr 
of un-aligned memory for sw rings */ - void *sw_rings; /* 64MBs of 64MB aligned memory for sw rings */ - rte_iova_t sw_rings_iova; /* IOVA address of sw_rings */ - /* Virtual address of the info memory routed to the this function under - * operation, whether it is PF or VF. - * HW may DMA information data at this location asynchronously - */ - union acc100_info_ring_data *info_ring; - - union acc100_harq_layout_data *harq_layout; - /* Virtual Info Ring head */ - uint16_t info_ring_head; - /* Number of bytes available for each queue in device, depending on - * how many queues are enabled with configure() - */ - uint32_t sw_ring_size; - uint32_t ddr_size; /* Size in kB */ - uint32_t *tail_ptrs; /* Base address of response tail pointer buffer */ - rte_iova_t tail_ptr_iova; /* IOVA address of tail pointers */ - /* Max number of entries available for each queue in device, depending - * on how many queues are enabled with configure() - */ - uint32_t sw_ring_max_depth; - struct rte_acc100_conf acc100_conf; /* ACC100 Initial configuration */ - /* Bitmap capturing which Queues have already been assigned */ - uint16_t q_assigned_bit_map[ACC100_NUM_QGRPS]; - bool pf_device; /**< True if this is a PF ACC100 device */ - bool configured; /**< True if this ACC100 device is configured */ - uint16_t device_variant; /**< Device variant */ - acc10x_fcw_ld_fill_fun_t fcw_ld_fill; /**< 5GUL FCW generation function */ -}; - -/** - * Structure with details about RTE_BBDEV_EVENT_DEQUEUE event. It's passed to - * the callback function. 
- */ -struct acc100_deq_intr_details { - uint16_t queue_id; -}; - #endif /* _RTE_ACC100_PMD_H_ */ diff --git a/drivers/baseband/acc100/acc101_pmd.h b/drivers/baseband/acc100/acc101_pmd.h index 9d8862c..37df008 100644 --- a/drivers/baseband/acc100/acc101_pmd.h +++ b/drivers/baseband/acc100/acc101_pmd.h @@ -11,16 +11,9 @@ #define ACC101_NUM_VFS 16 #define ACC101_NUM_QGRPS 8 #define ACC101_NUM_AQS 16 -/* All ACC101 Registers alignment are 32bits = 4B */ -#define ACC101_BYTES_IN_WORD 4 -#define ACC101_TMPL_PRI_0 0x03020100 -#define ACC101_TMPL_PRI_1 0x07060504 -#define ACC101_TMPL_PRI_2 0x0b0a0908 -#define ACC101_TMPL_PRI_3 0x0f0e0d0c #define ACC101_WORDS_IN_ARAM_SIZE (128 * 1024 / 4) -#define ACC101_NUM_TMPL 32 /* Mapping of signals for the available engines */ #define ACC101_SIG_UL_5G 0 #define ACC101_SIG_UL_5G_LAST 8 @@ -31,7 +24,6 @@ #define ACC101_SIG_DL_4G 27 #define ACC101_SIG_DL_4G_LAST 31 #define ACC101_NUM_ACCS 5 -#define ACC101_PF_VAL 2 /* ACC101 Configuration */ #define ACC101_CFG_DMA_ERROR 0x3D7 @@ -39,8 +31,6 @@ #define ACC101_CFG_QMGR_HI_P 0x0F0F #define ACC101_CFG_PCI_AXI 0xC003 #define ACC101_CFG_PCI_BRIDGE 0x40006033 -#define ACC101_ENGINE_OFFSET 0x1000 -#define ACC101_LONG_WAIT 1000 #define ACC101_GPEX_AXIMAP_NUM 17 #define ACC101_CLOCK_GATING_EN 0x30000 #define ACC101_DMA_INBOUND 0x104 diff --git a/drivers/baseband/acc100/acc_common.h b/drivers/baseband/acc100/acc_common.h new file mode 100644 index 0000000..894ecb6 --- /dev/null +++ b/drivers/baseband/acc100/acc_common.h @@ -0,0 +1,1388 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Intel Corporation + */ + +#ifndef _ACC_COMMON_H_ +#define _ACC_COMMON_H_ + +#include "rte_acc_common_cfg.h" + +/* Values used in filling in descriptors */ +#define ACC_DMA_DESC_TYPE 2 +#define ACC_DMA_BLKID_FCW 1 +#define ACC_DMA_BLKID_IN 2 +#define ACC_DMA_BLKID_OUT_ENC 1 +#define ACC_DMA_BLKID_OUT_HARD 1 +#define ACC_DMA_BLKID_OUT_SOFT 2 +#define ACC_DMA_BLKID_OUT_HARQ 3 +#define 
ACC_DMA_BLKID_IN_HARQ 3 +#define ACC_DMA_BLKID_IN_MLD_R 3 + +/* Values used in filling in decode FCWs */ +#define ACC_FCW_TD_VER 1 +#define ACC_FCW_TD_EXT_COLD_REG_EN 1 +#define ACC_FCW_TD_AUTOMAP 0x0f +#define ACC_FCW_TD_RVIDX_0 2 +#define ACC_FCW_TD_RVIDX_1 26 +#define ACC_FCW_TD_RVIDX_2 50 +#define ACC_FCW_TD_RVIDX_3 74 + +#define ACC_SIZE_64MBYTE (64*1024*1024) +/* Number of elements in an Info Ring */ +#define ACC_INFO_RING_NUM_ENTRIES 1024 +/* Number of elements in HARQ layout memory + * 128M x 32kB = 4GB addressable memory + */ +#define ACC_HARQ_LAYOUT (128 * 1024 * 1024) +/* Assume offset for HARQ in memory */ +#define ACC_HARQ_OFFSET (32 * 1024) +#define ACC_HARQ_OFFSET_SHIFT 15 +#define ACC_HARQ_OFFSET_MASK 0x7ffffff +#define ACC_HARQ_OFFSET_THRESHOLD 1024 +/* Mask used to calculate an index in an Info Ring array (not a byte offset) */ +#define ACC_INFO_RING_MASK (ACC_INFO_RING_NUM_ENTRIES-1) + +#define MAX_ENQ_BATCH_SIZE 255 + +/* All ACC100 Registers alignment are 32bits = 4B */ +#define ACC_BYTES_IN_WORD 4 +#define ACC_MAX_E_MBUF 64000 + +#define ACC_VF_OFFSET_QOS 16 /* offset in Memory specific to QoS Mon */ +#define ACC_TMPL_PRI_0 0x03020100 +#define ACC_TMPL_PRI_1 0x07060504 +#define ACC_TMPL_PRI_2 0x0b0a0908 +#define ACC_TMPL_PRI_3 0x0f0e0d0c +#define ACC_TMPL_PRI_4 0x13121110 +#define ACC_TMPL_PRI_5 0x17161514 +#define ACC_TMPL_PRI_6 0x1b1a1918 +#define ACC_TMPL_PRI_7 0x1f1e1d1c +#define ACC_QUEUE_ENABLE 0x80000000 /* Bit to mark Queue as Enabled */ +#define ACC_FDONE 0x80000000 +#define ACC_SDONE 0x40000000 + +#define ACC_NUM_TMPL 32 + +#define ACC_ACCMAP_0 0 +#define ACC_ACCMAP_1 2 +#define ACC_ACCMAP_2 1 +#define ACC_ACCMAP_3 3 +#define ACC_ACCMAP_4 4 +#define ACC_ACCMAP_5 5 +#define ACC_PF_VAL 2 + +/* max number of iterations to allocate memory block for all rings */ +#define ACC_SW_RING_MEM_ALLOC_ATTEMPTS 5 +#define ACC_MAX_QUEUE_DEPTH 1024 +#define ACC_DMA_MAX_NUM_POINTERS 14 +#define ACC_DMA_MAX_NUM_POINTERS_IN 7 +#define 
ACC_DMA_DESC_PADDINGS 8 +#define ACC_FCW_PADDING 12 +#define ACC_DESC_FCW_OFFSET 192 +#define ACC_DESC_SIZE 256 +#define ACC_DESC_OFFSET (ACC_DESC_SIZE / 64) +#define ACC_FCW_TE_BLEN 32 +#define ACC_FCW_TD_BLEN 24 +#define ACC_FCW_LE_BLEN 32 +#define ACC_FCW_LD_BLEN 36 +#define ACC_FCW_FFT_BLEN 28 +#define ACC_5GUL_SIZE_0 16 +#define ACC_5GUL_SIZE_1 40 +#define ACC_5GUL_OFFSET_0 36 +#define ACC_COMPANION_PTRS 8 +#define ACC_FCW_VER 2 +#define ACC_MUX_5GDL_DESC 6 +#define ACC_CMP_ENC_SIZE 20 +#define ACC_CMP_DEC_SIZE 24 +#define ACC_ENC_OFFSET (32) +#define ACC_DEC_OFFSET (80) +#define ACC_LIMIT_DL_MUX_BITS 534 +#define ACC_NUM_QGRPS_PER_WORD 8 +#define ACC_MAX_NUM_QGRPS 32 + +/* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */ +#define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */ +#define ACC_N_ZC_2 50 /* N = 50 Zc for BG 2 */ +#define ACC_K_ZC_1 22 /* K = 22 Zc for BG 1 */ +#define ACC_K_ZC_2 10 /* K = 10 Zc for BG 2 */ +#define ACC_K0_1_1 17 /* K0 fraction numerator for rv 1 and BG 1 */ +#define ACC_K0_1_2 13 /* K0 fraction numerator for rv 1 and BG 2 */ +#define ACC_K0_2_1 33 /* K0 fraction numerator for rv 2 and BG 1 */ +#define ACC_K0_2_2 25 /* K0 fraction numerator for rv 2 and BG 2 */ +#define ACC_K0_3_1 56 /* K0 fraction numerator for rv 3 and BG 1 */ +#define ACC_K0_3_2 43 /* K0 fraction numerator for rv 3 and BG 2 */ + +#define ACC_ENGINE_OFFSET 0x1000 +#define ACC_LONG_WAIT 1000 +#define ACC_MS_IN_US (1000) + +#define ACC_ALGO_SPA 0 +#define ACC_ALGO_MSA 1 + +/* Helper macro for logging */ +#define rte_acc_log(level, fmt, ...) 
\ + rte_log(RTE_LOG_ ## level, RTE_LOG_NOTICE, fmt "\n", \ + ##__VA_ARGS__) + +/* ACC100 DMA Descriptor triplet */ +struct acc_dma_triplet { + uint64_t address; + uint32_t blen:20, + res0:4, + last:1, + dma_ext:1, + res1:2, + blkid:4; +} __rte_packed; + + +/* ACC100 Queue Manager Enqueue PCI Register */ +union acc_enqueue_reg_fmt { + uint32_t val; + struct { + uint32_t num_elem:8, + addr_offset:3, + rsrvd:1, + req_elem_addr:20; + }; +}; + +/* FEC 4G Uplink Frame Control Word */ +struct __rte_packed acc_fcw_td { + uint8_t fcw_ver:4, + num_maps:4; /* Unused in ACC100 */ + uint8_t filler:6, /* Unused in ACC100 */ + rsrvd0:1, + bypass_sb_deint:1; + uint16_t k_pos; + uint16_t k_neg; /* Unused in ACC100 */ + uint8_t c_neg; /* Unused in ACC100 */ + uint8_t c; /* Unused in ACC100 */ + uint32_t ea; /* Unused in ACC100 */ + uint32_t eb; /* Unused in ACC100 */ + uint8_t cab; /* Unused in ACC100 */ + uint8_t k0_start_col; /* Unused in ACC100 */ + uint8_t rsrvd1; + uint8_t code_block_mode:1, /* Unused in ACC100 */ + turbo_crc_type:1, + rsrvd2:3, + bypass_teq:1, /* Unused in ACC100 */ + soft_output_en:1, /* Unused in ACC100 */ + ext_td_cold_reg_en:1; + union { /* External Cold register */ + uint32_t ext_td_cold_reg; + struct { + uint32_t min_iter:4, /* Unused in ACC100 */ + max_iter:4, + ext_scale:5, /* Unused in ACC100 */ + rsrvd3:3, + early_stop_en:1, /* Unused in ACC100 */ + sw_soft_out_dis:1, /* Unused in ACC100 */ + sw_et_cont:1, /* Unused in ACC100 */ + sw_soft_out_saturation:1, /* Unused in ACC100 */ + half_iter_on:1, /* Unused in ACC100 */ + raw_decoder_input_on:1, /* Unused in ACC100 */ + rsrvd4:10; + }; + }; +}; + +/* FEC 4G Downlink Frame Control Word */ +struct __rte_packed acc_fcw_te { + uint16_t k_neg; + uint16_t k_pos; + uint8_t c_neg; + uint8_t c; + uint8_t filler; + uint8_t cab; + uint32_t ea:17, + rsrvd0:15; + uint32_t eb:17, + rsrvd1:15; + uint16_t ncb_neg; + uint16_t ncb_pos; + uint8_t rv_idx0:2, + rsrvd2:2, + rv_idx1:2, + rsrvd3:2; + uint8_t 
bypass_rv_idx0:1, + bypass_rv_idx1:1, + bypass_rm:1, + rsrvd4:5; + uint8_t rsrvd5:1, + rsrvd6:3, + code_block_crc:1, + rsrvd7:3; + uint8_t code_block_mode:1, + rsrvd8:7; + uint64_t rsrvd9; +}; + +/* FEC 5GNR Downlink Frame Control Word */ +struct __rte_packed acc_fcw_le { + uint32_t FCWversion:4, + qm:4, + nfiller:11, + BG:1, + Zc:9, + res0:3; + uint32_t ncb:16, + k0:16; + uint32_t rm_e:22, + res1:4, + crc_select:1, + res2:1, + bypass_intlv:1, + res3:3; + uint32_t res4_a:12, + mcb_count:3, + res4_b:1, + C:8, + Cab:8; + uint32_t rm_e_b:22, + res5:10; + uint32_t res6; + uint32_t res7; + uint32_t res8; +}; + +/* FEC 5GNR Uplink Frame Control Word */ +struct __rte_packed acc_fcw_ld { + uint32_t FCWversion:4, + qm:4, + nfiller:11, + BG:1, + Zc:9, + cnu_algo:1, /* Not supported in ACC100 */ + synd_precoder:1, + synd_post:1; + uint32_t ncb:16, + k0:16; + uint32_t rm_e:24, + hcin_en:1, + hcout_en:1, + crc_select:1, + bypass_dec:1, + bypass_intlv:1, + so_en:1, + so_bypass_rm:1, + so_bypass_intlv:1; + uint32_t hcin_offset:16, + hcin_size0:16; + uint32_t hcin_size1:16, + hcin_decomp_mode:3, + llr_pack_mode:1, + hcout_comp_mode:3, + saturate_input:1, /* Not supported in ACC200 */ + dec_convllr:4, + hcout_convllr:4; + uint32_t itmax:7, + itstop:1, + so_it:7, + minsum_offset:1, /* Not supported in ACC200 */ + hcout_offset:16; + uint32_t hcout_size0:16, + hcout_size1:16; + uint32_t gain_i:8, + gain_h:8, + negstop_th:16; + uint32_t negstop_it:7, + negstop_en:1, + tb_crc_select:2, /* Not supported in ACC100 */ + dec_llrclip:2, /* Not supported in ACC200 */ + tb_trailer_size:20; /* Not supported in ACC100 */ +}; + +/* FFT Frame Control Word */ +struct __rte_packed acc_fcw_fft { + uint32_t in_frame_size:16, + leading_pad_size:16; + uint32_t out_frame_size:16, + leading_depad_size:16; + uint32_t cs_window_sel; + uint32_t cs_window_sel2:16, + cs_enable_bmap:16; + uint32_t num_antennas:8, + idft_size:8, + dft_size:8, + cs_offset:8; + uint32_t idft_shift:8, + dft_shift:8, + 
cs_multiplier:16; + uint32_t bypass:2, + fp16_in:1, /* Not supported in ACC200 */ + fp16_out:1, + exp_adj:4, + power_shift:4, + power_en:1, + res:19; +}; + +/* MLD-TS Frame Control Word */ +struct __rte_packed acc_fcw_mldts { + uint32_t fcw_version:4, + res0:12, + nrb:13, /* 1 to 1925 */ + res1:3; + uint32_t NLayers:2, /* 1: 2L... 3: 4L */ + res2:14, + Qmod0:2, /* 0: 2...3: 8 */ + res3_0:2, + Qmod1:2, + res3_1:2, + Qmod2:2, + res3_2:2, + Qmod3:2, + res3_3:2; + uint32_t Rrep:3, /* 0 to 5 */ + res4:1, + Crep:3, /* 0 to 6 */ + res5:25; + uint32_t pad0; + uint32_t pad1; + uint32_t pad2; + uint32_t pad3; + uint32_t pad4; +}; + +/* DMA Response Descriptor */ +union acc_dma_rsp_desc { + uint32_t val; + struct { + uint32_t crc_status:1, + synd_ok:1, + dma_err:1, + neg_stop:1, + fcw_err:1, + output_truncate:1, + input_err:1, + tsen_pagefault:1, + iterCountFrac:8, + iter_cnt:8, + engine_hung:1, + core_reset:5, + sdone:1, + fdone:1; + uint32_t add_info_0; + uint32_t add_info_1; + }; +}; + +/* DMA Request Descriptor */ +struct __rte_packed acc_dma_req_desc { + union { + struct{ + uint32_t type:4, + rsrvd0:26, + sdone:1, + fdone:1; + uint32_t ib_ant_offset:16, /* Not supported in ACC100 */ + res2:12, + num_ant:4; + uint32_t ob_ant_offset:16, + ob_cyc_offset:12, + num_cs:4; + uint32_t pass_param:8, + sdone_enable:1, + irq_enable:1, + timeStampEn:1, + dltb:1, /* Not supported in ACC200 */ + res0:4, + numCBs:8, + m2dlen:4, + d2mlen:4; + }; + struct{ + uint32_t word0; + uint32_t word1; + uint32_t word2; + uint32_t word3; + }; + }; + struct acc_dma_triplet data_ptrs[ACC_DMA_MAX_NUM_POINTERS]; + + /* Virtual addresses used to retrieve SW context info */ + union { + void *op_addr; + uint64_t pad1; /* pad to 64 bits */ + }; + /* + * Stores additional information needed for driver processing: + * - last_desc_in_batch - flag used to mark last descriptor (CB) + * in batch + * - cbs_in_tb - stores information about total number of Code Blocks + * in currently processed Transport Block + */ 
+ union { + struct { + union { + struct acc_fcw_ld fcw_ld; + struct acc_fcw_td fcw_td; + struct acc_fcw_le fcw_le; + struct acc_fcw_te fcw_te; + struct acc_fcw_fft fcw_fft; + struct acc_fcw_mldts fcw_mldts; + uint32_t pad2[ACC_FCW_PADDING]; + }; + uint32_t last_desc_in_batch :8, + cbs_in_tb:8, + pad4 : 16; + }; + uint64_t pad3[ACC_DMA_DESC_PADDINGS]; /* pad to 64 bits */ + }; +}; + +/* ACC100 DMA Descriptor */ +union acc_dma_desc { + struct acc_dma_req_desc req; + union acc_dma_rsp_desc rsp; + uint64_t atom_hdr; +}; + +/* Union describing Info Ring entry */ +union acc_info_ring_data { + uint32_t val; + struct { + union { + uint16_t detailed_info; + struct { + uint16_t aq_id: 4; + uint16_t qg_id: 4; + uint16_t vf_id: 6; + uint16_t reserved: 2; + }; + }; + uint16_t int_nb: 7; + uint16_t msi_0: 1; + uint16_t vf2pf: 6; + uint16_t loop: 1; + uint16_t valid: 1; + }; + struct { + uint32_t aq_id_3: 6; + uint32_t qg_id_3: 5; + uint32_t vf_id_3: 6; + uint32_t int_nb_3: 6; + uint32_t msi_0_3: 1; + uint32_t vf2pf_3: 6; + uint32_t loop_3: 1; + uint32_t valid_3: 1; + }; +} __rte_packed; + +struct __rte_packed acc_pad_ptr { + void *op_addr; + uint64_t pad1; /* pad to 64 bits */ +}; + +struct __rte_packed acc_ptrs { + struct acc_pad_ptr ptr[ACC_COMPANION_PTRS]; +}; + +/* Union describing Info Ring entry */ +union acc_harq_layout_data { + uint32_t val; + struct { + uint16_t offset; + uint16_t size0; + }; +} __rte_packed; + +/** + * Structure with details about RTE_BBDEV_EVENT_DEQUEUE event. It's passed to + * the callback function. 
+ */
+struct acc_deq_intr_details {
+	uint16_t queue_id;
+};
+
+/* TIP VF2PF Comms */
+enum {
+	ACC_VF2PF_STATUS_REQUEST = 0,
+	ACC_VF2PF_USING_VF = 1,
+};
+
+
+typedef void (*acc10x_fcw_ld_fill_fun_t)(struct rte_bbdev_dec_op *op,
+		struct acc_fcw_ld *fcw,
+		union acc_harq_layout_data *harq_layout);
+
+/* Private data structure for each ACC100 device */
+struct acc_device {
+	void *mmio_base; /**< Base address of MMIO registers (BAR0) */
+	void *sw_rings_base; /* Base addr of un-aligned memory for sw rings */
+	void *sw_rings; /* 64MBs of 64MB aligned memory for sw rings */
+	rte_iova_t sw_rings_iova; /* IOVA address of sw_rings */
+	/* Virtual address of the info memory routed to the this function under
+	 * operation, whether it is PF or VF.
+	 * HW may DMA information data at this location asynchronously
+	 */
+	union acc_info_ring_data *info_ring;
+
+	union acc_harq_layout_data *harq_layout;
+	/* Virtual Info Ring head */
+	uint16_t info_ring_head;
+	/* Number of bytes available for each queue in device, depending on
+	 * how many queues are enabled with configure()
+	 */
+	uint32_t sw_ring_size;
+	uint32_t ddr_size; /* Size in kB */
+	uint32_t *tail_ptrs; /* Base address of response tail pointer buffer */
+	rte_iova_t tail_ptr_iova; /* IOVA address of tail pointers */
+	/* Max number of entries available for each queue in device, depending
+	 * on how many queues are enabled with configure()
+	 */
+	uint32_t sw_ring_max_depth;
+	struct rte_acc_conf acc_conf; /* ACC100 Initial configuration */
+	/* Bitmap capturing which Queues have already been assigned */
+	uint64_t q_assigned_bit_map[ACC_MAX_NUM_QGRPS];
+	bool pf_device; /**< True if this is a PF ACC100 device */
+	bool configured; /**< True if this ACC100 device is configured */
+	uint16_t device_variant; /**< Device variant */
+	acc10x_fcw_ld_fill_fun_t fcw_ld_fill; /**< 5GUL FCW generation function */
+};
+
+/* Structure associated with each queue. */
+struct __rte_cache_aligned acc_queue {
+	union acc_dma_desc *ring_addr; /* Virtual address of sw ring */
+	rte_iova_t ring_addr_iova; /* IOVA address of software ring */
+	uint32_t sw_ring_head; /* software ring head */
+	uint32_t sw_ring_tail; /* software ring tail */
+	/* software ring size (descriptors, not bytes) */
+	uint32_t sw_ring_depth;
+	/* mask used to wrap enqueued descriptors on the sw ring */
+	uint32_t sw_ring_wrap_mask;
+	/* Virtual address of companion ring */
+	struct acc_ptrs *companion_ring_addr;
+	/* MMIO register used to enqueue descriptors */
+	void *mmio_reg_enqueue;
+	uint8_t vf_id; /* VF ID (max = 63) */
+	uint8_t qgrp_id; /* Queue Group ID */
+	uint16_t aq_id; /* Atomic Queue ID */
+	uint16_t aq_depth; /* Depth of atomic queue */
+	uint32_t aq_enqueued; /* Count how many "batches" have been enqueued */
+	uint32_t aq_dequeued; /* Count how many "batches" have been dequeued */
+	uint32_t irq_enable; /* Enable ops dequeue interrupts if set to 1 */
+	struct rte_mempool *fcw_mempool; /* FCW mempool */
+	enum rte_bbdev_op_type op_type; /* Type of this Queue: TE or TD */
+	/* Internal Buffers for loopback input */
+	uint8_t *lb_in;
+	uint8_t *lb_out;
+	rte_iova_t lb_in_addr_iova;
+	rte_iova_t lb_out_addr_iova;
+	int8_t *derm_buffer; /* interim buffer for de-rm in SDK */
+	struct acc_device *d;
+};
+
+/* Write to MMIO register address */
+static inline void
+mmio_write(void *addr, uint32_t value)
+{
+	*((volatile uint32_t *)(addr)) = rte_cpu_to_le_32(value);
+}
+
+/* Write a register of a ACC100 device */
+static inline void
+acc_reg_write(struct acc_device *d, uint32_t offset, uint32_t value)
+{
+	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
+	mmio_write(reg_addr, value);
+	usleep(ACC_LONG_WAIT);
+}
+
+/* Read a register of a ACC100 device */
+static inline uint32_t
+acc_reg_read(struct acc_device *d, uint32_t offset)
+{
+
+	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
+	uint32_t ret = *((volatile uint32_t *)(reg_addr));
+	return rte_le_to_cpu_32(ret);
+}
+
+/* Basic Implementation of Log2 for exact 2^N */
+static inline uint32_t
+log2_basic(uint32_t value)
+{
+	return (value == 0) ? 0 : rte_bsf32(value);
+}
+
+/* Calculate memory alignment offset assuming alignment is 2^N */
+static inline uint32_t
+calc_mem_alignment_offset(void *unaligned_virt_mem, uint32_t alignment)
+{
+	rte_iova_t unaligned_phy_mem = rte_malloc_virt2iova(unaligned_virt_mem);
+	return (uint32_t)(alignment -
+			(unaligned_phy_mem & (alignment-1)));
+}
+
+static void
+free_base_addresses(void **base_addrs, int size)
+{
+	int i;
+	for (i = 0; i < size; i++)
+		rte_free(base_addrs[i]);
+}
+
+/* Read flag value 0/1 from bitmap */
+static inline bool
+check_bit(uint32_t bitmap, uint32_t bitmask)
+{
+	return bitmap & bitmask;
+}
+
+static inline char *
+mbuf_append(struct rte_mbuf *m_head, struct rte_mbuf *m, uint16_t len)
+{
+	if (unlikely(len > rte_pktmbuf_tailroom(m)))
+		return NULL;
+
+	char *tail = (char *)m->buf_addr + m->data_off + m->data_len;
+	m->data_len = (uint16_t)(m->data_len + len);
+	m_head->pkt_len = (m_head->pkt_len + len);
+	return tail;
+}
+
+
+static inline uint32_t
+get_desc_len(void)
+{
+	return sizeof(union acc_dma_desc);
+}
+
+/* Allocate the 2 * 64MB block for the sw rings */
+static inline int
+alloc_2x64mb_sw_rings_mem(struct rte_bbdev *dev, struct acc_device *d,
+		int socket)
+{
+	uint32_t sw_ring_size = ACC_SIZE_64MBYTE;
+	d->sw_rings_base = rte_zmalloc_socket(dev->device->driver->name,
+			2 * sw_ring_size, RTE_CACHE_LINE_SIZE, socket);
+	if (d->sw_rings_base == NULL) {
+		rte_acc_log(ERR, "Failed to allocate memory for %s:%u",
+				dev->device->driver->name,
+				dev->data->dev_id);
+		return -ENOMEM;
+	}
+	uint32_t next_64mb_align_offset = calc_mem_alignment_offset(
+			d->sw_rings_base, ACC_SIZE_64MBYTE);
+	d->sw_rings = RTE_PTR_ADD(d->sw_rings_base, next_64mb_align_offset);
+	d->sw_rings_iova = rte_malloc_virt2iova(d->sw_rings_base) +
+			next_64mb_align_offset;
+	d->sw_ring_size = ACC_MAX_QUEUE_DEPTH * get_desc_len();
+	d->sw_ring_max_depth = ACC_MAX_QUEUE_DEPTH;
+
+	return 0;
+}
+
+/* Attempt to allocate minimised memory space for sw rings */
+static inline void
+alloc_sw_rings_min_mem(struct rte_bbdev *dev, struct acc_device *d,
+		uint16_t num_queues, int socket)
+{
+	rte_iova_t sw_rings_base_iova, next_64mb_align_addr_iova;
+	uint32_t next_64mb_align_offset;
+	rte_iova_t sw_ring_iova_end_addr;
+	void *base_addrs[ACC_SW_RING_MEM_ALLOC_ATTEMPTS];
+	void *sw_rings_base;
+	int i = 0;
+	uint32_t q_sw_ring_size = ACC_MAX_QUEUE_DEPTH * get_desc_len();
+	uint32_t dev_sw_ring_size = q_sw_ring_size * num_queues;
+	/* Free first in case this is a reconfiguration */
+	rte_free(d->sw_rings_base);
+
+	/* Find an aligned block of memory to store sw rings */
+	while (i < ACC_SW_RING_MEM_ALLOC_ATTEMPTS) {
+		/*
+		 * sw_ring allocated memory is guaranteed to be aligned to
+		 * q_sw_ring_size at the condition that the requested size is
+		 * less than the page size
+		 */
+		sw_rings_base = rte_zmalloc_socket(
+				dev->device->driver->name,
+				dev_sw_ring_size, q_sw_ring_size, socket);
+
+		if (sw_rings_base == NULL) {
+			rte_acc_log(ERR,
+					"Failed to allocate memory for %s:%u",
+					dev->device->driver->name,
+					dev->data->dev_id);
+			break;
+		}
+
+		sw_rings_base_iova = rte_malloc_virt2iova(sw_rings_base);
+		next_64mb_align_offset = calc_mem_alignment_offset(
+				sw_rings_base, ACC_SIZE_64MBYTE);
+		next_64mb_align_addr_iova = sw_rings_base_iova +
+				next_64mb_align_offset;
+		sw_ring_iova_end_addr = sw_rings_base_iova + dev_sw_ring_size;
+
+		/* Check if the end of the sw ring memory block is before the
+		 * start of next 64MB aligned mem address
+		 */
+		if (sw_ring_iova_end_addr < next_64mb_align_addr_iova) {
+			d->sw_rings_iova = sw_rings_base_iova;
+			d->sw_rings = sw_rings_base;
+			d->sw_rings_base = sw_rings_base;
+			d->sw_ring_size = q_sw_ring_size;
+			d->sw_ring_max_depth = ACC_MAX_QUEUE_DEPTH;
+			break;
+		}
+		/* Store the address of the unaligned mem block */
+		base_addrs[i] = 
sw_rings_base; + i++; + } + + /* Free all unaligned blocks of mem allocated in the loop */ + free_base_addresses(base_addrs, i); +} + +/* + * Find queue_id of a device queue based on details from the Info Ring. + * If a queue isn't found UINT16_MAX is returned. + */ +static inline uint16_t +get_queue_id_from_ring_info(struct rte_bbdev_data *data, + const union acc_info_ring_data ring_data) +{ + uint16_t queue_id; + + for (queue_id = 0; queue_id < data->num_queues; ++queue_id) { + struct acc_queue *acc_q = + data->queues[queue_id].queue_private; + if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id && + acc_q->qgrp_id == ring_data.qg_id && + acc_q->vf_id == ring_data.vf_id) + return queue_id; + } + + return UINT16_MAX; +} + +/* Fill in a frame control word for turbo encoding. */ +static inline void +acc_fcw_te_fill(const struct rte_bbdev_enc_op *op, struct acc_fcw_te *fcw) +{ + fcw->code_block_mode = op->turbo_enc.code_block_mode; + if (fcw->code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + fcw->k_neg = op->turbo_enc.tb_params.k_neg; + fcw->k_pos = op->turbo_enc.tb_params.k_pos; + fcw->c_neg = op->turbo_enc.tb_params.c_neg; + fcw->c = op->turbo_enc.tb_params.c; + fcw->ncb_neg = op->turbo_enc.tb_params.ncb_neg; + fcw->ncb_pos = op->turbo_enc.tb_params.ncb_pos; + + if (check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_RATE_MATCH)) { + fcw->bypass_rm = 0; + fcw->cab = op->turbo_enc.tb_params.cab; + fcw->ea = op->turbo_enc.tb_params.ea; + fcw->eb = op->turbo_enc.tb_params.eb; + } else { + /* E is set to the encoding output size when RM is + * bypassed. 
+ */ + fcw->bypass_rm = 1; + fcw->cab = fcw->c_neg; + fcw->ea = 3 * fcw->k_neg + 12; + fcw->eb = 3 * fcw->k_pos + 12; + } + } else { /* For CB mode */ + fcw->k_pos = op->turbo_enc.cb_params.k; + fcw->ncb_pos = op->turbo_enc.cb_params.ncb; + + if (check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_RATE_MATCH)) { + fcw->bypass_rm = 0; + fcw->eb = op->turbo_enc.cb_params.e; + } else { + /* E is set to the encoding output size when RM is + * bypassed. + */ + fcw->bypass_rm = 1; + fcw->eb = 3 * fcw->k_pos + 12; + } + } + + fcw->bypass_rv_idx1 = check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_RV_INDEX_BYPASS); + fcw->code_block_crc = check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_CRC_24B_ATTACH); + fcw->rv_idx1 = op->turbo_enc.rv_index; +} + +/* Compute value of k0. + * Based on 3GPP 38.212 Table 5.4.2.1-2 + * Starting position of different redundancy versions, k0 + */ +static inline uint16_t +get_k0(uint16_t n_cb, uint16_t z_c, uint8_t bg, uint8_t rv_index) +{ + if (rv_index == 0) + return 0; + uint16_t n = (bg == 1 ? ACC_N_ZC_1 : ACC_N_ZC_2) * z_c; + if (n_cb == n) { + if (rv_index == 1) + return (bg == 1 ? ACC_K0_1_1 : ACC_K0_1_2) * z_c; + else if (rv_index == 2) + return (bg == 1 ? ACC_K0_2_1 : ACC_K0_2_2) * z_c; + else + return (bg == 1 ? ACC_K0_3_1 : ACC_K0_3_2) * z_c; + } + /* LBRM case - includes a division by N */ + if (unlikely(z_c == 0)) + return 0; + if (rv_index == 1) + return (((bg == 1 ? ACC_K0_1_1 : ACC_K0_1_2) * n_cb) + / n) * z_c; + else if (rv_index == 2) + return (((bg == 1 ? ACC_K0_2_1 : ACC_K0_2_2) * n_cb) + / n) * z_c; + else + return (((bg == 1 ? ACC_K0_3_1 : ACC_K0_3_2) * n_cb) + / n) * z_c; +} + +/* Fill in a frame control word for LDPC encoding. 
*/ +static inline void +acc_fcw_le_fill(const struct rte_bbdev_enc_op *op, + struct acc_fcw_le *fcw, int num_cb, uint32_t default_e) +{ + fcw->qm = op->ldpc_enc.q_m; + fcw->nfiller = op->ldpc_enc.n_filler; + fcw->BG = (op->ldpc_enc.basegraph - 1); + fcw->Zc = op->ldpc_enc.z_c; + fcw->ncb = op->ldpc_enc.n_cb; + fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_enc.basegraph, + op->ldpc_enc.rv_index); + fcw->rm_e = (default_e == 0) ? op->ldpc_enc.cb_params.e : default_e; + fcw->crc_select = check_bit(op->ldpc_enc.op_flags, + RTE_BBDEV_LDPC_CRC_24B_ATTACH); + fcw->bypass_intlv = check_bit(op->ldpc_enc.op_flags, + RTE_BBDEV_LDPC_INTERLEAVER_BYPASS); + fcw->mcb_count = num_cb; +} + + +static inline void +acc_enqueue_status(struct rte_bbdev_queue_data *q_data, + enum rte_bbdev_enqueue_status status) +{ + q_data->enqueue_status = status; + q_data->queue_stats.enqueue_status_count[status]++; + + rte_acc_log(WARNING, "Enqueue Status: %s %#"PRIx64"", + rte_bbdev_enqueue_status_str(status), + q_data->queue_stats.enqueue_status_count[status]); +} + +static inline void +acc_enqueue_invalid(struct rte_bbdev_queue_data *q_data) +{ + acc_enqueue_status(q_data, RTE_BBDEV_ENQ_STATUS_INVALID_OP); +} + +static inline void +acc_enqueue_ring_full(struct rte_bbdev_queue_data *q_data) +{ + acc_enqueue_status(q_data, RTE_BBDEV_ENQ_STATUS_RING_FULL); +} + +static inline void +acc_enqueue_queue_full(struct rte_bbdev_queue_data *q_data) +{ + acc_enqueue_status(q_data, RTE_BBDEV_ENQ_STATUS_QUEUE_FULL); +} + +/* Enqueue a number of operations to HW and update software rings */ +static inline void +acc_dma_enqueue(struct acc_queue *q, uint16_t n, + struct rte_bbdev_stats *queue_stats) +{ + union acc_enqueue_reg_fmt enq_req; +#ifdef RTE_BBDEV_OFFLOAD_COST + uint64_t start_time = 0; + queue_stats->acc_offload_cycles = 0; +#else + RTE_SET_USED(queue_stats); +#endif + + enq_req.val = 0; + /* Setting offset, 100b for 256 DMA Desc */ + enq_req.addr_offset = ACC_DESC_OFFSET; + + /* Split ops into batches */ 
+ do { + union acc_dma_desc *desc; + uint16_t enq_batch_size; + uint64_t offset; + rte_iova_t req_elem_addr; + + enq_batch_size = RTE_MIN(n, MAX_ENQ_BATCH_SIZE); + + /* Set flag on last descriptor in a batch */ + desc = q->ring_addr + ((q->sw_ring_head + enq_batch_size - 1) & + q->sw_ring_wrap_mask); + desc->req.last_desc_in_batch = 1; + + /* Calculate the 1st descriptor's address */ + offset = ((q->sw_ring_head & q->sw_ring_wrap_mask) * + sizeof(union acc_dma_desc)); + req_elem_addr = q->ring_addr_iova + offset; + + /* Fill enqueue struct */ + enq_req.num_elem = enq_batch_size; + /* low 6 bits are not needed */ + enq_req.req_elem_addr = (uint32_t)(req_elem_addr >> 6); + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "Req sdone", desc, sizeof(*desc)); +#endif + rte_acc_log(DEBUG, "Enqueue %u reqs (phys %#"PRIx64") to reg %p", + enq_batch_size, + req_elem_addr, + (void *)q->mmio_reg_enqueue); + + rte_wmb(); + +#ifdef RTE_BBDEV_OFFLOAD_COST + /* Start time measurement for enqueue function offload. 
*/ + start_time = rte_rdtsc_precise(); +#endif + rte_acc_log(DEBUG, "Debug : MMIO Enqueue"); + mmio_write(q->mmio_reg_enqueue, enq_req.val); + +#ifdef RTE_BBDEV_OFFLOAD_COST + queue_stats->acc_offload_cycles += + rte_rdtsc_precise() - start_time; +#endif + + q->aq_enqueued++; + q->sw_ring_head += enq_batch_size; + n -= enq_batch_size; + + } while (n); + + +} + +/* Number of descriptors available in the ring for enqueue */ +static inline uint32_t +acc_ring_avail_enq(struct acc_queue *q) +{ + return (q->sw_ring_depth - 1 + q->sw_ring_tail - q->sw_ring_head) % q->sw_ring_depth; +} + +/* Number of descriptors available in the ring for dequeue */ +static inline uint32_t +acc_ring_avail_deq(struct acc_queue *q) +{ + return (q->sw_ring_depth + q->sw_ring_head - q->sw_ring_tail) % q->sw_ring_depth; +} + +/* Check room in the AQ for the batches to be enqueued into Qmgr */ +static inline int32_t +acc_aq_avail(struct rte_bbdev_queue_data *q_data, uint16_t num_ops) +{ + struct acc_queue *q = q_data->queue_private; + int32_t aq_avail = q->aq_depth - + ((q->aq_enqueued - q->aq_dequeued + + ACC_MAX_QUEUE_DEPTH) % ACC_MAX_QUEUE_DEPTH) + - (num_ops >> 7); + if (aq_avail <= 0) + acc_enqueue_queue_full(q_data); + return aq_avail; +} + +/* Convert offset to harq index for harq_layout structure */ +static inline uint32_t hq_index(uint32_t offset) +{ + return (offset >> ACC_HARQ_OFFSET_SHIFT) & ACC_HARQ_OFFSET_MASK; +} + +/* Calculates number of CBs in processed encoder TB based on 'r' and input + * length. + */ +static inline uint8_t +get_num_cbs_in_tb_ldpc_enc(struct rte_bbdev_op_ldpc_enc *ldpc_enc) +{ + uint8_t c, r, crc24_bits = 0; + uint16_t k = (ldpc_enc->basegraph == 1 ?
22 : 10) * ldpc_enc->z_c + - ldpc_enc->n_filler; + uint8_t cbs_in_tb = 0; + int32_t length; + + length = ldpc_enc->input.length; + r = ldpc_enc->tb_params.r; + c = ldpc_enc->tb_params.c; + crc24_bits = 0; + if (check_bit(ldpc_enc->op_flags, RTE_BBDEV_LDPC_CRC_24B_ATTACH)) + crc24_bits = 24; + while (length > 0 && r < c) { + length -= (k - crc24_bits) >> 3; + r++; + cbs_in_tb++; + } + return cbs_in_tb; +} + +/* Calculates number of CBs in processed encoder TB based on 'r' and input + * length. + */ +static inline uint8_t +get_num_cbs_in_tb_enc(struct rte_bbdev_op_turbo_enc *turbo_enc) +{ + uint8_t c, c_neg, r, crc24_bits = 0; + uint16_t k, k_neg, k_pos; + uint8_t cbs_in_tb = 0; + int32_t length; + + length = turbo_enc->input.length; + r = turbo_enc->tb_params.r; + c = turbo_enc->tb_params.c; + c_neg = turbo_enc->tb_params.c_neg; + k_neg = turbo_enc->tb_params.k_neg; + k_pos = turbo_enc->tb_params.k_pos; + crc24_bits = 0; + if (check_bit(turbo_enc->op_flags, RTE_BBDEV_TURBO_CRC_24B_ATTACH)) + crc24_bits = 24; + while (length > 0 && r < c) { + k = (r < c_neg) ? k_neg : k_pos; + length -= (k - crc24_bits) >> 3; + r++; + cbs_in_tb++; + } + + return cbs_in_tb; +} + +/* Calculates number of CBs in processed decoder TB based on 'r' and input + * length. + */ +static inline uint16_t +get_num_cbs_in_tb_dec(struct rte_bbdev_op_turbo_dec *turbo_dec) +{ + uint8_t c, c_neg, r = 0; + uint16_t kw, k, k_neg, k_pos, cbs_in_tb = 0; + int32_t length; + + length = turbo_dec->input.length; + r = turbo_dec->tb_params.r; + c = turbo_dec->tb_params.c; + c_neg = turbo_dec->tb_params.c_neg; + k_neg = turbo_dec->tb_params.k_neg; + k_pos = turbo_dec->tb_params.k_pos; + while (length > 0 && r < c) { + k = (r < c_neg) ? k_neg : k_pos; + kw = RTE_ALIGN_CEIL(k + 4, 32) * 3; + length -= kw; + r++; + cbs_in_tb++; + } + + return cbs_in_tb; +} + +/* Calculates number of CBs in processed decoder TB based on 'r' and input + * length. 
+ */ +static inline uint16_t +get_num_cbs_in_tb_ldpc_dec(struct rte_bbdev_op_ldpc_dec *ldpc_dec) +{ + uint16_t r, cbs_in_tb = 0; + int32_t length = ldpc_dec->input.length; + r = ldpc_dec->tb_params.r; + while (length > 0 && r < ldpc_dec->tb_params.c) { + length -= (r < ldpc_dec->tb_params.cab) ? + ldpc_dec->tb_params.ea : + ldpc_dec->tb_params.eb; + r++; + cbs_in_tb++; + } + return cbs_in_tb; +} + +/* Check we can mux encode operations with common FCW */ +static inline int16_t +check_mux(struct rte_bbdev_enc_op **ops, uint16_t num) { + uint16_t i; + if (num <= 1) + return 1; + for (i = 1; i < num; ++i) { + /* Only mux compatible code blocks */ + if (memcmp((uint8_t *)(&ops[i]->ldpc_enc) + ACC_ENC_OFFSET, + (uint8_t *)(&ops[0]->ldpc_enc) + + ACC_ENC_OFFSET, + ACC_CMP_ENC_SIZE) != 0) + return i; + } + /* Avoid multiplexing small inbound size frames */ + int Kp = (ops[0]->ldpc_enc.basegraph == 1 ? 22 : 10) * + ops[0]->ldpc_enc.z_c - ops[0]->ldpc_enc.n_filler; + if (Kp <= ACC_LIMIT_DL_MUX_BITS) + return 1; + return num; +} + +/* Check whether two consecutive LDPC decode operations can share a common FCW */ +static inline bool +cmp_ldpc_dec_op(struct rte_bbdev_dec_op **ops) { + /* Only mux compatible code blocks */ + if (memcmp((uint8_t *)(&ops[0]->ldpc_dec) + ACC_DEC_OFFSET, + (uint8_t *)(&ops[1]->ldpc_dec) + + ACC_DEC_OFFSET, ACC_CMP_DEC_SIZE) != 0) { + return false; + } else + return true; +} + +/** + * Fills descriptor with data pointers of one block type. + * + * @param desc + * Pointer to DMA descriptor. + * @param input + * Pointer to pointer to input data which will be encoded. It can be changed + * and points to next segment in scatter-gather case. + * @param offset + * Input offset in rte_mbuf structure. It is used for calculating the point + * where data is starting. + * @param cb_len + * Length of currently processed Code Block + * @param seg_total_left + * It indicates how many bytes are still left in segment (mbuf) for further + * processing.
+ * @param next_triplet + * Index for ACC200 DMA Descriptor triplet + * @param scattergather + * Flag to support scatter-gather for the mbuf + * + * @return + * Returns index of next triplet on success, other value if lengths of + * pkt and processed cb do not match. + * + */ +static inline int +acc_dma_fill_blk_type_in(struct acc_dma_req_desc *desc, + struct rte_mbuf **input, uint32_t *offset, uint32_t cb_len, + uint32_t *seg_total_left, int next_triplet, + bool scattergather) +{ + uint32_t part_len; + struct rte_mbuf *m = *input; + if (scattergather) + part_len = (*seg_total_left < cb_len) ? + *seg_total_left : cb_len; + else + part_len = cb_len; + cb_len -= part_len; + *seg_total_left -= part_len; + + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(m, *offset); + desc->data_ptrs[next_triplet].blen = part_len; + desc->data_ptrs[next_triplet].blkid = ACC_DMA_BLKID_IN; + desc->data_ptrs[next_triplet].last = 0; + desc->data_ptrs[next_triplet].dma_ext = 0; + *offset += part_len; + next_triplet++; + + while (cb_len > 0) { + if (next_triplet < ACC_DMA_MAX_NUM_POINTERS_IN && m->next != NULL) { + + m = m->next; + *seg_total_left = rte_pktmbuf_data_len(m); + part_len = (*seg_total_left < cb_len) ?
+ *seg_total_left : + cb_len; + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(m, 0); + desc->data_ptrs[next_triplet].blen = part_len; + desc->data_ptrs[next_triplet].blkid = + ACC_DMA_BLKID_IN; + desc->data_ptrs[next_triplet].last = 0; + desc->data_ptrs[next_triplet].dma_ext = 0; + cb_len -= part_len; + *seg_total_left -= part_len; + /* Initializing offset for next segment (mbuf) */ + *offset = part_len; + next_triplet++; + } else { + rte_acc_log(ERR, + "Some data still left for processing: " + "data_left: %u, next_triplet: %u, next_mbuf: %p", + cb_len, next_triplet, m->next); + return -EINVAL; + } + } + /* Storing new mbuf as it could be changed in scatter-gather case */ + *input = m; + + return next_triplet; +} + +/* Fills descriptor with data pointers of one block type. + * Returns index of next triplet + */ +static inline int +acc_dma_fill_blk_type(struct acc_dma_req_desc *desc, + struct rte_mbuf *mbuf, uint32_t offset, + uint32_t len, int next_triplet, int blk_id) +{ + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(mbuf, offset); + desc->data_ptrs[next_triplet].blen = len; + desc->data_ptrs[next_triplet].blkid = blk_id; + desc->data_ptrs[next_triplet].last = 0; + desc->data_ptrs[next_triplet].dma_ext = 0; + next_triplet++; + + return next_triplet; +} + +static inline void +acc_header_init(struct acc_dma_req_desc *desc) +{ + desc->word0 = ACC_DMA_DESC_TYPE; + desc->word1 = 0; /**< Timestamp could be disabled */ + desc->word2 = 0; + desc->word3 = 0; + desc->numCBs = 1; +} + +#ifdef RTE_LIBRTE_BBDEV_DEBUG +/* Check if any input data is unexpectedly left for processing */ +static inline int +check_mbuf_total_left(uint32_t mbuf_total_left) +{ + if (mbuf_total_left == 0) + return 0; + rte_acc_log(ERR, + "Some data still left for processing: mbuf_total_left = %u", + mbuf_total_left); + return -EINVAL; +} +#endif + +static inline int +acc_dma_desc_te_fill(struct rte_bbdev_enc_op *op, + struct acc_dma_req_desc *desc, struct
rte_mbuf **input, + struct rte_mbuf *output, uint32_t *in_offset, + uint32_t *out_offset, uint32_t *out_length, + uint32_t *mbuf_total_left, uint32_t *seg_total_left, uint8_t r) +{ + int next_triplet = 1; /* FCW already done */ + uint32_t e, ea, eb, length; + uint16_t k, k_neg, k_pos; + uint8_t cab, c_neg; + + desc->word0 = ACC_DMA_DESC_TYPE; + desc->word1 = 0; /**< Timestamp could be disabled */ + desc->word2 = 0; + desc->word3 = 0; + desc->numCBs = 1; + + if (op->turbo_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + ea = op->turbo_enc.tb_params.ea; + eb = op->turbo_enc.tb_params.eb; + cab = op->turbo_enc.tb_params.cab; + k_neg = op->turbo_enc.tb_params.k_neg; + k_pos = op->turbo_enc.tb_params.k_pos; + c_neg = op->turbo_enc.tb_params.c_neg; + e = (r < cab) ? ea : eb; + k = (r < c_neg) ? k_neg : k_pos; + } else { + e = op->turbo_enc.cb_params.e; + k = op->turbo_enc.cb_params.k; + } + + if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_CRC_24B_ATTACH)) + length = (k - 24) >> 3; + else + length = k >> 3; + + if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < length))) { + rte_acc_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, length); + return -1; + } + + next_triplet = acc_dma_fill_blk_type_in(desc, input, in_offset, + length, seg_total_left, next_triplet, + check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_ENC_SCATTER_GATHER)); + if (unlikely(next_triplet < 0)) { + rte_acc_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= length; + + /* Set output length */ + if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_RATE_MATCH)) + /* Integer round up division by 8 */ + *out_length = (e + 7) >> 3; + else + *out_length = (k >> 3) * 3 + 2; + + next_triplet = acc_dma_fill_blk_type(desc, output, *out_offset, + *out_length, next_triplet, 
ACC_DMA_BLKID_OUT_ENC); + if (unlikely(next_triplet < 0)) { + rte_acc_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + op->turbo_enc.output.length += *out_length; + *out_offset += *out_length; + desc->data_ptrs[next_triplet - 1].last = 1; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline int +acc_pci_remove(struct rte_pci_device *pci_dev) +{ + struct rte_bbdev *bbdev; + int ret; + uint8_t dev_id; + + if (pci_dev == NULL) + return -EINVAL; + + /* Find device */ + bbdev = rte_bbdev_get_named_dev(pci_dev->device.name); + if (bbdev == NULL) { + rte_acc_log(CRIT, + "Couldn't find HW dev \"%s\" to uninitialise it", + pci_dev->device.name); + return -ENODEV; + } + dev_id = bbdev->data->dev_id; + + /* free device private memory before close */ + rte_free(bbdev->data->dev_private); + + /* Close device */ + ret = rte_bbdev_close(dev_id); + if (ret < 0) + rte_acc_log(ERR, + "Device %i failed to close during uninit: %i", + dev_id, ret); + + /* release bbdev from library */ + rte_bbdev_release(bbdev); + + return 0; +} + +#endif /* _ACC_COMMON_H_ */ diff --git a/drivers/baseband/acc100/rte_acc100_cfg.h b/drivers/baseband/acc100/rte_acc100_cfg.h index b70803d..732c03b 100644 --- a/drivers/baseband/acc100/rte_acc100_cfg.h +++ b/drivers/baseband/acc100/rte_acc100_cfg.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2020 Intel Corporation + * Copyright(c) 2022 Intel Corporation */ #ifndef _RTE_ACC100_CFG_H_ @@ -18,76 +18,12 @@ #include <stdint.h> #include <stdbool.h> +#include "rte_acc_common_cfg.h" #ifdef __cplusplus extern "C" { #endif -/**< Number of Virtual Functions ACC100 supports */ -#define RTE_ACC100_NUM_VFS 16 -/** - * Definition of Queue Topology for ACC100 Configuration - * Some level of details is abstracted out to expose a clean interface - * given that comprehensive flexibility is not required - */ -struct 
rte_acc100_queue_topology { - /** Number of QGroups in incremental order of priority */ - uint16_t num_qgroups; - /** - * All QGroups have the same number of AQs here. - * Note : Could be made a 16-array if more flexibility is really - * required - */ - uint16_t num_aqs_per_groups; - /** - * Depth of the AQs is the same of all QGroups here. Log2 Enum : 2^N - * Note : Could be made a 16-array if more flexibility is really - * required - */ - uint16_t aq_depth_log2; - /** - * Index of the first Queue Group Index - assuming contiguity - * Initialized as -1 - */ - int8_t first_qgroup_index; -}; - -/** - * Definition of Arbitration related parameters for ACC100 Configuration - */ -struct rte_acc100_arbitration { - /** Default Weight for VF Fairness Arbitration */ - uint16_t round_robin_weight; - uint32_t gbr_threshold1; /**< Guaranteed Bitrate Threshold 1 */ - uint32_t gbr_threshold2; /**< Guaranteed Bitrate Threshold 2 */ -}; - -/** - * Structure to pass ACC100 configuration. - * Note: all VF Bundles will have the same configuration. - */ -struct rte_acc100_conf { - bool pf_mode_en; /**< 1 if PF is used for dataplane, 0 for VFs */ - /** 1 if input '1' bit is represented by a positive LLR value, 0 if '1' - * bit is represented by a negative value. - */ - bool input_pos_llr_1_bit; - /** 1 if output '1' bit is represented by a positive value, 0 if '1' - * bit is represented by a negative value. 
- */ - bool output_pos_llr_1_bit; - uint16_t num_vf_bundles; /**< Number of VF bundles to setup */ - /** Queue topology for each operation type */ - struct rte_acc100_queue_topology q_ul_4g; - struct rte_acc100_queue_topology q_dl_4g; - struct rte_acc100_queue_topology q_ul_5g; - struct rte_acc100_queue_topology q_dl_5g; - /** Arbitration configuration for each operation type */ - struct rte_acc100_arbitration arb_ul_4g[RTE_ACC100_NUM_VFS]; - struct rte_acc100_arbitration arb_dl_4g[RTE_ACC100_NUM_VFS]; - struct rte_acc100_arbitration arb_ul_5g[RTE_ACC100_NUM_VFS]; - struct rte_acc100_arbitration arb_dl_5g[RTE_ACC100_NUM_VFS]; -}; /** * Configure a ACC100/ACC101 device in PF mode notably for bbdev-test @@ -104,7 +40,7 @@ struct rte_acc100_conf { */ __rte_experimental int -rte_acc10x_configure(const char *dev_name, struct rte_acc100_conf *conf); +rte_acc10x_configure(const char *dev_name, struct rte_acc_conf *conf); #ifdef __cplusplus } diff --git a/drivers/baseband/acc100/rte_acc100_pmd.c b/drivers/baseband/acc100/rte_acc100_pmd.c index 18ec04a..385a65d 100644 --- a/drivers/baseband/acc100/rte_acc100_pmd.c +++ b/drivers/baseband/acc100/rte_acc100_pmd.c @@ -30,48 +30,6 @@ RTE_LOG_REGISTER_DEFAULT(acc100_logtype, NOTICE); #endif -/* Write to MMIO register address */ -static inline void -mmio_write(void *addr, uint32_t value) -{ - *((volatile uint32_t *)(addr)) = rte_cpu_to_le_32(value); -} - -/* Write a register of a ACC100 device */ -static inline void -acc100_reg_write(struct acc100_device *d, uint32_t offset, uint32_t value) -{ - void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset); - mmio_write(reg_addr, value); - usleep(ACC100_LONG_WAIT); -} - -/* Read a register of a ACC100 device */ -static inline uint32_t -acc100_reg_read(struct acc100_device *d, uint32_t offset) -{ - - void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset); - uint32_t ret = *((volatile uint32_t *)(reg_addr)); - return rte_le_to_cpu_32(ret); -} - -/* Basic Implementation of Log2 for exact 2^N */ 
-static inline uint32_t -log2_basic(uint32_t value) -{ - return (value == 0) ? 0 : rte_bsf32(value); -} - -/* Calculate memory alignment offset assuming alignment is 2^N */ -static inline uint32_t -calc_mem_alignment_offset(void *unaligned_virt_mem, uint32_t alignment) -{ - rte_iova_t unaligned_phy_mem = rte_malloc_virt2iova(unaligned_virt_mem); - return (uint32_t)(alignment - - (unaligned_phy_mem & (alignment-1))); -} - /* Calculate the offset of the enqueue register */ static inline uint32_t queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id) @@ -88,17 +46,17 @@ /* Return the accelerator enum for a Queue Group Index */ static inline int -accFromQgid(int qg_idx, const struct rte_acc100_conf *acc100_conf) +accFromQgid(int qg_idx, const struct rte_acc_conf *acc_conf) { int accQg[ACC100_NUM_QGRPS]; int NumQGroupsPerFn[NUM_ACC]; int acc, qgIdx, qgIndex = 0; for (qgIdx = 0; qgIdx < ACC100_NUM_QGRPS; qgIdx++) accQg[qgIdx] = 0; - NumQGroupsPerFn[UL_4G] = acc100_conf->q_ul_4g.num_qgroups; - NumQGroupsPerFn[UL_5G] = acc100_conf->q_ul_5g.num_qgroups; - NumQGroupsPerFn[DL_4G] = acc100_conf->q_dl_4g.num_qgroups; - NumQGroupsPerFn[DL_5G] = acc100_conf->q_dl_5g.num_qgroups; + NumQGroupsPerFn[UL_4G] = acc_conf->q_ul_4g.num_qgroups; + NumQGroupsPerFn[UL_5G] = acc_conf->q_ul_5g.num_qgroups; + NumQGroupsPerFn[DL_4G] = acc_conf->q_dl_4g.num_qgroups; + NumQGroupsPerFn[DL_5G] = acc_conf->q_dl_5g.num_qgroups; for (acc = UL_4G; acc < NUM_ACC; acc++) for (qgIdx = 0; qgIdx < NumQGroupsPerFn[acc]; qgIdx++) accQg[qgIndex++] = acc; @@ -108,23 +66,23 @@ /* Return the queue topology for a Queue Group Index */ static inline void -qtopFromAcc(struct rte_acc100_queue_topology **qtop, int acc_enum, - struct rte_acc100_conf *acc100_conf) +qtopFromAcc(struct rte_acc_queue_topology **qtop, int acc_enum, + struct rte_acc_conf *acc_conf) { - struct rte_acc100_queue_topology *p_qtop; + struct rte_acc_queue_topology *p_qtop; p_qtop = NULL; switch (acc_enum) { case UL_4G: - p_qtop 
= &(acc100_conf->q_ul_4g); + p_qtop = &(acc_conf->q_ul_4g); break; case UL_5G: - p_qtop = &(acc100_conf->q_ul_5g); + p_qtop = &(acc_conf->q_ul_5g); break; case DL_4G: - p_qtop = &(acc100_conf->q_dl_4g); + p_qtop = &(acc_conf->q_dl_4g); break; case DL_5G: - p_qtop = &(acc100_conf->q_dl_5g); + p_qtop = &(acc_conf->q_dl_5g); break; default: /* NOTREACHED */ @@ -136,11 +94,11 @@ /* Return the AQ depth for a Queue Group Index */ static inline int -aqDepth(int qg_idx, struct rte_acc100_conf *acc100_conf) +aqDepth(int qg_idx, struct rte_acc_conf *acc_conf) { - struct rte_acc100_queue_topology *q_top = NULL; - int acc_enum = accFromQgid(qg_idx, acc100_conf); - qtopFromAcc(&q_top, acc_enum, acc100_conf); + struct rte_acc_queue_topology *q_top = NULL; + int acc_enum = accFromQgid(qg_idx, acc_conf); + qtopFromAcc(&q_top, acc_enum, acc_conf); if (unlikely(q_top == NULL)) return 1; return RTE_MAX(1, q_top->aq_depth_log2); @@ -148,39 +106,39 @@ /* Return the AQ depth for a Queue Group Index */ static inline int -aqNum(int qg_idx, struct rte_acc100_conf *acc100_conf) +aqNum(int qg_idx, struct rte_acc_conf *acc_conf) { - struct rte_acc100_queue_topology *q_top = NULL; - int acc_enum = accFromQgid(qg_idx, acc100_conf); - qtopFromAcc(&q_top, acc_enum, acc100_conf); + struct rte_acc_queue_topology *q_top = NULL; + int acc_enum = accFromQgid(qg_idx, acc_conf); + qtopFromAcc(&q_top, acc_enum, acc_conf); if (unlikely(q_top == NULL)) return 0; return q_top->num_aqs_per_groups; } static void -initQTop(struct rte_acc100_conf *acc100_conf) +initQTop(struct rte_acc_conf *acc_conf) { - acc100_conf->q_ul_4g.num_aqs_per_groups = 0; - acc100_conf->q_ul_4g.num_qgroups = 0; - acc100_conf->q_ul_4g.first_qgroup_index = -1; - acc100_conf->q_ul_5g.num_aqs_per_groups = 0; - acc100_conf->q_ul_5g.num_qgroups = 0; - acc100_conf->q_ul_5g.first_qgroup_index = -1; - acc100_conf->q_dl_4g.num_aqs_per_groups = 0; - acc100_conf->q_dl_4g.num_qgroups = 0; - acc100_conf->q_dl_4g.first_qgroup_index = -1; - 
acc100_conf->q_dl_5g.num_aqs_per_groups = 0; - acc100_conf->q_dl_5g.num_qgroups = 0; - acc100_conf->q_dl_5g.first_qgroup_index = -1; + acc_conf->q_ul_4g.num_aqs_per_groups = 0; + acc_conf->q_ul_4g.num_qgroups = 0; + acc_conf->q_ul_4g.first_qgroup_index = -1; + acc_conf->q_ul_5g.num_aqs_per_groups = 0; + acc_conf->q_ul_5g.num_qgroups = 0; + acc_conf->q_ul_5g.first_qgroup_index = -1; + acc_conf->q_dl_4g.num_aqs_per_groups = 0; + acc_conf->q_dl_4g.num_qgroups = 0; + acc_conf->q_dl_4g.first_qgroup_index = -1; + acc_conf->q_dl_5g.num_aqs_per_groups = 0; + acc_conf->q_dl_5g.num_qgroups = 0; + acc_conf->q_dl_5g.first_qgroup_index = -1; } static inline void -updateQtop(uint8_t acc, uint8_t qg, struct rte_acc100_conf *acc100_conf, - struct acc100_device *d) { +updateQtop(uint8_t acc, uint8_t qg, struct rte_acc_conf *acc_conf, + struct acc_device *d) { uint32_t reg; - struct rte_acc100_queue_topology *q_top = NULL; - qtopFromAcc(&q_top, acc, acc100_conf); + struct rte_acc_queue_topology *q_top = NULL; + qtopFromAcc(&q_top, acc, acc_conf); if (unlikely(q_top == NULL)) return; uint16_t aq; @@ -188,17 +146,17 @@ if (q_top->first_qgroup_index == -1) { q_top->first_qgroup_index = qg; /* Can be optimized to assume all are enabled by default */ - reg = acc100_reg_read(d, queue_offset(d->pf_device, + reg = acc_reg_read(d, queue_offset(d->pf_device, 0, qg, ACC100_NUM_AQS - 1)); - if (reg & ACC100_QUEUE_ENABLE) { + if (reg & ACC_QUEUE_ENABLE) { q_top->num_aqs_per_groups = ACC100_NUM_AQS; return; } q_top->num_aqs_per_groups = 0; for (aq = 0; aq < ACC100_NUM_AQS; aq++) { - reg = acc100_reg_read(d, queue_offset(d->pf_device, + reg = acc_reg_read(d, queue_offset(d->pf_device, 0, qg, aq)); - if (reg & ACC100_QUEUE_ENABLE) + if (reg & ACC_QUEUE_ENABLE) q_top->num_aqs_per_groups++; } } @@ -208,8 +166,8 @@ static inline void fetch_acc100_config(struct rte_bbdev *dev) { - struct acc100_device *d = dev->data->dev_private; - struct rte_acc100_conf *acc100_conf = &d->acc100_conf; + struct 
acc_device *d = dev->data->dev_private;
+	struct rte_acc_conf *acc_conf = &d->acc_conf;
 	const struct acc100_registry_addr *reg_addr;
 	uint8_t acc, qg;
 	uint32_t reg, reg_aq, reg_len0, reg_len1;
@@ -225,201 +183,80 @@
 	else
 		reg_addr = &vf_reg_addr;
-	d->ddr_size = (1 + acc100_reg_read(d, reg_addr->ddr_range)) << 10;
+	d->ddr_size = (1 + acc_reg_read(d, reg_addr->ddr_range)) << 10;
 	/* Single VF Bundle by VF */
-	acc100_conf->num_vf_bundles = 1;
-	initQTop(acc100_conf);
-
-	struct rte_acc100_queue_topology *q_top = NULL;
-	int qman_func_id[ACC100_NUM_ACCS] = {ACC100_ACCMAP_0, ACC100_ACCMAP_1,
-			ACC100_ACCMAP_2, ACC100_ACCMAP_3, ACC100_ACCMAP_4};
-	reg = acc100_reg_read(d, reg_addr->qman_group_func);
-	for (qg = 0; qg < ACC100_NUM_QGRPS_PER_WORD; qg++) {
-		reg_aq = acc100_reg_read(d,
+	acc_conf->num_vf_bundles = 1;
+	initQTop(acc_conf);
+
+	struct rte_acc_queue_topology *q_top = NULL;
+	int qman_func_id[ACC100_NUM_ACCS] = {ACC_ACCMAP_0, ACC_ACCMAP_1,
+			ACC_ACCMAP_2, ACC_ACCMAP_3, ACC_ACCMAP_4};
+	reg = acc_reg_read(d, reg_addr->qman_group_func);
+	for (qg = 0; qg < ACC_NUM_QGRPS_PER_WORD; qg++) {
+		reg_aq = acc_reg_read(d,
 				queue_offset(d->pf_device, 0, qg, 0));
-		if (reg_aq & ACC100_QUEUE_ENABLE) {
+		if (reg_aq & ACC_QUEUE_ENABLE) {
 			uint32_t idx = (reg >> (qg * 4)) & 0x7;
 			if (idx < ACC100_NUM_ACCS) {
 				acc = qman_func_id[idx];
-				updateQtop(acc, qg, acc100_conf, d);
+				updateQtop(acc, qg, acc_conf, d);
 			}
 		}
 	}
 	/* Check the depth of the AQs*/
-	reg_len0 = acc100_reg_read(d, reg_addr->depth_log0_offset);
-	reg_len1 = acc100_reg_read(d, reg_addr->depth_log1_offset);
+	reg_len0 = acc_reg_read(d, reg_addr->depth_log0_offset);
+	reg_len1 = acc_reg_read(d, reg_addr->depth_log1_offset);
 	for (acc = 0; acc < NUM_ACC; acc++) {
-		qtopFromAcc(&q_top, acc, acc100_conf);
-		if (q_top->first_qgroup_index < ACC100_NUM_QGRPS_PER_WORD)
+		qtopFromAcc(&q_top, acc, acc_conf);
+		if (q_top->first_qgroup_index < ACC_NUM_QGRPS_PER_WORD)
 			q_top->aq_depth_log2 = (reg_len0 >>
 					(q_top->first_qgroup_index * 4))
& 0xF;
 		else
 			q_top->aq_depth_log2 = (reg_len1 >>
 					((q_top->first_qgroup_index -
-					ACC100_NUM_QGRPS_PER_WORD) * 4))
+					ACC_NUM_QGRPS_PER_WORD) * 4))
 					& 0xF;
 	}
 	/* Read PF mode */
 	if (d->pf_device) {
-		reg_mode = acc100_reg_read(d, HWPfHiPfMode);
-		acc100_conf->pf_mode_en = (reg_mode == ACC100_PF_VAL) ? 1 : 0;
+		reg_mode = acc_reg_read(d, HWPfHiPfMode);
+		acc_conf->pf_mode_en = (reg_mode == ACC_PF_VAL) ? 1 : 0;
 	}
 	rte_bbdev_log_debug(
 			"%s Config LLR SIGN IN/OUT %s %s QG %u %u %u %u AQ %u %u %u %u Len %u %u %u %u\n",
 			(d->pf_device) ? "PF" : "VF",
-			(acc100_conf->input_pos_llr_1_bit) ? "POS" : "NEG",
-			(acc100_conf->output_pos_llr_1_bit) ? "POS" : "NEG",
-			acc100_conf->q_ul_4g.num_qgroups,
-			acc100_conf->q_dl_4g.num_qgroups,
-			acc100_conf->q_ul_5g.num_qgroups,
-			acc100_conf->q_dl_5g.num_qgroups,
-			acc100_conf->q_ul_4g.num_aqs_per_groups,
-			acc100_conf->q_dl_4g.num_aqs_per_groups,
-			acc100_conf->q_ul_5g.num_aqs_per_groups,
-			acc100_conf->q_dl_5g.num_aqs_per_groups,
-			acc100_conf->q_ul_4g.aq_depth_log2,
-			acc100_conf->q_dl_4g.aq_depth_log2,
-			acc100_conf->q_ul_5g.aq_depth_log2,
-			acc100_conf->q_dl_5g.aq_depth_log2);
-}
-
-static void
-free_base_addresses(void **base_addrs, int size)
-{
-	int i;
-	for (i = 0; i < size; i++)
-		rte_free(base_addrs[i]);
-}
-
-static inline uint32_t
-get_desc_len(void)
-{
-	return sizeof(union acc100_dma_desc);
-}
-
-/* Allocate the 2 * 64MB block for the sw rings */
-static int
-alloc_2x64mb_sw_rings_mem(struct rte_bbdev *dev, struct acc100_device *d,
-		int socket)
-{
-	uint32_t sw_ring_size = ACC100_SIZE_64MBYTE;
-	d->sw_rings_base = rte_zmalloc_socket(dev->device->driver->name,
-			2 * sw_ring_size, RTE_CACHE_LINE_SIZE, socket);
-	if (d->sw_rings_base == NULL) {
-		rte_bbdev_log(ERR, "Failed to allocate memory for %s:%u",
-				dev->device->driver->name,
-				dev->data->dev_id);
-		return -ENOMEM;
-	}
-	uint32_t next_64mb_align_offset = calc_mem_alignment_offset(
-			d->sw_rings_base, ACC100_SIZE_64MBYTE);
-	d->sw_rings = RTE_PTR_ADD(d->sw_rings_base,
next_64mb_align_offset);
-	d->sw_rings_iova = rte_malloc_virt2iova(d->sw_rings_base) +
-			next_64mb_align_offset;
-	d->sw_ring_size = ACC100_MAX_QUEUE_DEPTH * get_desc_len();
-	d->sw_ring_max_depth = ACC100_MAX_QUEUE_DEPTH;
-
-	return 0;
-}
-
-/* Attempt to allocate minimised memory space for sw rings */
-static void
-alloc_sw_rings_min_mem(struct rte_bbdev *dev, struct acc100_device *d,
-		uint16_t num_queues, int socket)
-{
-	rte_iova_t sw_rings_base_iova, next_64mb_align_addr_iova;
-	uint32_t next_64mb_align_offset;
-	rte_iova_t sw_ring_iova_end_addr;
-	void *base_addrs[ACC100_SW_RING_MEM_ALLOC_ATTEMPTS];
-	void *sw_rings_base;
-	int i = 0;
-	uint32_t q_sw_ring_size = ACC100_MAX_QUEUE_DEPTH * get_desc_len();
-	uint32_t dev_sw_ring_size = q_sw_ring_size * num_queues;
-
-	/* Find an aligned block of memory to store sw rings */
-	while (i < ACC100_SW_RING_MEM_ALLOC_ATTEMPTS) {
-		/*
-		 * sw_ring allocated memory is guaranteed to be aligned to
-		 * q_sw_ring_size at the condition that the requested size is
-		 * less than the page size
-		 */
-		sw_rings_base = rte_zmalloc_socket(
-				dev->device->driver->name,
-				dev_sw_ring_size, q_sw_ring_size, socket);
-
-		if (sw_rings_base == NULL) {
-			rte_bbdev_log(ERR,
-					"Failed to allocate memory for %s:%u",
-					dev->device->driver->name,
-					dev->data->dev_id);
-			break;
-		}
-
-		sw_rings_base_iova = rte_malloc_virt2iova(sw_rings_base);
-		next_64mb_align_offset = calc_mem_alignment_offset(
-				sw_rings_base, ACC100_SIZE_64MBYTE);
-		next_64mb_align_addr_iova = sw_rings_base_iova +
-				next_64mb_align_offset;
-		sw_ring_iova_end_addr = sw_rings_base_iova + dev_sw_ring_size;
-
-		/* Check if the end of the sw ring memory block is before the
-		 * start of next 64MB aligned mem address
-		 */
-		if (sw_ring_iova_end_addr < next_64mb_align_addr_iova) {
-			d->sw_rings_iova = sw_rings_base_iova;
-			d->sw_rings = sw_rings_base;
-			d->sw_rings_base = sw_rings_base;
-			d->sw_ring_size = q_sw_ring_size;
-			d->sw_ring_max_depth = ACC100_MAX_QUEUE_DEPTH;
-			break;
-		}
-		/* Store the address of the unaligned mem block */
-		base_addrs[i] = sw_rings_base;
-		i++;
-	}
-
-	/* Free all unaligned blocks of mem allocated in the loop */
-	free_base_addresses(base_addrs, i);
-}
-
-/*
- * Find queue_id of a device queue based on details from the Info Ring.
- * If a queue isn't found UINT16_MAX is returned.
- */
-static inline uint16_t
-get_queue_id_from_ring_info(struct rte_bbdev_data *data,
-		const union acc100_info_ring_data ring_data)
-{
-	uint16_t queue_id;
-
-	for (queue_id = 0; queue_id < data->num_queues; ++queue_id) {
-		struct acc100_queue *acc100_q =
-				data->queues[queue_id].queue_private;
-		if (acc100_q != NULL && acc100_q->aq_id == ring_data.aq_id &&
-				acc100_q->qgrp_id == ring_data.qg_id &&
-				acc100_q->vf_id == ring_data.vf_id)
-			return queue_id;
-	}
-
-	return UINT16_MAX;
+			(acc_conf->input_pos_llr_1_bit) ? "POS" : "NEG",
+			(acc_conf->output_pos_llr_1_bit) ? "POS" : "NEG",
+			acc_conf->q_ul_4g.num_qgroups,
+			acc_conf->q_dl_4g.num_qgroups,
+			acc_conf->q_ul_5g.num_qgroups,
+			acc_conf->q_dl_5g.num_qgroups,
+			acc_conf->q_ul_4g.num_aqs_per_groups,
+			acc_conf->q_dl_4g.num_aqs_per_groups,
+			acc_conf->q_ul_5g.num_aqs_per_groups,
+			acc_conf->q_dl_5g.num_aqs_per_groups,
+			acc_conf->q_ul_4g.aq_depth_log2,
+			acc_conf->q_dl_4g.aq_depth_log2,
+			acc_conf->q_ul_5g.aq_depth_log2,
+			acc_conf->q_dl_5g.aq_depth_log2);
 }

 /* Checks PF Info Ring to find the interrupt cause and handles it accordingly */
 static inline void
-acc100_check_ir(struct acc100_device *acc100_dev)
+acc100_check_ir(struct acc_device *acc100_dev)
 {
-	volatile union acc100_info_ring_data *ring_data;
+	volatile union acc_info_ring_data *ring_data;
 	uint16_t info_ring_head = acc100_dev->info_ring_head;
 	if (acc100_dev->info_ring == NULL)
 		return;
 	ring_data = acc100_dev->info_ring + (acc100_dev->info_ring_head &
-			ACC100_INFO_RING_MASK);
+			ACC_INFO_RING_MASK);
 	while (ring_data->valid) {
 		if ((ring_data->int_nb < ACC100_PF_INT_DMA_DL_DESC_IRQ) || (
@@ -431,7 +268,7 @@
 		ring_data->val = 0;
info_ring_head++;
 		ring_data = acc100_dev->info_ring +
-				(info_ring_head & ACC100_INFO_RING_MASK);
+				(info_ring_head & ACC_INFO_RING_MASK);
 	}
 }
@@ -439,12 +276,12 @@
 static inline void
 acc100_pf_interrupt_handler(struct rte_bbdev *dev)
 {
-	struct acc100_device *acc100_dev = dev->data->dev_private;
-	volatile union acc100_info_ring_data *ring_data;
-	struct acc100_deq_intr_details deq_intr_det;
+	struct acc_device *acc100_dev = dev->data->dev_private;
+	volatile union acc_info_ring_data *ring_data;
+	struct acc_deq_intr_details deq_intr_det;
 	ring_data = acc100_dev->info_ring + (acc100_dev->info_ring_head &
-			ACC100_INFO_RING_MASK);
+			ACC_INFO_RING_MASK);
 	while (ring_data->valid) {
@@ -481,7 +318,7 @@
 		++acc100_dev->info_ring_head;
 		ring_data = acc100_dev->info_ring +
 				(acc100_dev->info_ring_head &
-				ACC100_INFO_RING_MASK);
+				ACC_INFO_RING_MASK);
 	}
 }
@@ -489,12 +326,12 @@
 static inline void
 acc100_vf_interrupt_handler(struct rte_bbdev *dev)
 {
-	struct acc100_device *acc100_dev = dev->data->dev_private;
-	volatile union acc100_info_ring_data *ring_data;
-	struct acc100_deq_intr_details deq_intr_det;
+	struct acc_device *acc100_dev = dev->data->dev_private;
+	volatile union acc_info_ring_data *ring_data;
+	struct acc_deq_intr_details deq_intr_det;
 	ring_data = acc100_dev->info_ring + (acc100_dev->info_ring_head &
-			ACC100_INFO_RING_MASK);
+			ACC_INFO_RING_MASK);
 	while (ring_data->valid) {
@@ -533,7 +370,7 @@
 		ring_data->valid = 0;
 		++acc100_dev->info_ring_head;
 		ring_data = acc100_dev->info_ring + (acc100_dev->info_ring_head
-				& ACC100_INFO_RING_MASK);
+				& ACC_INFO_RING_MASK);
 	}
 }
@@ -542,7 +379,7 @@
 acc100_dev_interrupt_handler(void *cb_arg)
 {
 	struct rte_bbdev *dev = cb_arg;
-	struct acc100_device *acc100_dev = dev->data->dev_private;
+	struct acc_device *acc100_dev = dev->data->dev_private;
 	/* Read info ring */
 	if (acc100_dev->pf_device)
@@ -555,7 +392,7 @@
 static int
 allocate_info_ring(struct rte_bbdev *dev)
 {
-	struct acc100_device *d = dev->data->dev_private;
+	struct
acc_device *d = dev->data->dev_private;
 	const struct acc100_registry_addr *reg_addr;
 	rte_iova_t info_ring_iova;
 	uint32_t phys_low, phys_high;
@@ -570,7 +407,7 @@
 		reg_addr = &vf_reg_addr;
 	/* Allocate InfoRing */
 	d->info_ring = rte_zmalloc_socket("Info Ring",
-			ACC100_INFO_RING_NUM_ENTRIES *
+			ACC_INFO_RING_NUM_ENTRIES *
 			sizeof(*d->info_ring), RTE_CACHE_LINE_SIZE,
 			dev->data->socket_id);
 	if (d->info_ring == NULL) {
@@ -585,11 +422,11 @@
 	/* Setup Info Ring */
 	phys_high = (uint32_t)(info_ring_iova >> 32);
 	phys_low = (uint32_t)(info_ring_iova);
-	acc100_reg_write(d, reg_addr->info_ring_hi, phys_high);
-	acc100_reg_write(d, reg_addr->info_ring_lo, phys_low);
-	acc100_reg_write(d, reg_addr->info_ring_en, ACC100_REG_IRQ_EN_ALL);
-	d->info_ring_head = (acc100_reg_read(d, reg_addr->info_ring_ptr) &
-			0xFFF) / sizeof(union acc100_info_ring_data);
+	acc_reg_write(d, reg_addr->info_ring_hi, phys_high);
+	acc_reg_write(d, reg_addr->info_ring_lo, phys_low);
+	acc_reg_write(d, reg_addr->info_ring_en, ACC100_REG_IRQ_EN_ALL);
+	d->info_ring_head = (acc_reg_read(d, reg_addr->info_ring_ptr) &
+			0xFFF) / sizeof(union acc_info_ring_data);
 	return 0;
 }
@@ -599,11 +436,11 @@
 acc100_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
 {
 	uint32_t phys_low, phys_high, value;
-	struct acc100_device *d = dev->data->dev_private;
+	struct acc_device *d = dev->data->dev_private;
 	const struct acc100_registry_addr *reg_addr;
 	int ret;
-	if (d->pf_device && !d->acc100_conf.pf_mode_en) {
+	if (d->pf_device && !d->acc_conf.pf_mode_en) {
 		rte_bbdev_log(NOTICE,
 				"%s has PF mode disabled.
This PF can't be used.",
 				dev->data->name);
@@ -629,7 +466,7 @@
 	 * Note : Assuming only VF0 bundle is used for PF mode
 	 */
 	phys_high = (uint32_t)(d->sw_rings_iova >> 32);
-	phys_low = (uint32_t)(d->sw_rings_iova & ~(ACC100_SIZE_64MBYTE-1));
+	phys_low = (uint32_t)(d->sw_rings_iova & ~(ACC_SIZE_64MBYTE-1));
 	/* Choose correct registry addresses for the device type */
 	if (d->pf_device)
@@ -642,23 +479,23 @@
 	/* Release AXI from PF */
 	if (d->pf_device)
-		acc100_reg_write(d, HWPfDmaAxiControl, 1);
+		acc_reg_write(d, HWPfDmaAxiControl, 1);
-	acc100_reg_write(d, reg_addr->dma_ring_ul5g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->dma_ring_ul5g_lo, phys_low);
-	acc100_reg_write(d, reg_addr->dma_ring_dl5g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->dma_ring_dl5g_lo, phys_low);
-	acc100_reg_write(d, reg_addr->dma_ring_ul4g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->dma_ring_ul4g_lo, phys_low);
-	acc100_reg_write(d, reg_addr->dma_ring_dl4g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->dma_ring_dl4g_lo, phys_low);
+	acc_reg_write(d, reg_addr->dma_ring_ul5g_hi, phys_high);
+	acc_reg_write(d, reg_addr->dma_ring_ul5g_lo, phys_low);
+	acc_reg_write(d, reg_addr->dma_ring_dl5g_hi, phys_high);
+	acc_reg_write(d, reg_addr->dma_ring_dl5g_lo, phys_low);
+	acc_reg_write(d, reg_addr->dma_ring_ul4g_hi, phys_high);
+	acc_reg_write(d, reg_addr->dma_ring_ul4g_lo, phys_low);
+	acc_reg_write(d, reg_addr->dma_ring_dl4g_hi, phys_high);
+	acc_reg_write(d, reg_addr->dma_ring_dl4g_lo, phys_low);
 	/*
 	 * Configure Ring Size to the max queue ring size
 	 * (used for wrapping purpose)
 	 */
 	value = log2_basic(d->sw_ring_size / 64);
-	acc100_reg_write(d, reg_addr->ring_size, value);
+	acc_reg_write(d, reg_addr->ring_size, value);
 	/* Configure tail pointer for use when SDONE enabled */
 	d->tail_ptrs = rte_zmalloc_socket(
@@ -676,14 +513,14 @@
 	phys_high = (uint32_t)(d->tail_ptr_iova >> 32);
 	phys_low = (uint32_t)(d->tail_ptr_iova);
-	acc100_reg_write(d, reg_addr->tail_ptrs_ul5g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->tail_ptrs_ul5g_lo, phys_low);
-	acc100_reg_write(d, reg_addr->tail_ptrs_dl5g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->tail_ptrs_dl5g_lo, phys_low);
-	acc100_reg_write(d, reg_addr->tail_ptrs_ul4g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->tail_ptrs_ul4g_lo, phys_low);
-	acc100_reg_write(d, reg_addr->tail_ptrs_dl4g_hi, phys_high);
-	acc100_reg_write(d, reg_addr->tail_ptrs_dl4g_lo, phys_low);
+	acc_reg_write(d, reg_addr->tail_ptrs_ul5g_hi, phys_high);
+	acc_reg_write(d, reg_addr->tail_ptrs_ul5g_lo, phys_low);
+	acc_reg_write(d, reg_addr->tail_ptrs_dl5g_hi, phys_high);
+	acc_reg_write(d, reg_addr->tail_ptrs_dl5g_lo, phys_low);
+	acc_reg_write(d, reg_addr->tail_ptrs_ul4g_hi, phys_high);
+	acc_reg_write(d, reg_addr->tail_ptrs_ul4g_lo, phys_low);
+	acc_reg_write(d, reg_addr->tail_ptrs_dl4g_hi, phys_high);
+	acc_reg_write(d, reg_addr->tail_ptrs_dl4g_lo, phys_low);
 	ret = allocate_info_ring(dev);
 	if (ret < 0) {
@@ -694,7 +531,7 @@
 	}
 	d->harq_layout = rte_zmalloc_socket("HARQ Layout",
-			ACC100_HARQ_LAYOUT * sizeof(*d->harq_layout),
+			ACC_HARQ_LAYOUT * sizeof(*d->harq_layout),
 			RTE_CACHE_LINE_SIZE, dev->data->socket_id);
 	if (d->harq_layout == NULL) {
 		rte_bbdev_log(ERR, "Failed to allocate harq_layout for %s:%u",
@@ -718,7 +555,7 @@
 acc100_intr_enable(struct rte_bbdev *dev)
 {
 	int ret;
-	struct acc100_device *d = dev->data->dev_private;
+	struct acc_device *d = dev->data->dev_private;
 	/* Only MSI are currently supported */
 	if (rte_intr_type_get(dev->intr_handle) == RTE_INTR_HANDLE_VFIO_MSI ||
@@ -762,7 +599,7 @@
 static int
 acc100_dev_close(struct rte_bbdev *dev)
 {
-	struct acc100_device *d = dev->data->dev_private;
+	struct acc_device *d = dev->data->dev_private;
 	acc100_check_ir(d);
 	if (d->sw_rings_base != NULL) {
 		rte_free(d->tail_ptrs);
@@ -771,7 +608,7 @@
 		d->sw_rings_base = NULL;
 	}
 	/* Ensure all in flight HW transactions are completed */
-	usleep(ACC100_LONG_WAIT);
+	usleep(ACC_LONG_WAIT);
 	return 0;
 }
@@ -784,12 +621,12 @@
acc100_find_free_queue_idx(struct rte_bbdev *dev,
 		const struct rte_bbdev_queue_conf *conf)
 {
-	struct acc100_device *d = dev->data->dev_private;
+	struct acc_device *d = dev->data->dev_private;
 	int op_2_acc[5] = {0, UL_4G, DL_4G, UL_5G, DL_5G};
 	int acc = op_2_acc[conf->op_type];
-	struct rte_acc100_queue_topology *qtop = NULL;
+	struct rte_acc_queue_topology *qtop = NULL;
-	qtopFromAcc(&qtop, acc, &(d->acc100_conf));
+	qtopFromAcc(&qtop, acc, &(d->acc_conf));
 	if (qtop == NULL)
 		return -1;
 	/* Identify matching QGroup Index which are sorted in priority order */
@@ -821,8 +658,8 @@
 acc100_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
 		const struct rte_bbdev_queue_conf *conf)
 {
-	struct acc100_device *d = dev->data->dev_private;
-	struct acc100_queue *q;
+	struct acc_device *d = dev->data->dev_private;
+	struct acc_queue *q;
 	int16_t q_idx;
 	/* Allocate the queue data structure. */
@@ -842,37 +679,37 @@
 	q->ring_addr_iova = d->sw_rings_iova + (d->sw_ring_size * queue_id);
 	/* Prepare the Ring with default descriptor format */
-	union acc100_dma_desc *desc = NULL;
+	union acc_dma_desc *desc = NULL;
 	unsigned int desc_idx, b_idx;
 	int fcw_len = (conf->op_type == RTE_BBDEV_OP_LDPC_ENC ?
-		ACC100_FCW_LE_BLEN : (conf->op_type == RTE_BBDEV_OP_TURBO_DEC ?
-		ACC100_FCW_TD_BLEN : ACC100_FCW_LD_BLEN));
+		ACC_FCW_LE_BLEN : (conf->op_type == RTE_BBDEV_OP_TURBO_DEC ?
+		ACC_FCW_TD_BLEN : ACC_FCW_LD_BLEN));
 	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
 		desc = q->ring_addr + desc_idx;
-		desc->req.word0 = ACC100_DMA_DESC_TYPE;
+		desc->req.word0 = ACC_DMA_DESC_TYPE;
 		desc->req.word1 = 0; /**< Timestamp */
 		desc->req.word2 = 0;
 		desc->req.word3 = 0;
-		uint64_t fcw_offset = (desc_idx << 8) + ACC100_DESC_FCW_OFFSET;
+		uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET;
 		desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset;
 		desc->req.data_ptrs[0].blen = fcw_len;
-		desc->req.data_ptrs[0].blkid = ACC100_DMA_BLKID_FCW;
+		desc->req.data_ptrs[0].blkid = ACC_DMA_BLKID_FCW;
 		desc->req.data_ptrs[0].last = 0;
 		desc->req.data_ptrs[0].dma_ext = 0;
-		for (b_idx = 1; b_idx < ACC100_DMA_MAX_NUM_POINTERS - 1;
+		for (b_idx = 1; b_idx < ACC_DMA_MAX_NUM_POINTERS - 1;
 				b_idx++) {
-			desc->req.data_ptrs[b_idx].blkid = ACC100_DMA_BLKID_IN;
+			desc->req.data_ptrs[b_idx].blkid = ACC_DMA_BLKID_IN;
 			desc->req.data_ptrs[b_idx].last = 1;
 			desc->req.data_ptrs[b_idx].dma_ext = 0;
 			b_idx++;
 			desc->req.data_ptrs[b_idx].blkid =
-					ACC100_DMA_BLKID_OUT_ENC;
+					ACC_DMA_BLKID_OUT_ENC;
 			desc->req.data_ptrs[b_idx].last = 1;
 			desc->req.data_ptrs[b_idx].dma_ext = 0;
 		}
 		/* Preset some fields of LDPC FCW */
-		desc->req.fcw_ld.FCWversion = ACC100_FCW_VER;
+		desc->req.fcw_ld.FCWversion = ACC_FCW_VER;
 		desc->req.fcw_ld.gain_i = 1;
 		desc->req.fcw_ld.gain_h = 1;
 	}
@@ -925,8 +762,8 @@
 	q->vf_id = (q_idx >> ACC100_VF_ID_SHIFT) & 0x3F;
 	q->aq_id = q_idx & 0xF;
 	q->aq_depth = (conf->op_type == RTE_BBDEV_OP_TURBO_DEC) ?
-			(1 << d->acc100_conf.q_ul_4g.aq_depth_log2) :
-			(1 << d->acc100_conf.q_dl_4g.aq_depth_log2);
+			(1 << d->acc_conf.q_ul_4g.aq_depth_log2) :
+			(1 << d->acc_conf.q_dl_4g.aq_depth_log2);
 	q->mmio_reg_enqueue = RTE_PTR_ADD(d->mmio_base,
 			queue_offset(d->pf_device,
@@ -945,8 +782,8 @@
 static int
 acc100_queue_release(struct rte_bbdev *dev, uint16_t q_id)
 {
-	struct acc100_device *d = dev->data->dev_private;
-	struct acc100_queue *q = dev->data->queues[q_id].queue_private;
+	struct acc_device *d = dev->data->dev_private;
+	struct acc_queue *q = dev->data->queues[q_id].queue_private;
 	if (q != NULL) {
 		/* Mark the Queue as un-assigned */
@@ -966,7 +803,7 @@
 acc100_dev_info_get(struct rte_bbdev *dev,
 		struct rte_bbdev_driver_info *dev_info)
 {
-	struct acc100_device *d = dev->data->dev_private;
+	struct acc_device *d = dev->data->dev_private;
 	int i;
 	static const struct rte_bbdev_op_cap bbdev_capabilities[] = {
@@ -1056,7 +893,7 @@
 	static struct rte_bbdev_queue_conf default_queue_conf;
 	default_queue_conf.socket = dev->data->socket_id;
-	default_queue_conf.queue_size = ACC100_MAX_QUEUE_DEPTH;
+	default_queue_conf.queue_size = ACC_MAX_QUEUE_DEPTH;
 	dev_info->driver_name = dev->device->driver->name;
@@ -1066,27 +903,27 @@
 	/* Expose number of queues */
 	dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;
-	dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = d->acc100_conf.q_ul_4g.num_aqs_per_groups *
-			d->acc100_conf.q_ul_4g.num_qgroups;
-	dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = d->acc100_conf.q_dl_4g.num_aqs_per_groups *
-			d->acc100_conf.q_dl_4g.num_qgroups;
-	dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = d->acc100_conf.q_ul_5g.num_aqs_per_groups *
-			d->acc100_conf.q_ul_5g.num_qgroups;
-	dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc100_conf.q_dl_5g.num_aqs_per_groups *
-			d->acc100_conf.q_dl_5g.num_qgroups;
-	dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc100_conf.q_ul_4g.num_qgroups;
-	dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc100_conf.q_dl_4g.num_qgroups;
-	dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc100_conf.q_ul_5g.num_qgroups;
-	dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc100_conf.q_dl_5g.num_qgroups;
+	dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = d->acc_conf.q_ul_4g.num_aqs_per_groups *
+			d->acc_conf.q_ul_4g.num_qgroups;
+	dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = d->acc_conf.q_dl_4g.num_aqs_per_groups *
+			d->acc_conf.q_dl_4g.num_qgroups;
+	dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_aqs_per_groups *
+			d->acc_conf.q_ul_5g.num_qgroups;
+	dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_aqs_per_groups *
+			d->acc_conf.q_dl_5g.num_qgroups;
+	dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc_conf.q_ul_4g.num_qgroups;
+	dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc_conf.q_dl_4g.num_qgroups;
+	dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_qgroups;
+	dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_qgroups;
 	dev_info->max_num_queues = 0;
 	for (i = RTE_BBDEV_OP_TURBO_DEC; i <= RTE_BBDEV_OP_LDPC_ENC; i++)
 		dev_info->max_num_queues += dev_info->num_queues[i];
-	dev_info->queue_size_lim = ACC100_MAX_QUEUE_DEPTH;
+	dev_info->queue_size_lim = ACC_MAX_QUEUE_DEPTH;
 	dev_info->hardware_accelerated = true;
 	dev_info->max_dl_queue_priority =
-			d->acc100_conf.q_dl_4g.num_qgroups - 1;
+			d->acc_conf.q_dl_4g.num_qgroups - 1;
 	dev_info->max_ul_queue_priority =
-			d->acc100_conf.q_ul_4g.num_qgroups - 1;
+			d->acc_conf.q_ul_4g.num_qgroups - 1;
 	dev_info->default_queue_conf = default_queue_conf;
 	dev_info->cpu_flag_reqs = NULL;
 	dev_info->min_alignment = 64;
@@ -1103,7 +940,7 @@
 static int
 acc100_queue_intr_enable(struct rte_bbdev *dev, uint16_t queue_id)
 {
-	struct acc100_queue *q = dev->data->queues[queue_id].queue_private;
+	struct acc_queue *q = dev->data->queues[queue_id].queue_private;
 	if (rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSI &&
 			rte_intr_type_get(dev->intr_handle) !=
RTE_INTR_HANDLE_UIO)
@@ -1116,7 +953,7 @@
 static int
 acc100_queue_intr_disable(struct rte_bbdev *dev, uint16_t queue_id)
 {
-	struct acc100_queue *q = dev->data->queues[queue_id].queue_private;
+	struct acc_queue *q = dev->data->queues[queue_id].queue_private;
 	if (rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSI &&
 			rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_UIO)
@@ -1159,132 +996,10 @@
 	{.device_id = 0},
 };
-/* Read flag value 0/1 from bitmap */
-static inline bool
-check_bit(uint32_t bitmap, uint32_t bitmask)
-{
-	return bitmap & bitmask;
-}
-
-static inline char *
-mbuf_append(struct rte_mbuf *m_head, struct rte_mbuf *m, uint16_t len)
-{
-	if (unlikely(len > rte_pktmbuf_tailroom(m)))
-		return NULL;
-
-	char *tail = (char *)m->buf_addr + m->data_off + m->data_len;
-	m->data_len = (uint16_t)(m->data_len + len);
-	m_head->pkt_len = (m_head->pkt_len + len);
-	return tail;
-}
-
-/* Fill in a frame control word for turbo encoding. */
-static inline void
-acc100_fcw_te_fill(const struct rte_bbdev_enc_op *op, struct acc100_fcw_te *fcw)
-{
-	fcw->code_block_mode = op->turbo_enc.code_block_mode;
-	if (fcw->code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) {
-		fcw->k_neg = op->turbo_enc.tb_params.k_neg;
-		fcw->k_pos = op->turbo_enc.tb_params.k_pos;
-		fcw->c_neg = op->turbo_enc.tb_params.c_neg;
-		fcw->c = op->turbo_enc.tb_params.c;
-		fcw->ncb_neg = op->turbo_enc.tb_params.ncb_neg;
-		fcw->ncb_pos = op->turbo_enc.tb_params.ncb_pos;
-
-		if (check_bit(op->turbo_enc.op_flags,
-				RTE_BBDEV_TURBO_RATE_MATCH)) {
-			fcw->bypass_rm = 0;
-			fcw->cab = op->turbo_enc.tb_params.cab;
-			fcw->ea = op->turbo_enc.tb_params.ea;
-			fcw->eb = op->turbo_enc.tb_params.eb;
-		} else {
-			/* E is set to the encoding output size when RM is
-			 * bypassed.
-			 */
-			fcw->bypass_rm = 1;
-			fcw->cab = fcw->c_neg;
-			fcw->ea = 3 * fcw->k_neg + 12;
-			fcw->eb = 3 * fcw->k_pos + 12;
-		}
-	} else { /* For CB mode */
-		fcw->k_pos = op->turbo_enc.cb_params.k;
-		fcw->ncb_pos = op->turbo_enc.cb_params.ncb;
-
-		if (check_bit(op->turbo_enc.op_flags,
-				RTE_BBDEV_TURBO_RATE_MATCH)) {
-			fcw->bypass_rm = 0;
-			fcw->eb = op->turbo_enc.cb_params.e;
-		} else {
-			/* E is set to the encoding output size when RM is
-			 * bypassed.
-			 */
-			fcw->bypass_rm = 1;
-			fcw->eb = 3 * fcw->k_pos + 12;
-		}
-	}
-
-	fcw->bypass_rv_idx1 = check_bit(op->turbo_enc.op_flags,
-			RTE_BBDEV_TURBO_RV_INDEX_BYPASS);
-	fcw->code_block_crc = check_bit(op->turbo_enc.op_flags,
-			RTE_BBDEV_TURBO_CRC_24B_ATTACH);
-	fcw->rv_idx1 = op->turbo_enc.rv_index;
-}
-
-/* Compute value of k0.
- * Based on 3GPP 38.212 Table 5.4.2.1-2
- * Starting position of different redundancy versions, k0
- */
-static inline uint16_t
-get_k0(uint16_t n_cb, uint16_t z_c, uint8_t bg, uint8_t rv_index)
-{
-	if (rv_index == 0)
-		return 0;
-	uint16_t n = (bg == 1 ? ACC100_N_ZC_1 : ACC100_N_ZC_2) * z_c;
-	if (n_cb == n) {
-		if (rv_index == 1)
-			return (bg == 1 ? ACC100_K0_1_1 : ACC100_K0_1_2) * z_c;
-		else if (rv_index == 2)
-			return (bg == 1 ? ACC100_K0_2_1 : ACC100_K0_2_2) * z_c;
-		else
-			return (bg == 1 ? ACC100_K0_3_1 : ACC100_K0_3_2) * z_c;
-	}
-	/* LBRM case - includes a division by N */
-	if (unlikely(z_c == 0))
-		return 0;
-	if (rv_index == 1)
-		return (((bg == 1 ? ACC100_K0_1_1 : ACC100_K0_1_2) * n_cb)
-				/ n) * z_c;
-	else if (rv_index == 2)
-		return (((bg == 1 ? ACC100_K0_2_1 : ACC100_K0_2_2) * n_cb)
-				/ n) * z_c;
-	else
-		return (((bg == 1 ? ACC100_K0_3_1 : ACC100_K0_3_2) * n_cb)
-				/ n) * z_c;
-}
-
-/* Fill in a frame control word for LDPC encoding.
*/
-static inline void
-acc100_fcw_le_fill(const struct rte_bbdev_enc_op *op,
-		struct acc100_fcw_le *fcw, int num_cb)
-{
-	fcw->qm = op->ldpc_enc.q_m;
-	fcw->nfiller = op->ldpc_enc.n_filler;
-	fcw->BG = (op->ldpc_enc.basegraph - 1);
-	fcw->Zc = op->ldpc_enc.z_c;
-	fcw->ncb = op->ldpc_enc.n_cb;
-	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_enc.basegraph,
-			op->ldpc_enc.rv_index);
-	fcw->rm_e = op->ldpc_enc.cb_params.e;
-	fcw->crc_select = check_bit(op->ldpc_enc.op_flags,
-			RTE_BBDEV_LDPC_CRC_24B_ATTACH);
-	fcw->bypass_intlv = check_bit(op->ldpc_enc.op_flags,
-			RTE_BBDEV_LDPC_INTERLEAVER_BYPASS);
-	fcw->mcb_count = num_cb;
-}

 /* Fill in a frame control word for turbo decoding. */
 static inline void
-acc100_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc100_fcw_td *fcw)
+acc100_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc_fcw_td *fcw)
 {
 	/* Note : Early termination is always enabled for 4GUL */
 	fcw->fcw_ver = 1;
@@ -1304,13 +1019,13 @@
 #ifdef RTE_LIBRTE_BBDEV_DEBUG
 static inline bool
-is_acc100(struct acc100_queue *q)
+is_acc100(struct acc_queue *q)
 {
 	return (q->d->device_variant == ACC100_VARIANT);
 }

 static inline bool
-validate_op_required(struct acc100_queue *q)
+validate_op_required(struct acc_queue *q)
 {
 	return is_acc100(q);
 }
@@ -1318,8 +1033,8 @@
 /* Fill in a frame control word for LDPC decoding.
*/
 static inline void
-acc100_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc100_fcw_ld *fcw,
-		union acc100_harq_layout_data *harq_layout)
+acc100_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw,
+		union acc_harq_layout_data *harq_layout)
 {
 	uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset;
 	uint16_t harq_index;
@@ -1362,13 +1077,13 @@
 	fcw->llr_pack_mode = check_bit(op->ldpc_dec.op_flags,
 			RTE_BBDEV_LDPC_LLR_COMPRESSION);
 	harq_index = op->ldpc_dec.harq_combined_output.offset /
-			ACC100_HARQ_OFFSET;
+			ACC_HARQ_OFFSET;
 #ifdef ACC100_EXT_MEM
 	/* Limit cases when HARQ pruning is valid */
 	harq_prun = ((op->ldpc_dec.harq_combined_output.offset %
-			ACC100_HARQ_OFFSET) == 0) &&
+			ACC_HARQ_OFFSET) == 0) &&
 			(op->ldpc_dec.harq_combined_output.offset <= UINT16_MAX
-			* ACC100_HARQ_OFFSET);
+			* ACC_HARQ_OFFSET);
 #endif
 	if (fcw->hcin_en > 0) {
 		harq_in_length = op->ldpc_dec.harq_combined_input.length;
@@ -1423,7 +1138,7 @@
 		harq_out_length = (uint16_t) fcw->hcin_size0;
 		harq_out_length = RTE_MIN(RTE_MAX(harq_out_length, l), ncb_p);
 		harq_out_length = (harq_out_length + 0x3F) & 0xFFC0;
-		if ((k0_p > fcw->hcin_size0 + ACC100_HARQ_OFFSET_THRESHOLD) &&
+		if ((k0_p > fcw->hcin_size0 + ACC_HARQ_OFFSET_THRESHOLD) &&
 				harq_prun) {
 			fcw->hcout_size0 = (uint16_t) fcw->hcin_size0;
 			fcw->hcout_offset = k0_p & 0xFFC0;
@@ -1442,16 +1157,10 @@
 	}
 }
-/* Convert offset to harq index for harq_layout structure */
-static inline uint32_t hq_index(uint32_t offset)
-{
-	return (offset >> ACC100_HARQ_OFFSET_SHIFT) & ACC100_HARQ_OFFSET_MASK;
-}
-
 /* Fill in a frame control word for LDPC decoding for ACC101 */
 static inline void
-acc101_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc100_fcw_ld *fcw,
-		union acc100_harq_layout_data *harq_layout)
+acc101_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw,
+		union acc_harq_layout_data *harq_layout)
 {
 	uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset;
 	uint32_t harq_index;
@@ -1564,215 +1273,9
@@ static inline uint32_t hq_index(uint32_t offset)
 	}
 }
-/**
- * Fills descriptor with data pointers of one block type.
- *
- * @param desc
- *   Pointer to DMA descriptor.
- * @param input
- *   Pointer to pointer to input data which will be encoded. It can be changed
- *   and points to next segment in scatter-gather case.
- * @param offset
- *   Input offset in rte_mbuf structure. It is used for calculating the point
- *   where data is starting.
- * @param cb_len
- *   Length of currently processed Code Block
- * @param seg_total_left
- *   It indicates how many bytes still left in segment (mbuf) for further
- *   processing.
- * @param op_flags
- *   Store information about device capabilities
- * @param next_triplet
- *   Index for ACC100 DMA Descriptor triplet
- *
- * @return
- *   Returns index of next triplet on success, other value if lengths of
- *   pkt and processed cb do not match.
- *
- */
-static inline int
-acc100_dma_fill_blk_type_in(struct acc100_dma_req_desc *desc,
-		struct rte_mbuf **input, uint32_t *offset, uint32_t cb_len,
-		uint32_t *seg_total_left, int next_triplet)
-{
-	uint32_t part_len;
-	struct rte_mbuf *m = *input;
-
-	part_len = (*seg_total_left < cb_len) ? *seg_total_left : cb_len;
-	cb_len -= part_len;
-	*seg_total_left -= part_len;
-
-	desc->data_ptrs[next_triplet].address =
-			rte_pktmbuf_iova_offset(m, *offset);
-	desc->data_ptrs[next_triplet].blen = part_len;
-	desc->data_ptrs[next_triplet].blkid = ACC100_DMA_BLKID_IN;
-	desc->data_ptrs[next_triplet].last = 0;
-	desc->data_ptrs[next_triplet].dma_ext = 0;
-	*offset += part_len;
-	next_triplet++;
-
-	while (cb_len > 0) {
-		if (next_triplet < ACC100_DMA_MAX_NUM_POINTERS_IN && m->next != NULL) {
-
-			m = m->next;
-			*seg_total_left = rte_pktmbuf_data_len(m);
-			part_len = (*seg_total_left < cb_len) ?
-					*seg_total_left :
-					cb_len;
-			desc->data_ptrs[next_triplet].address =
-					rte_pktmbuf_iova_offset(m, 0);
-			desc->data_ptrs[next_triplet].blen = part_len;
-			desc->data_ptrs[next_triplet].blkid =
-					ACC100_DMA_BLKID_IN;
-			desc->data_ptrs[next_triplet].last = 0;
-			desc->data_ptrs[next_triplet].dma_ext = 0;
-			cb_len -= part_len;
-			*seg_total_left -= part_len;
-			/* Initializing offset for next segment (mbuf) */
-			*offset = part_len;
-			next_triplet++;
-		} else {
-			rte_bbdev_log(ERR,
-				"Some data still left for processing: "
-				"data_left: %u, next_triplet: %u, next_mbuf: %p",
-				cb_len, next_triplet, m->next);
-			return -EINVAL;
-		}
-	}
-	/* Storing new mbuf as it could be changed in scatter-gather case*/
-	*input = m;
-
-	return next_triplet;
-}
-
-/* Fills descriptor with data pointers of one block type.
- * Returns index of next triplet on success, other value if lengths of
- * output data and processed mbuf do not match.
- */
-static inline int
-acc100_dma_fill_blk_type_out(struct acc100_dma_req_desc *desc,
-		struct rte_mbuf *output, uint32_t out_offset,
-		uint32_t output_len, int next_triplet, int blk_id)
-{
-	desc->data_ptrs[next_triplet].address =
-			rte_pktmbuf_iova_offset(output, out_offset);
-	desc->data_ptrs[next_triplet].blen = output_len;
-	desc->data_ptrs[next_triplet].blkid = blk_id;
-	desc->data_ptrs[next_triplet].last = 0;
-	desc->data_ptrs[next_triplet].dma_ext = 0;
-	next_triplet++;
-
-	return next_triplet;
-}
-
-static inline void
-acc100_header_init(struct acc100_dma_req_desc *desc)
-{
-	desc->word0 = ACC100_DMA_DESC_TYPE;
-	desc->word1 = 0; /**< Timestamp could be disabled */
-	desc->word2 = 0;
-	desc->word3 = 0;
-	desc->numCBs = 1;
-}
-
-#ifdef RTE_LIBRTE_BBDEV_DEBUG
-/* Check if any input data is unexpectedly left for processing */
-static inline int
-check_mbuf_total_left(uint32_t mbuf_total_left)
-{
-	if (mbuf_total_left == 0)
-		return 0;
-	rte_bbdev_log(ERR,
-		"Some date still left for processing: mbuf_total_left = %u",
-		mbuf_total_left);
-	return
-EINVAL; -} -#endif - -static inline int -acc100_dma_desc_te_fill(struct rte_bbdev_enc_op *op, - struct acc100_dma_req_desc *desc, struct rte_mbuf **input, - struct rte_mbuf *output, uint32_t *in_offset, - uint32_t *out_offset, uint32_t *out_length, - uint32_t *mbuf_total_left, uint32_t *seg_total_left, uint8_t r) -{ - int next_triplet = 1; /* FCW already done */ - uint32_t e, ea, eb, length; - uint16_t k, k_neg, k_pos; - uint8_t cab, c_neg; - - desc->word0 = ACC100_DMA_DESC_TYPE; - desc->word1 = 0; /**< Timestamp could be disabled */ - desc->word2 = 0; - desc->word3 = 0; - desc->numCBs = 1; - - if (op->turbo_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { - ea = op->turbo_enc.tb_params.ea; - eb = op->turbo_enc.tb_params.eb; - cab = op->turbo_enc.tb_params.cab; - k_neg = op->turbo_enc.tb_params.k_neg; - k_pos = op->turbo_enc.tb_params.k_pos; - c_neg = op->turbo_enc.tb_params.c_neg; - e = (r < cab) ? ea : eb; - k = (r < c_neg) ? k_neg : k_pos; - } else { - e = op->turbo_enc.cb_params.e; - k = op->turbo_enc.cb_params.k; - } - - if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_CRC_24B_ATTACH)) - length = (k - 24) >> 3; - else - length = k >> 3; - - if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < length))) { - rte_bbdev_log(ERR, - "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", - *mbuf_total_left, length); - return -1; - } - - next_triplet = acc100_dma_fill_blk_type_in(desc, input, in_offset, - length, seg_total_left, next_triplet); - if (unlikely(next_triplet < 0)) { - rte_bbdev_log(ERR, - "Mismatch between data to process and mbuf data length in bbdev_op: %p", - op); - return -1; - } - desc->data_ptrs[next_triplet - 1].last = 1; - desc->m2dlen = next_triplet; - *mbuf_total_left -= length; - - /* Set output length */ - if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_RATE_MATCH)) - /* Integer round up division by 8 */ - *out_length = (e + 7) >> 3; - else - *out_length = (k >> 3) * 3 + 2; - - next_triplet = 
acc100_dma_fill_blk_type_out(desc, output, *out_offset, - *out_length, next_triplet, ACC100_DMA_BLKID_OUT_ENC); - if (unlikely(next_triplet < 0)) { - rte_bbdev_log(ERR, - "Mismatch between data to process and mbuf data length in bbdev_op: %p", - op); - return -1; - } - op->turbo_enc.output.length += *out_length; - *out_offset += *out_length; - desc->data_ptrs[next_triplet - 1].last = 1; - desc->d2mlen = next_triplet - desc->m2dlen; - - desc->op_addr = op; - - return 0; -} - static inline int acc100_dma_desc_le_fill(struct rte_bbdev_enc_op *op, - struct acc100_dma_req_desc *desc, struct rte_mbuf **input, + struct acc_dma_req_desc *desc, struct rte_mbuf **input, struct rte_mbuf *output, uint32_t *in_offset, uint32_t *out_offset, uint32_t *out_length, uint32_t *mbuf_total_left, uint32_t *seg_total_left) @@ -1781,7 +1284,7 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t K, in_length_in_bits, in_length_in_bytes; struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; - acc100_header_init(desc); + acc_header_init(desc); K = (enc->basegraph == 1 ? 
22 : 10) * enc->z_c; in_length_in_bits = K - enc->n_filler; @@ -1798,9 +1301,9 @@ static inline uint32_t hq_index(uint32_t offset) return -1; } - next_triplet = acc100_dma_fill_blk_type_in(desc, input, in_offset, + next_triplet = acc_dma_fill_blk_type_in(desc, input, in_offset, in_length_in_bytes, - seg_total_left, next_triplet); + seg_total_left, next_triplet, false); if (unlikely(next_triplet < 0)) { rte_bbdev_log(ERR, "Mismatch between data to process and mbuf data length in bbdev_op: %p", @@ -1815,8 +1318,8 @@ static inline uint32_t hq_index(uint32_t offset) /* Integer round up division by 8 */ *out_length = (enc->cb_params.e + 7) >> 3; - next_triplet = acc100_dma_fill_blk_type_out(desc, output, *out_offset, - *out_length, next_triplet, ACC100_DMA_BLKID_OUT_ENC); + next_triplet = acc_dma_fill_blk_type(desc, output, *out_offset, + *out_length, next_triplet, ACC_DMA_BLKID_OUT_ENC); op->ldpc_enc.output.length += *out_length; *out_offset += *out_length; desc->data_ptrs[next_triplet - 1].last = 1; @@ -1830,7 +1333,7 @@ static inline uint32_t hq_index(uint32_t offset) static inline int acc100_dma_desc_td_fill(struct rte_bbdev_dec_op *op, - struct acc100_dma_req_desc *desc, struct rte_mbuf **input, + struct acc_dma_req_desc *desc, struct rte_mbuf **input, struct rte_mbuf *h_output, struct rte_mbuf *s_output, uint32_t *in_offset, uint32_t *h_out_offset, uint32_t *s_out_offset, uint32_t *h_out_length, @@ -1842,7 +1345,7 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t crc24_overlap = 0; uint32_t e, kw; - desc->word0 = ACC100_DMA_DESC_TYPE; + desc->word0 = ACC_DMA_DESC_TYPE; desc->word1 = 0; /**< Timestamp could be disabled */ desc->word2 = 0; desc->word3 = 0; @@ -1887,8 +1390,10 @@ static inline uint32_t hq_index(uint32_t offset) return -1; } - next_triplet = acc100_dma_fill_blk_type_in(desc, input, in_offset, kw, - seg_total_left, next_triplet); + next_triplet = acc_dma_fill_blk_type_in(desc, input, in_offset, kw, + seg_total_left, next_triplet, + 
check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_DEC_SCATTER_GATHER)); if (unlikely(next_triplet < 0)) { rte_bbdev_log(ERR, "Mismatch between data to process and mbuf data length in bbdev_op: %p", @@ -1899,10 +1404,10 @@ static inline uint32_t hq_index(uint32_t offset) desc->m2dlen = next_triplet; *mbuf_total_left -= kw; - next_triplet = acc100_dma_fill_blk_type_out( + next_triplet = acc_dma_fill_blk_type( desc, h_output, *h_out_offset, (k - crc24_overlap) >> 3, next_triplet, - ACC100_DMA_BLKID_OUT_HARD); + ACC_DMA_BLKID_OUT_HARD); if (unlikely(next_triplet < 0)) { rte_bbdev_log(ERR, "Mismatch between data to process and mbuf data length in bbdev_op: %p", @@ -1926,9 +1431,9 @@ static inline uint32_t hq_index(uint32_t offset) else *s_out_length = (k * 3) + 12; - next_triplet = acc100_dma_fill_blk_type_out(desc, s_output, + next_triplet = acc_dma_fill_blk_type(desc, s_output, *s_out_offset, *s_out_length, next_triplet, - ACC100_DMA_BLKID_OUT_SOFT); + ACC_DMA_BLKID_OUT_SOFT); if (unlikely(next_triplet < 0)) { rte_bbdev_log(ERR, "Mismatch between data to process and mbuf data length in bbdev_op: %p", @@ -1950,12 +1455,12 @@ static inline uint32_t hq_index(uint32_t offset) static inline int acc100_dma_desc_ld_fill(struct rte_bbdev_dec_op *op, - struct acc100_dma_req_desc *desc, + struct acc_dma_req_desc *desc, struct rte_mbuf **input, struct rte_mbuf *h_output, uint32_t *in_offset, uint32_t *h_out_offset, uint32_t *h_out_length, uint32_t *mbuf_total_left, uint32_t *seg_total_left, - struct acc100_fcw_ld *fcw) + struct acc_fcw_ld *fcw) { struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec; int next_triplet = 1; /* FCW already done */ @@ -1965,7 +1470,7 @@ static inline uint32_t hq_index(uint32_t offset) bool h_comp = check_bit(dec->op_flags, RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION); - acc100_header_init(desc); + acc_header_init(desc); if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP)) @@ -1988,9 +1493,11 @@ static inline uint32_t hq_index(uint32_t 
offset) return -1; } - next_triplet = acc100_dma_fill_blk_type_in(desc, input, + next_triplet = acc_dma_fill_blk_type_in(desc, input, in_offset, input_length, - seg_total_left, next_triplet); + seg_total_left, next_triplet, + check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER)); if (unlikely(next_triplet < 0)) { rte_bbdev_log(ERR, @@ -2007,16 +1514,16 @@ static inline uint32_t hq_index(uint32_t offset) desc->data_ptrs[next_triplet].address = dec->harq_combined_input.offset; desc->data_ptrs[next_triplet].blen = h_p_size; - desc->data_ptrs[next_triplet].blkid = ACC100_DMA_BLKID_IN_HARQ; + desc->data_ptrs[next_triplet].blkid = ACC_DMA_BLKID_IN_HARQ; desc->data_ptrs[next_triplet].dma_ext = 1; #ifndef ACC100_EXT_MEM - acc100_dma_fill_blk_type_out( + acc_dma_fill_blk_type( desc, op->ldpc_dec.harq_combined_input.data, op->ldpc_dec.harq_combined_input.offset, h_p_size, next_triplet, - ACC100_DMA_BLKID_IN_HARQ); + ACC_DMA_BLKID_IN_HARQ); #endif next_triplet++; } @@ -2025,9 +1532,9 @@ static inline uint32_t hq_index(uint32_t offset) desc->m2dlen = next_triplet; *mbuf_total_left -= input_length; - next_triplet = acc100_dma_fill_blk_type_out(desc, h_output, + next_triplet = acc_dma_fill_blk_type(desc, h_output, *h_out_offset, output_length >> 3, next_triplet, - ACC100_DMA_BLKID_OUT_HARD); + ACC_DMA_BLKID_OUT_HARD); if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) { @@ -2045,16 +1552,16 @@ static inline uint32_t hq_index(uint32_t offset) desc->data_ptrs[next_triplet].address = dec->harq_combined_output.offset; desc->data_ptrs[next_triplet].blen = h_p_size; - desc->data_ptrs[next_triplet].blkid = ACC100_DMA_BLKID_OUT_HARQ; + desc->data_ptrs[next_triplet].blkid = ACC_DMA_BLKID_OUT_HARQ; desc->data_ptrs[next_triplet].dma_ext = 1; #ifndef ACC100_EXT_MEM - acc100_dma_fill_blk_type_out( + acc_dma_fill_blk_type( desc, dec->harq_combined_output.data, dec->harq_combined_output.offset, h_p_size, next_triplet, - ACC100_DMA_BLKID_OUT_HARQ); + 
ACC_DMA_BLKID_OUT_HARQ); #endif next_triplet++; } @@ -2072,11 +1579,11 @@ static inline uint32_t hq_index(uint32_t offset) static inline void acc100_dma_desc_ld_update(struct rte_bbdev_dec_op *op, - struct acc100_dma_req_desc *desc, + struct acc_dma_req_desc *desc, struct rte_mbuf *input, struct rte_mbuf *h_output, uint32_t *in_offset, uint32_t *h_out_offset, uint32_t *h_out_length, - union acc100_harq_layout_data *harq_layout) + union acc_harq_layout_data *harq_layout) { int next_triplet = 1; /* FCW already done */ desc->data_ptrs[next_triplet].address = @@ -2108,10 +1615,10 @@ static inline uint32_t hq_index(uint32_t offset) op->ldpc_dec.harq_combined_output.length = prev_op->ldpc_dec.harq_combined_output.length; int16_t hq_idx = op->ldpc_dec.harq_combined_output.offset / - ACC100_HARQ_OFFSET; + ACC_HARQ_OFFSET; int16_t prev_hq_idx = prev_op->ldpc_dec.harq_combined_output.offset - / ACC100_HARQ_OFFSET; + / ACC_HARQ_OFFSET; harq_layout[hq_idx].val = harq_layout[prev_hq_idx].val; #ifndef ACC100_EXT_MEM struct rte_bbdev_op_data ho = @@ -2126,84 +1633,10 @@ static inline uint32_t hq_index(uint32_t offset) desc->op_addr = op; } - -/* Enqueue a number of operations to HW and update software rings */ -static inline void -acc100_dma_enqueue(struct acc100_queue *q, uint16_t n, - struct rte_bbdev_stats *queue_stats) -{ - union acc100_enqueue_reg_fmt enq_req; -#ifdef RTE_BBDEV_OFFLOAD_COST - uint64_t start_time = 0; - queue_stats->acc_offload_cycles = 0; -#else - RTE_SET_USED(queue_stats); -#endif - - enq_req.val = 0; - /* Setting offset, 100b for 256 DMA Desc */ - enq_req.addr_offset = ACC100_DESC_OFFSET; - - /* Split ops into batches */ - do { - union acc100_dma_desc *desc; - uint16_t enq_batch_size; - uint64_t offset; - rte_iova_t req_elem_addr; - - enq_batch_size = RTE_MIN(n, MAX_ENQ_BATCH_SIZE); - - /* Set flag on last descriptor in a batch */ - desc = q->ring_addr + ((q->sw_ring_head + enq_batch_size - 1) & - q->sw_ring_wrap_mask); - desc->req.last_desc_in_batch = 1; 
- - /* Calculate the 1st descriptor's address */ - offset = ((q->sw_ring_head & q->sw_ring_wrap_mask) * - sizeof(union acc100_dma_desc)); - req_elem_addr = q->ring_addr_iova + offset; - - /* Fill enqueue struct */ - enq_req.num_elem = enq_batch_size; - /* low 6 bits are not needed */ - enq_req.req_elem_addr = (uint32_t)(req_elem_addr >> 6); - -#ifdef RTE_LIBRTE_BBDEV_DEBUG - rte_memdump(stderr, "Req sdone", desc, sizeof(*desc)); -#endif - rte_bbdev_log_debug( - "Enqueue %u reqs (phys %#"PRIx64") to reg %p", - enq_batch_size, - req_elem_addr, - (void *)q->mmio_reg_enqueue); - - rte_wmb(); - -#ifdef RTE_BBDEV_OFFLOAD_COST - /* Start time measurement for enqueue function offload. */ - start_time = rte_rdtsc_precise(); -#endif - rte_bbdev_log(DEBUG, "Debug : MMIO Enqueue"); - mmio_write(q->mmio_reg_enqueue, enq_req.val); - -#ifdef RTE_BBDEV_OFFLOAD_COST - queue_stats->acc_offload_cycles += - rte_rdtsc_precise() - start_time; -#endif - - q->aq_enqueued++; - q->sw_ring_head += enq_batch_size; - n -= enq_batch_size; - - } while (n); - - -} - #ifdef RTE_LIBRTE_BBDEV_DEBUG /* Validates turbo encoder parameters */ static inline int -validate_enc_op(struct rte_bbdev_enc_op *op, struct acc100_queue *q) +validate_enc_op(struct rte_bbdev_enc_op *op, struct acc_queue *q) { struct rte_bbdev_op_turbo_enc *turbo_enc = &op->turbo_enc; struct rte_bbdev_op_enc_turbo_cb_params *cb = NULL; @@ -2344,7 +1777,7 @@ static inline uint32_t hq_index(uint32_t offset) } /* Validates LDPC encoder parameters */ static inline int -validate_ldpc_enc_op(struct rte_bbdev_enc_op *op, struct acc100_queue *q) +validate_ldpc_enc_op(struct rte_bbdev_enc_op *op, struct acc_queue *q) { struct rte_bbdev_op_ldpc_enc *ldpc_enc = &op->ldpc_enc; @@ -2400,7 +1833,7 @@ static inline uint32_t hq_index(uint32_t offset) /* Validates LDPC decoder parameters */ static inline int -validate_ldpc_dec_op(struct rte_bbdev_dec_op *op, struct acc100_queue *q) +validate_ldpc_dec_op(struct rte_bbdev_dec_op *op, struct acc_queue 
*q) { struct rte_bbdev_op_ldpc_dec *ldpc_dec = &op->ldpc_dec; @@ -2448,10 +1881,10 @@ static inline uint32_t hq_index(uint32_t offset) /* Enqueue one encode operations for ACC100 device in CB mode */ static inline int -enqueue_enc_one_op_cb(struct acc100_queue *q, struct rte_bbdev_enc_op *op, +enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op, uint16_t total_enqueued_cbs) { - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; int ret; uint32_t in_offset, out_offset, out_length, mbuf_total_left, seg_total_left; @@ -2468,7 +1901,7 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; - acc100_fcw_te_fill(op, &desc->req.fcw_te); + acc_fcw_te_fill(op, &desc->req.fcw_te); input = op->turbo_enc.input.data; output_head = output = op->turbo_enc.output.data; @@ -2479,7 +1912,7 @@ static inline uint32_t hq_index(uint32_t offset) seg_total_left = rte_pktmbuf_data_len(op->turbo_enc.input.data) - in_offset; - ret = acc100_dma_desc_te_fill(op, &desc->req, &input, output, + ret = acc_dma_desc_te_fill(op, &desc->req, &input, output, &in_offset, &out_offset, &out_length, &mbuf_total_left, &seg_total_left, 0); @@ -2501,10 +1934,10 @@ static inline uint32_t hq_index(uint32_t offset) /* Enqueue one encode operations for ACC100 device in CB mode */ static inline int -enqueue_ldpc_enc_n_op_cb(struct acc100_queue *q, struct rte_bbdev_enc_op **ops, +enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ops, uint16_t total_enqueued_cbs, int16_t num) { - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; uint32_t out_length; struct rte_mbuf *output_head, *output; int i, next_triplet; @@ -2522,10 +1955,10 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; - acc100_fcw_le_fill(ops[0], 
&desc->req.fcw_le, num); + acc_fcw_le_fill(ops[0], &desc->req.fcw_le, num, 0); /** This could be done at polling */ - acc100_header_init(&desc->req); + acc_header_init(&desc->req); desc->req.numCBs = num; in_length_in_bytes = ops[0]->ldpc_enc.input.data->data_len; @@ -2564,10 +1997,10 @@ static inline uint32_t hq_index(uint32_t offset) /* Enqueue one encode operations for ACC100 device in CB mode */ static inline int -enqueue_ldpc_enc_one_op_cb(struct acc100_queue *q, struct rte_bbdev_enc_op *op, +enqueue_ldpc_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op, uint16_t total_enqueued_cbs) { - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; int ret; uint32_t in_offset, out_offset, out_length, mbuf_total_left, seg_total_left; @@ -2584,7 +2017,7 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; - acc100_fcw_le_fill(op, &desc->req.fcw_le, 1); + acc_fcw_le_fill(op, &desc->req.fcw_le, 1, 0); input = op->ldpc_enc.input.data; output_head = output = op->ldpc_enc.output.data; @@ -2619,10 +2052,10 @@ static inline uint32_t hq_index(uint32_t offset) /* Enqueue one encode operations for ACC100 device in TB mode. 
*/ static inline int -enqueue_enc_one_op_tb(struct acc100_queue *q, struct rte_bbdev_enc_op *op, +enqueue_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op, uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) { - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; int ret; uint8_t r, c; uint32_t in_offset, out_offset, out_length, mbuf_total_left, @@ -2641,8 +2074,8 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; - uint64_t fcw_offset = (desc_idx << 8) + ACC100_DESC_FCW_OFFSET; - acc100_fcw_te_fill(op, &desc->req.fcw_te); + uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; + acc_fcw_te_fill(op, &desc->req.fcw_te); input = op->turbo_enc.input.data; output_head = output = op->turbo_enc.output.data; @@ -2660,9 +2093,9 @@ static inline uint32_t hq_index(uint32_t offset) desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; - desc->req.data_ptrs[0].blen = ACC100_FCW_TE_BLEN; + desc->req.data_ptrs[0].blen = ACC_FCW_TE_BLEN; - ret = acc100_dma_desc_te_fill(op, &desc->req, &input, output, + ret = acc_dma_desc_te_fill(op, &desc->req, &input, output, &in_offset, &out_offset, &out_length, &mbuf_total_left, &seg_total_left, r); if (unlikely(ret < 0)) @@ -2705,7 +2138,7 @@ static inline uint32_t hq_index(uint32_t offset) #ifdef RTE_LIBRTE_BBDEV_DEBUG /* Validates turbo decoder parameters */ static inline int -validate_dec_op(struct rte_bbdev_dec_op *op, struct acc100_queue *q) +validate_dec_op(struct rte_bbdev_dec_op *op, struct acc_queue *q) { struct rte_bbdev_op_turbo_dec *turbo_dec = &op->turbo_dec; struct rte_bbdev_op_dec_turbo_cb_params *cb = NULL; @@ -2843,10 +2276,10 @@ static inline uint32_t hq_index(uint32_t offset) /** Enqueue one decode operations for ACC100 device in CB mode */ static inline int 
-enqueue_dec_one_op_cb(struct acc100_queue *q, struct rte_bbdev_dec_op *op, +enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op, uint16_t total_enqueued_cbs) { - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; int ret; uint32_t in_offset, h_out_offset, s_out_offset, s_out_length, h_out_length, mbuf_total_left, seg_total_left; @@ -2915,10 +2348,10 @@ static inline uint32_t hq_index(uint32_t offset) } static inline int -harq_loopback(struct acc100_queue *q, struct rte_bbdev_dec_op *op, +harq_loopback(struct acc_queue *q, struct rte_bbdev_dec_op *op, uint16_t total_enqueued_cbs) { - struct acc100_fcw_ld *fcw; - union acc100_dma_desc *desc; + struct acc_fcw_ld *fcw; + union acc_dma_desc *desc; int next_triplet = 1; struct rte_mbuf *hq_output_head, *hq_output; uint16_t harq_dma_length_in, harq_dma_length_out; @@ -2943,24 +2376,24 @@ static inline uint32_t hq_index(uint32_t offset) bool ddr_mem_in = check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_INTERNAL_HARQ_MEMORY_IN_ENABLE); - union acc100_harq_layout_data *harq_layout = q->d->harq_layout; + union acc_harq_layout_data *harq_layout = q->d->harq_layout; uint16_t harq_index = (ddr_mem_in ? 
op->ldpc_dec.harq_combined_input.offset : op->ldpc_dec.harq_combined_output.offset) - / ACC100_HARQ_OFFSET; + / ACC_HARQ_OFFSET; uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; fcw = &desc->req.fcw_ld; /* Set the FCW from loopback into DDR */ - memset(fcw, 0, sizeof(struct acc100_fcw_ld)); - fcw->FCWversion = ACC100_FCW_VER; + memset(fcw, 0, sizeof(struct acc_fcw_ld)); + fcw->FCWversion = ACC_FCW_VER; fcw->qm = 2; fcw->Zc = 384; - if (harq_in_length < 16 * ACC100_N_ZC_1) + if (harq_in_length < 16 * ACC_N_ZC_1) fcw->Zc = 16; - fcw->ncb = fcw->Zc * ACC100_N_ZC_1; + fcw->ncb = fcw->Zc * ACC_N_ZC_1; fcw->rm_e = 2; fcw->hcin_en = 1; fcw->hcout_en = 1; @@ -2990,32 +2423,32 @@ static inline uint32_t hq_index(uint32_t offset) fcw->gain_h = 1; /* Set the prefix of descriptor. This could be done at polling */ - acc100_header_init(&desc->req); + acc_header_init(&desc->req); /* Null LLR input for Decoder */ desc->req.data_ptrs[next_triplet].address = q->lb_in_addr_iova; desc->req.data_ptrs[next_triplet].blen = 2; - desc->req.data_ptrs[next_triplet].blkid = ACC100_DMA_BLKID_IN; + desc->req.data_ptrs[next_triplet].blkid = ACC_DMA_BLKID_IN; desc->req.data_ptrs[next_triplet].last = 0; desc->req.data_ptrs[next_triplet].dma_ext = 0; next_triplet++; /* HARQ Combine input from either Memory interface */ if (!ddr_mem_in) { - next_triplet = acc100_dma_fill_blk_type_out(&desc->req, + next_triplet = acc_dma_fill_blk_type(&desc->req, op->ldpc_dec.harq_combined_input.data, op->ldpc_dec.harq_combined_input.offset, harq_dma_length_in, next_triplet, - ACC100_DMA_BLKID_IN_HARQ); + ACC_DMA_BLKID_IN_HARQ); } else { desc->req.data_ptrs[next_triplet].address = op->ldpc_dec.harq_combined_input.offset; desc->req.data_ptrs[next_triplet].blen = harq_dma_length_in; desc->req.data_ptrs[next_triplet].blkid = - ACC100_DMA_BLKID_IN_HARQ; + ACC_DMA_BLKID_IN_HARQ; desc->req.data_ptrs[next_triplet].dma_ext = 1; next_triplet++; } @@ -3025,8 
+2458,8 @@ static inline uint32_t hq_index(uint32_t offset) /* Dropped decoder hard output */ desc->req.data_ptrs[next_triplet].address = q->lb_out_addr_iova; - desc->req.data_ptrs[next_triplet].blen = ACC100_BYTES_IN_WORD; - desc->req.data_ptrs[next_triplet].blkid = ACC100_DMA_BLKID_OUT_HARD; + desc->req.data_ptrs[next_triplet].blen = ACC_BYTES_IN_WORD; + desc->req.data_ptrs[next_triplet].blkid = ACC_DMA_BLKID_OUT_HARD; desc->req.data_ptrs[next_triplet].last = 0; desc->req.data_ptrs[next_triplet].dma_ext = 0; next_triplet++; @@ -3040,19 +2473,19 @@ static inline uint32_t hq_index(uint32_t offset) desc->req.data_ptrs[next_triplet].blen = harq_dma_length_out; desc->req.data_ptrs[next_triplet].blkid = - ACC100_DMA_BLKID_OUT_HARQ; + ACC_DMA_BLKID_OUT_HARQ; desc->req.data_ptrs[next_triplet].dma_ext = 1; next_triplet++; } else { hq_output_head = op->ldpc_dec.harq_combined_output.data; hq_output = op->ldpc_dec.harq_combined_output.data; - next_triplet = acc100_dma_fill_blk_type_out( + next_triplet = acc_dma_fill_blk_type( &desc->req, op->ldpc_dec.harq_combined_output.data, op->ldpc_dec.harq_combined_output.offset, harq_dma_length_out, next_triplet, - ACC100_DMA_BLKID_OUT_HARQ); + ACC_DMA_BLKID_OUT_HARQ); /* HARQ output */ mbuf_append(hq_output_head, hq_output, harq_dma_length_out); op->ldpc_dec.harq_combined_output.length = @@ -3068,7 +2501,7 @@ static inline uint32_t hq_index(uint32_t offset) /** Enqueue one decode operations for ACC100 device in CB mode */ static inline int -enqueue_ldpc_dec_one_op_cb(struct acc100_queue *q, struct rte_bbdev_dec_op *op, +enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op, uint16_t total_enqueued_cbs, bool same_op) { int ret; @@ -3085,7 +2518,7 @@ static inline uint32_t hq_index(uint32_t offset) return -EINVAL; } #endif - union acc100_dma_desc *desc; + union acc_dma_desc *desc; uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; @@ -3102,36 
+2535,36 @@ static inline uint32_t hq_index(uint32_t offset) return -EFAULT; } #endif - union acc100_harq_layout_data *harq_layout = q->d->harq_layout; + union acc_harq_layout_data *harq_layout = q->d->harq_layout; if (same_op) { - union acc100_dma_desc *prev_desc; + union acc_dma_desc *prev_desc; desc_idx = ((q->sw_ring_head + total_enqueued_cbs - 1) & q->sw_ring_wrap_mask); prev_desc = q->ring_addr + desc_idx; uint8_t *prev_ptr = (uint8_t *) prev_desc; uint8_t *new_ptr = (uint8_t *) desc; /* Copy first 4 words and BDESCs */ - rte_memcpy(new_ptr, prev_ptr, ACC100_5GUL_SIZE_0); - rte_memcpy(new_ptr + ACC100_5GUL_OFFSET_0, - prev_ptr + ACC100_5GUL_OFFSET_0, - ACC100_5GUL_SIZE_1); + rte_memcpy(new_ptr, prev_ptr, ACC_5GUL_SIZE_0); + rte_memcpy(new_ptr + ACC_5GUL_OFFSET_0, + prev_ptr + ACC_5GUL_OFFSET_0, + ACC_5GUL_SIZE_1); desc->req.op_addr = prev_desc->req.op_addr; /* Copy FCW */ - rte_memcpy(new_ptr + ACC100_DESC_FCW_OFFSET, - prev_ptr + ACC100_DESC_FCW_OFFSET, - ACC100_FCW_LD_BLEN); + rte_memcpy(new_ptr + ACC_DESC_FCW_OFFSET, + prev_ptr + ACC_DESC_FCW_OFFSET, + ACC_FCW_LD_BLEN); acc100_dma_desc_ld_update(op, &desc->req, input, h_output, &in_offset, &h_out_offset, &h_out_length, harq_layout); } else { - struct acc100_fcw_ld *fcw; + struct acc_fcw_ld *fcw; uint32_t seg_total_left; fcw = &desc->req.fcw_ld; q->d->fcw_ld_fill(op, fcw, harq_layout); /* Special handling when overusing mbuf */ - if (fcw->rm_e < ACC100_MAX_E_MBUF) + if (fcw->rm_e < ACC_MAX_E_MBUF) seg_total_left = rte_pktmbuf_data_len(input) - in_offset; else @@ -3171,10 +2604,10 @@ static inline uint32_t hq_index(uint32_t offset) /* Enqueue one decode operations for ACC100 device in TB mode */ static inline int -enqueue_ldpc_dec_one_op_tb(struct acc100_queue *q, struct rte_bbdev_dec_op *op, +enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op, uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) { - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; int ret; uint8_t 
r, c; uint32_t in_offset, h_out_offset, @@ -3193,8 +2626,8 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; - uint64_t fcw_offset = (desc_idx << 8) + ACC100_DESC_FCW_OFFSET; - union acc100_harq_layout_data *harq_layout = q->d->harq_layout; + uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; + union acc_harq_layout_data *harq_layout = q->d->harq_layout; q->d->fcw_ld_fill(op, &desc->req.fcw_ld, harq_layout); input = op->ldpc_dec.input.data; @@ -3214,7 +2647,7 @@ static inline uint32_t hq_index(uint32_t offset) desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; - desc->req.data_ptrs[0].blen = ACC100_FCW_LD_BLEN; + desc->req.data_ptrs[0].blen = ACC_FCW_LD_BLEN; ret = acc100_dma_desc_ld_fill(op, &desc->req, &input, h_output, &in_offset, &h_out_offset, &h_out_length, @@ -3260,10 +2693,10 @@ static inline uint32_t hq_index(uint32_t offset) /* Enqueue one decode operations for ACC100 device in TB mode */ static inline int -enqueue_dec_one_op_tb(struct acc100_queue *q, struct rte_bbdev_dec_op *op, +enqueue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op, uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) { - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; int ret; uint8_t r, c; uint32_t in_offset, h_out_offset, s_out_offset, s_out_length, @@ -3283,7 +2716,7 @@ static inline uint32_t hq_index(uint32_t offset) uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc = q->ring_addr + desc_idx; - uint64_t fcw_offset = (desc_idx << 8) + ACC100_DESC_FCW_OFFSET; + uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; acc100_fcw_td_fill(op, &desc->req.fcw_td); input = op->turbo_dec.input.data; @@ -3305,7 +2738,7 @@ static inline uint32_t hq_index(uint32_t offset) desc = 
q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask); desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; - desc->req.data_ptrs[0].blen = ACC100_FCW_TD_BLEN; + desc->req.data_ptrs[0].blen = ACC_FCW_TD_BLEN; ret = acc100_dma_desc_td_fill(op, &desc->req, &input, h_output, s_output, &in_offset, &h_out_offset, &s_out_offset, &h_out_length, &s_out_length, @@ -3360,91 +2793,15 @@ static inline uint32_t hq_index(uint32_t offset) return current_enqueued_cbs; } -/* Calculates number of CBs in processed encoder TB based on 'r' and input - * length. - */ -static inline uint8_t -get_num_cbs_in_tb_enc(struct rte_bbdev_op_turbo_enc *turbo_enc) -{ - uint8_t c, c_neg, r, crc24_bits = 0; - uint16_t k, k_neg, k_pos; - uint8_t cbs_in_tb = 0; - int32_t length; - - length = turbo_enc->input.length; - r = turbo_enc->tb_params.r; - c = turbo_enc->tb_params.c; - c_neg = turbo_enc->tb_params.c_neg; - k_neg = turbo_enc->tb_params.k_neg; - k_pos = turbo_enc->tb_params.k_pos; - crc24_bits = 0; - if (check_bit(turbo_enc->op_flags, RTE_BBDEV_TURBO_CRC_24B_ATTACH)) - crc24_bits = 24; - while (length > 0 && r < c) { - k = (r < c_neg) ? k_neg : k_pos; - length -= (k - crc24_bits) >> 3; - r++; - cbs_in_tb++; - } - - return cbs_in_tb; -} - -/* Calculates number of CBs in processed decoder TB based on 'r' and input - * length. - */ -static inline uint16_t -get_num_cbs_in_tb_dec(struct rte_bbdev_op_turbo_dec *turbo_dec) -{ - uint8_t c, c_neg, r = 0; - uint16_t kw, k, k_neg, k_pos, cbs_in_tb = 0; - int32_t length; - - length = turbo_dec->input.length; - r = turbo_dec->tb_params.r; - c = turbo_dec->tb_params.c; - c_neg = turbo_dec->tb_params.c_neg; - k_neg = turbo_dec->tb_params.k_neg; - k_pos = turbo_dec->tb_params.k_pos; - while (length > 0 && r < c) { - k = (r < c_neg) ? 
k_neg : k_pos; - kw = RTE_ALIGN_CEIL(k + 4, 32) * 3; - length -= kw; - r++; - cbs_in_tb++; - } - - return cbs_in_tb; -} - -/* Calculates number of CBs in processed decoder TB based on 'r' and input - * length. - */ -static inline uint16_t -get_num_cbs_in_tb_ldpc_dec(struct rte_bbdev_op_ldpc_dec *ldpc_dec) -{ - uint16_t r, cbs_in_tb = 0; - int32_t length = ldpc_dec->input.length; - r = ldpc_dec->tb_params.r; - while (length > 0 && r < ldpc_dec->tb_params.c) { - length -= (r < ldpc_dec->tb_params.cab) ? - ldpc_dec->tb_params.ea : - ldpc_dec->tb_params.eb; - r++; - cbs_in_tb++; - } - return cbs_in_tb; -} - /* Enqueue encode operations for ACC100 device in CB mode. */ static uint16_t acc100_enqueue_enc_cb(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_enc_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; int32_t avail = q->sw_ring_depth + q->sw_ring_tail - q->sw_ring_head; uint16_t i; - union acc100_dma_desc *desc; + union acc_dma_desc *desc; int ret; for (i = 0; i < num; ++i) { @@ -3467,7 +2824,7 @@ static inline uint32_t hq_index(uint32_t offset) desc->req.sdone_enable = 1; desc->req.irq_enable = q->irq_enable; - acc100_dma_enqueue(q, i, &q_data->queue_stats); + acc_dma_enqueue(q, i, &q_data->queue_stats); /* Update stats */ q_data->queue_stats.enqueued_count += i; @@ -3475,32 +2832,15 @@ static inline uint32_t hq_index(uint32_t offset) return i; } -/* Check we can mux encode operations with common FCW */ -static inline bool -check_mux(struct rte_bbdev_enc_op **ops, uint16_t num) { - uint16_t i; - if (num <= 1) - return false; - for (i = 1; i < num; ++i) { - /* Only mux compatible code blocks */ - if (memcmp((uint8_t *)(&ops[i]->ldpc_enc) + ACC100_ENC_OFFSET, - (uint8_t *)(&ops[0]->ldpc_enc) + - ACC100_ENC_OFFSET, - ACC100_CMP_ENC_SIZE) != 0) - return false; - } - return true; -} - /** Enqueue encode operations for ACC100 device in CB mode. 
*/ static inline uint16_t acc100_enqueue_ldpc_enc_cb(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_enc_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; int32_t avail = q->sw_ring_depth + q->sw_ring_tail - q->sw_ring_head; uint16_t i = 0; - union acc100_dma_desc *desc; + union acc_dma_desc *desc; int ret, desc_idx = 0; int16_t enq, left = num; @@ -3508,7 +2848,7 @@ static inline uint32_t hq_index(uint32_t offset) if (unlikely(avail < 1)) break; avail--; - enq = RTE_MIN(left, ACC100_MUX_5GDL_DESC); + enq = RTE_MIN(left, ACC_MUX_5GDL_DESC); if (check_mux(&ops[i], enq)) { ret = enqueue_ldpc_enc_n_op_cb(q, &ops[i], desc_idx, enq); @@ -3534,7 +2874,7 @@ static inline uint32_t hq_index(uint32_t offset) desc->req.sdone_enable = 1; desc->req.irq_enable = q->irq_enable; - acc100_dma_enqueue(q, desc_idx, &q_data->queue_stats); + acc_dma_enqueue(q, desc_idx, &q_data->queue_stats); /* Update stats */ q_data->queue_stats.enqueued_count += i; @@ -3548,7 +2888,7 @@ static inline uint32_t hq_index(uint32_t offset) acc100_enqueue_enc_tb(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_enc_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; int32_t avail = q->sw_ring_depth + q->sw_ring_tail - q->sw_ring_head; uint16_t i, enqueued_cbs = 0; uint8_t cbs_in_tb; @@ -3569,7 +2909,7 @@ static inline uint32_t hq_index(uint32_t offset) if (unlikely(enqueued_cbs == 0)) return 0; /* Nothing to enqueue */ - acc100_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + acc_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); /* Update stats */ q_data->queue_stats.enqueued_count += i; @@ -3610,10 +2950,10 @@ static inline uint32_t hq_index(uint32_t offset) acc100_enqueue_dec_cb(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = 
q_data->queue_private; int32_t avail = q->sw_ring_depth + q->sw_ring_tail - q->sw_ring_head; uint16_t i; - union acc100_dma_desc *desc; + union acc_dma_desc *desc; int ret; for (i = 0; i < num; ++i) { @@ -3636,7 +2976,7 @@ static inline uint32_t hq_index(uint32_t offset) desc->req.sdone_enable = 1; desc->req.irq_enable = q->irq_enable; - acc100_dma_enqueue(q, i, &q_data->queue_stats); + acc_dma_enqueue(q, i, &q_data->queue_stats); /* Update stats */ q_data->queue_stats.enqueued_count += i; @@ -3645,25 +2985,12 @@ static inline uint32_t hq_index(uint32_t offset) return i; } -/* Check we can mux encode operations with common FCW */ -static inline bool -cmp_ldpc_dec_op(struct rte_bbdev_dec_op **ops) { - /* Only mux compatible code blocks */ - if (memcmp((uint8_t *)(&ops[0]->ldpc_dec) + ACC100_DEC_OFFSET, - (uint8_t *)(&ops[1]->ldpc_dec) + - ACC100_DEC_OFFSET, ACC100_CMP_DEC_SIZE) != 0) { - return false; - } else - return true; -} - - /* Enqueue decode operations for ACC100 device in TB mode */ static uint16_t acc100_enqueue_ldpc_dec_tb(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; int32_t avail = q->sw_ring_depth + q->sw_ring_tail - q->sw_ring_head; uint16_t i, enqueued_cbs = 0; uint8_t cbs_in_tb; @@ -3683,7 +3010,7 @@ static inline uint32_t hq_index(uint32_t offset) enqueued_cbs += ret; } - acc100_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + acc_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); /* Update stats */ q_data->queue_stats.enqueued_count += i; @@ -3696,10 +3023,10 @@ static inline uint32_t hq_index(uint32_t offset) acc100_enqueue_ldpc_dec_cb(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; int32_t avail = q->sw_ring_depth + q->sw_ring_tail - q->sw_ring_head; uint16_t i; - union 
acc100_dma_desc *desc; + union acc_dma_desc *desc; int ret; bool same_op = false; for (i = 0; i < num; ++i) { @@ -3732,7 +3059,7 @@ static inline uint32_t hq_index(uint32_t offset) desc->req.sdone_enable = 1; desc->req.irq_enable = q->irq_enable; - acc100_dma_enqueue(q, i, &q_data->queue_stats); + acc_dma_enqueue(q, i, &q_data->queue_stats); /* Update stats */ q_data->queue_stats.enqueued_count += i; @@ -3746,7 +3073,7 @@ static inline uint32_t hq_index(uint32_t offset) acc100_enqueue_dec_tb(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; int32_t avail = q->sw_ring_depth + q->sw_ring_tail - q->sw_ring_head; uint16_t i, enqueued_cbs = 0; uint8_t cbs_in_tb; @@ -3765,7 +3092,7 @@ static inline uint32_t hq_index(uint32_t offset) enqueued_cbs += ret; } - acc100_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + acc_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); /* Update stats */ q_data->queue_stats.enqueued_count += i; @@ -3792,7 +3119,7 @@ static inline uint32_t hq_index(uint32_t offset) acc100_enqueue_ldpc_dec(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; int32_t aq_avail = q->aq_depth + (q->aq_dequeued - q->aq_enqueued) / 128; @@ -3808,11 +3135,11 @@ static inline uint32_t hq_index(uint32_t offset) /* Dequeue one encode operations from ACC100 device in CB mode */ static inline int -dequeue_enc_one_op_cb(struct acc100_queue *q, struct rte_bbdev_enc_op **ref_op, +dequeue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op, uint16_t total_dequeued_cbs, uint32_t *aq_dequeued) { - union acc100_dma_desc *desc, atom_desc; - union acc100_dma_rsp_desc rsp; + union acc_dma_desc *desc, atom_desc; + union acc_dma_rsp_desc rsp; struct rte_bbdev_enc_op *op; int i; @@ -3822,7 +3149,7 @@ static 
inline uint32_t hq_index(uint32_t offset) __ATOMIC_RELAXED); /* Check fdone bit */ - if (!(atom_desc.rsp.val & ACC100_FDONE)) + if (!(atom_desc.rsp.val & ACC_FDONE)) return -1; rsp.val = atom_desc.rsp.val; @@ -3843,7 +3170,7 @@ static inline uint32_t hq_index(uint32_t offset) (*aq_dequeued)++; desc->req.last_desc_in_batch = 0; } - desc->rsp.val = ACC100_DMA_DESC_TYPE; + desc->rsp.val = ACC_DMA_DESC_TYPE; desc->rsp.add_info_0 = 0; /*Reserved bits */ desc->rsp.add_info_1 = 0; /*Reserved bits */ @@ -3858,11 +3185,11 @@ static inline uint32_t hq_index(uint32_t offset) /* Dequeue one encode operations from ACC100 device in TB mode */ static inline int -dequeue_enc_one_op_tb(struct acc100_queue *q, struct rte_bbdev_enc_op **ref_op, +dequeue_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op, uint16_t total_dequeued_cbs, uint32_t *aq_dequeued) { - union acc100_dma_desc *desc, *last_desc, atom_desc; - union acc100_dma_rsp_desc rsp; + union acc_dma_desc *desc, *last_desc, atom_desc; + union acc_dma_rsp_desc rsp; struct rte_bbdev_enc_op *op; uint8_t i = 0; uint16_t current_dequeued_cbs = 0, cbs_in_tb; @@ -3873,7 +3200,7 @@ static inline uint32_t hq_index(uint32_t offset) __ATOMIC_RELAXED); /* Check fdone bit */ - if (!(atom_desc.rsp.val & ACC100_FDONE)) + if (!(atom_desc.rsp.val & ACC_FDONE)) return -1; /* Get number of CBs in dequeued TB */ @@ -3887,7 +3214,7 @@ static inline uint32_t hq_index(uint32_t offset) */ atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, __ATOMIC_RELAXED); - if (!(atom_desc.rsp.val & ACC100_SDONE)) + if (!(atom_desc.rsp.val & ACC_SDONE)) return -1; /* Dequeue */ @@ -3915,7 +3242,7 @@ static inline uint32_t hq_index(uint32_t offset) (*aq_dequeued)++; desc->req.last_desc_in_batch = 0; } - desc->rsp.val = ACC100_DMA_DESC_TYPE; + desc->rsp.val = ACC_DMA_DESC_TYPE; desc->rsp.add_info_0 = 0; desc->rsp.add_info_1 = 0; total_dequeued_cbs++; @@ -3931,11 +3258,11 @@ static inline uint32_t hq_index(uint32_t offset) /* Dequeue one 
decode operation from ACC100 device in CB mode */ static inline int dequeue_dec_one_op_cb(struct rte_bbdev_queue_data *q_data, - struct acc100_queue *q, struct rte_bbdev_dec_op **ref_op, + struct acc_queue *q, struct rte_bbdev_dec_op **ref_op, uint16_t dequeued_cbs, uint32_t *aq_dequeued) { - union acc100_dma_desc *desc, atom_desc; - union acc100_dma_rsp_desc rsp; + union acc_dma_desc *desc, atom_desc; + union acc_dma_rsp_desc rsp; struct rte_bbdev_dec_op *op; desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) @@ -3944,7 +3271,7 @@ static inline uint32_t hq_index(uint32_t offset) __ATOMIC_RELAXED); /* Check fdone bit */ - if (!(atom_desc.rsp.val & ACC100_FDONE)) + if (!(atom_desc.rsp.val & ACC_FDONE)) return -1; rsp.val = atom_desc.rsp.val; @@ -3973,7 +3300,7 @@ static inline uint32_t hq_index(uint32_t offset) (*aq_dequeued)++; desc->req.last_desc_in_batch = 0; } - desc->rsp.val = ACC100_DMA_DESC_TYPE; + desc->rsp.val = ACC_DMA_DESC_TYPE; desc->rsp.add_info_0 = 0; desc->rsp.add_info_1 = 0; *ref_op = op; @@ -3985,11 +3312,11 @@ static inline uint32_t hq_index(uint32_t offset) /* Dequeue one decode operations from ACC100 device in CB mode */ static inline int dequeue_ldpc_dec_one_op_cb(struct rte_bbdev_queue_data *q_data, - struct acc100_queue *q, struct rte_bbdev_dec_op **ref_op, + struct acc_queue *q, struct rte_bbdev_dec_op **ref_op, uint16_t dequeued_cbs, uint32_t *aq_dequeued) { - union acc100_dma_desc *desc, atom_desc; - union acc100_dma_rsp_desc rsp; + union acc_dma_desc *desc, atom_desc; + union acc_dma_rsp_desc rsp; struct rte_bbdev_dec_op *op; desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) @@ -3998,7 +3325,7 @@ static inline uint32_t hq_index(uint32_t offset) __ATOMIC_RELAXED); /* Check fdone bit */ - if (!(atom_desc.rsp.val & ACC100_FDONE)) + if (!(atom_desc.rsp.val & ACC_FDONE)) return -1; rsp.val = atom_desc.rsp.val; @@ -4028,7 +3355,7 @@ static inline uint32_t hq_index(uint32_t offset) desc->req.last_desc_in_batch = 0; } - desc->rsp.val = 
ACC100_DMA_DESC_TYPE; + desc->rsp.val = ACC_DMA_DESC_TYPE; desc->rsp.add_info_0 = 0; desc->rsp.add_info_1 = 0; @@ -4040,11 +3367,11 @@ static inline uint32_t hq_index(uint32_t offset) /* Dequeue one decode operations from ACC100 device in TB mode. */ static inline int -dequeue_dec_one_op_tb(struct acc100_queue *q, struct rte_bbdev_dec_op **ref_op, +dequeue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op **ref_op, uint16_t dequeued_cbs, uint32_t *aq_dequeued) { - union acc100_dma_desc *desc, *last_desc, atom_desc; - union acc100_dma_rsp_desc rsp; + union acc_dma_desc *desc, *last_desc, atom_desc; + union acc_dma_rsp_desc rsp; struct rte_bbdev_dec_op *op; uint8_t cbs_in_tb = 1, cb_idx = 0; @@ -4054,7 +3381,7 @@ static inline uint32_t hq_index(uint32_t offset) __ATOMIC_RELAXED); /* Check fdone bit */ - if (!(atom_desc.rsp.val & ACC100_FDONE)) + if (!(atom_desc.rsp.val & ACC_FDONE)) return -1; /* Dequeue */ @@ -4071,7 +3398,7 @@ static inline uint32_t hq_index(uint32_t offset) */ atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, __ATOMIC_RELAXED); - if (!(atom_desc.rsp.val & ACC100_SDONE)) + if (!(atom_desc.rsp.val & ACC_SDONE)) return -1; /* Clearing status, it will be set based on response */ @@ -4103,7 +3430,7 @@ static inline uint32_t hq_index(uint32_t offset) (*aq_dequeued)++; desc->req.last_desc_in_batch = 0; } - desc->rsp.val = ACC100_DMA_DESC_TYPE; + desc->rsp.val = ACC_DMA_DESC_TYPE; desc->rsp.add_info_0 = 0; desc->rsp.add_info_1 = 0; dequeued_cbs++; @@ -4120,7 +3447,7 @@ static inline uint32_t hq_index(uint32_t offset) acc100_dequeue_enc(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_enc_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; uint16_t dequeue_num; uint32_t avail = q->sw_ring_head - q->sw_ring_tail; uint32_t aq_dequeued = 0; @@ -4166,7 +3493,7 @@ static inline uint32_t hq_index(uint32_t offset) acc100_dequeue_ldpc_enc(struct rte_bbdev_queue_data 
*q_data, struct rte_bbdev_enc_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; uint32_t avail = q->sw_ring_head - q->sw_ring_tail; uint32_t aq_dequeued = 0; uint16_t dequeue_num, i, dequeued_cbs = 0, dequeued_descs = 0; @@ -4205,7 +3532,7 @@ static inline uint32_t hq_index(uint32_t offset) acc100_dequeue_dec(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; uint16_t dequeue_num; uint32_t avail = q->sw_ring_head - q->sw_ring_tail; uint32_t aq_dequeued = 0; @@ -4250,7 +3577,7 @@ static inline uint32_t hq_index(uint32_t offset) acc100_dequeue_ldpc_dec(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - struct acc100_queue *q = q_data->queue_private; + struct acc_queue *q = q_data->queue_private; uint16_t dequeue_num; uint32_t avail = q->sw_ring_head - q->sw_ring_tail; uint32_t aq_dequeued = 0; @@ -4310,17 +3637,17 @@ static inline uint32_t hq_index(uint32_t offset) /* Device variant specific handling */ if ((pci_dev->id.device_id == ACC100_PF_DEVICE_ID) || (pci_dev->id.device_id == ACC100_VF_DEVICE_ID)) { - ((struct acc100_device *) dev->data->dev_private)->device_variant = ACC100_VARIANT; - ((struct acc100_device *) dev->data->dev_private)->fcw_ld_fill = acc100_fcw_ld_fill; + ((struct acc_device *) dev->data->dev_private)->device_variant = ACC100_VARIANT; + ((struct acc_device *) dev->data->dev_private)->fcw_ld_fill = acc100_fcw_ld_fill; } else { - ((struct acc100_device *) dev->data->dev_private)->device_variant = ACC101_VARIANT; - ((struct acc100_device *) dev->data->dev_private)->fcw_ld_fill = acc101_fcw_ld_fill; + ((struct acc_device *) dev->data->dev_private)->device_variant = ACC101_VARIANT; + ((struct acc_device *) dev->data->dev_private)->fcw_ld_fill = acc101_fcw_ld_fill; } - ((struct acc100_device *) 
dev->data->dev_private)->pf_device = + ((struct acc_device *) dev->data->dev_private)->pf_device = !strcmp(drv->driver.name, RTE_STR(ACC100PF_DRIVER_NAME)); - ((struct acc100_device *) dev->data->dev_private)->mmio_base = + ((struct acc_device *) dev->data->dev_private)->mmio_base = pci_dev->mem_resource[0].addr; rte_bbdev_log_debug("Init device %s [%s] @ vaddr %p paddr %#"PRIx64"", @@ -4349,13 +3676,13 @@ static int acc100_pci_probe(struct rte_pci_driver *pci_drv, /* allocate device private memory */ bbdev->data->dev_private = rte_zmalloc_socket(dev_name, - sizeof(struct acc100_device), RTE_CACHE_LINE_SIZE, + sizeof(struct acc_device), RTE_CACHE_LINE_SIZE, pci_dev->device.numa_node); if (bbdev->data->dev_private == NULL) { rte_bbdev_log(CRIT, "Allocate of %zu bytes for device \"%s\" failed", - sizeof(struct acc100_device), dev_name); + sizeof(struct acc_device), dev_name); rte_bbdev_release(bbdev); return -ENOMEM; } @@ -4373,53 +3700,16 @@ static int acc100_pci_probe(struct rte_pci_driver *pci_drv, return 0; } -static int acc100_pci_remove(struct rte_pci_device *pci_dev) -{ - struct rte_bbdev *bbdev; - int ret; - uint8_t dev_id; - - if (pci_dev == NULL) - return -EINVAL; - - /* Find device */ - bbdev = rte_bbdev_get_named_dev(pci_dev->device.name); - if (bbdev == NULL) { - rte_bbdev_log(CRIT, - "Couldn't find HW dev \"%s\" to uninitialise it", - pci_dev->device.name); - return -ENODEV; - } - dev_id = bbdev->data->dev_id; - - /* free device private memory before close */ - rte_free(bbdev->data->dev_private); - - /* Close device */ - ret = rte_bbdev_close(dev_id); - if (ret < 0) - rte_bbdev_log(ERR, - "Device %i failed to close during uninit: %i", - dev_id, ret); - - /* release bbdev from library */ - rte_bbdev_release(bbdev); - - rte_bbdev_log_debug("Destroyed bbdev = %u", dev_id); - - return 0; -} - static struct rte_pci_driver acc100_pci_pf_driver = { .probe = acc100_pci_probe, - .remove = acc100_pci_remove, + .remove = acc_pci_remove, .id_table = 
pci_id_acc100_pf_map, .drv_flags = RTE_PCI_DRV_NEED_MAPPING }; static struct rte_pci_driver acc100_pci_vf_driver = { .probe = acc100_pci_probe, - .remove = acc100_pci_remove, + .remove = acc_pci_remove, .id_table = pci_id_acc100_vf_map, .drv_flags = RTE_PCI_DRV_NEED_MAPPING }; @@ -4437,51 +3727,51 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) * defined. */ static void -poweron_cleanup(struct rte_bbdev *bbdev, struct acc100_device *d, - struct rte_acc100_conf *conf) +poweron_cleanup(struct rte_bbdev *bbdev, struct acc_device *d, + struct rte_acc_conf *conf) { int i, template_idx, qg_idx; uint32_t address, status, value; printf("Need to clear power-on 5GUL status in internal memory\n"); /* Reset LDPC Cores */ for (i = 0; i < ACC100_ENGINES_MAX; i++) - acc100_reg_write(d, HWPfFecUl5gCntrlReg + - ACC100_ENGINE_OFFSET * i, ACC100_RESET_HI); - usleep(ACC100_LONG_WAIT); + acc_reg_write(d, HWPfFecUl5gCntrlReg + + ACC_ENGINE_OFFSET * i, ACC100_RESET_HI); + usleep(ACC_LONG_WAIT); for (i = 0; i < ACC100_ENGINES_MAX; i++) - acc100_reg_write(d, HWPfFecUl5gCntrlReg + - ACC100_ENGINE_OFFSET * i, ACC100_RESET_LO); - usleep(ACC100_LONG_WAIT); + acc_reg_write(d, HWPfFecUl5gCntrlReg + + ACC_ENGINE_OFFSET * i, ACC100_RESET_LO); + usleep(ACC_LONG_WAIT); /* Prepare dummy workload */ alloc_2x64mb_sw_rings_mem(bbdev, d, 0); /* Set base addresses */ uint32_t phys_high = (uint32_t)(d->sw_rings_iova >> 32); uint32_t phys_low = (uint32_t)(d->sw_rings_iova & - ~(ACC100_SIZE_64MBYTE-1)); - acc100_reg_write(d, HWPfDmaFec5GulDescBaseHiRegVf, phys_high); - acc100_reg_write(d, HWPfDmaFec5GulDescBaseLoRegVf, phys_low); + ~(ACC_SIZE_64MBYTE-1)); + acc_reg_write(d, HWPfDmaFec5GulDescBaseHiRegVf, phys_high); + acc_reg_write(d, HWPfDmaFec5GulDescBaseLoRegVf, phys_low); /* Descriptor for a dummy 5GUL code block processing*/ - union acc100_dma_desc *desc = NULL; + union acc_dma_desc *desc = NULL; desc = d->sw_rings; desc->req.data_ptrs[0].address = d->sw_rings_iova + - 
ACC100_DESC_FCW_OFFSET; - desc->req.data_ptrs[0].blen = ACC100_FCW_LD_BLEN; - desc->req.data_ptrs[0].blkid = ACC100_DMA_BLKID_FCW; + ACC_DESC_FCW_OFFSET; + desc->req.data_ptrs[0].blen = ACC_FCW_LD_BLEN; + desc->req.data_ptrs[0].blkid = ACC_DMA_BLKID_FCW; desc->req.data_ptrs[0].last = 0; desc->req.data_ptrs[0].dma_ext = 0; desc->req.data_ptrs[1].address = d->sw_rings_iova + 512; - desc->req.data_ptrs[1].blkid = ACC100_DMA_BLKID_IN; + desc->req.data_ptrs[1].blkid = ACC_DMA_BLKID_IN; desc->req.data_ptrs[1].last = 1; desc->req.data_ptrs[1].dma_ext = 0; desc->req.data_ptrs[1].blen = 44; desc->req.data_ptrs[2].address = d->sw_rings_iova + 1024; - desc->req.data_ptrs[2].blkid = ACC100_DMA_BLKID_OUT_ENC; + desc->req.data_ptrs[2].blkid = ACC_DMA_BLKID_OUT_ENC; desc->req.data_ptrs[2].last = 1; desc->req.data_ptrs[2].dma_ext = 0; desc->req.data_ptrs[2].blen = 5; /* Dummy FCW */ - desc->req.fcw_ld.FCWversion = ACC100_FCW_VER; + desc->req.fcw_ld.FCWversion = ACC_FCW_VER; desc->req.fcw_ld.qm = 1; desc->req.fcw_ld.nfiller = 30; desc->req.fcw_ld.BG = 2 - 1; @@ -4500,8 +3790,8 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx++) { /* Check engine power-on status */ address = HwPfFecUl5gIbDebugReg + - ACC100_ENGINE_OFFSET * template_idx; - status = (acc100_reg_read(d, address) >> 4) & 0xF; + ACC_ENGINE_OFFSET * template_idx; + status = (acc_reg_read(d, address) >> 4) & 0xF; if (status == 0) { engines_to_restart[num_failed_engine] = template_idx; num_failed_engine++; @@ -4521,14 +3811,14 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx <= ACC100_SIG_UL_5G_LAST; template_idx++) { address = HWPfQmgrGrpTmplateReg4Indx - + ACC100_BYTES_IN_WORD * template_idx; + + ACC_BYTES_IN_WORD * template_idx; if (template_idx == failed_engine) - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); else - acc100_reg_write(d, address, 0); + acc_reg_write(d, address, 0); } /* Reset descriptor header */ - desc->req.word0 = 
ACC100_DMA_DESC_TYPE; + desc->req.word0 = ACC_DMA_DESC_TYPE; desc->req.word1 = 0; desc->req.word2 = 0; desc->req.word3 = 0; @@ -4536,56 +3826,56 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) desc->req.m2dlen = 2; desc->req.d2mlen = 1; /* Enqueue the code block for processing */ - union acc100_enqueue_reg_fmt enq_req; + union acc_enqueue_reg_fmt enq_req; enq_req.val = 0; - enq_req.addr_offset = ACC100_DESC_OFFSET; + enq_req.addr_offset = ACC_DESC_OFFSET; enq_req.num_elem = 1; enq_req.req_elem_addr = 0; rte_wmb(); - acc100_reg_write(d, HWPfQmgrIngressAq + 0x100, enq_req.val); - usleep(ACC100_LONG_WAIT * 100); + acc_reg_write(d, HWPfQmgrIngressAq + 0x100, enq_req.val); + usleep(ACC_LONG_WAIT * 100); if (desc->req.word0 != 2) printf("DMA Response %#"PRIx32"\n", desc->req.word0); } /* Reset LDPC Cores */ for (i = 0; i < ACC100_ENGINES_MAX; i++) - acc100_reg_write(d, HWPfFecUl5gCntrlReg + - ACC100_ENGINE_OFFSET * i, + acc_reg_write(d, HWPfFecUl5gCntrlReg + + ACC_ENGINE_OFFSET * i, ACC100_RESET_HI); - usleep(ACC100_LONG_WAIT); + usleep(ACC_LONG_WAIT); for (i = 0; i < ACC100_ENGINES_MAX; i++) - acc100_reg_write(d, HWPfFecUl5gCntrlReg + - ACC100_ENGINE_OFFSET * i, + acc_reg_write(d, HWPfFecUl5gCntrlReg + + ACC_ENGINE_OFFSET * i, ACC100_RESET_LO); - usleep(ACC100_LONG_WAIT); - acc100_reg_write(d, HWPfHi5GHardResetReg, ACC100_RESET_HARD); - usleep(ACC100_LONG_WAIT); + usleep(ACC_LONG_WAIT); + acc_reg_write(d, HWPfHi5GHardResetReg, ACC100_RESET_HARD); + usleep(ACC_LONG_WAIT); int numEngines = 0; /* Check engine power-on status again */ for (template_idx = ACC100_SIG_UL_5G; template_idx <= ACC100_SIG_UL_5G_LAST; template_idx++) { address = HwPfFecUl5gIbDebugReg + - ACC100_ENGINE_OFFSET * template_idx; - status = (acc100_reg_read(d, address) >> 4) & 0xF; + ACC_ENGINE_OFFSET * template_idx; + status = (acc_reg_read(d, address) >> 4) & 0xF; address = HWPfQmgrGrpTmplateReg4Indx - + ACC100_BYTES_IN_WORD * template_idx; + + ACC_BYTES_IN_WORD * template_idx; if 
(status == 1) { - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); numEngines++; } else - acc100_reg_write(d, address, 0); + acc_reg_write(d, address, 0); } printf("Number of 5GUL engines %d\n", numEngines); rte_free(d->sw_rings_base); - usleep(ACC100_LONG_WAIT); + usleep(ACC_LONG_WAIT); } /* Initial configuration of a ACC100 device prior to running configure() */ static int -acc100_configure(const char *dev_name, struct rte_acc100_conf *conf) +acc100_configure(const char *dev_name, struct rte_acc_conf *conf) { rte_bbdev_log(INFO, "rte_acc100_configure"); uint32_t value, address, status; @@ -4593,10 +3883,10 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) struct rte_bbdev *bbdev = rte_bbdev_get_named_dev(dev_name); /* Compile time checks */ - RTE_BUILD_BUG_ON(sizeof(struct acc100_dma_req_desc) != 256); - RTE_BUILD_BUG_ON(sizeof(union acc100_dma_desc) != 256); - RTE_BUILD_BUG_ON(sizeof(struct acc100_fcw_td) != 24); - RTE_BUILD_BUG_ON(sizeof(struct acc100_fcw_te) != 32); + RTE_BUILD_BUG_ON(sizeof(struct acc_dma_req_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(union acc_dma_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_td) != 24); + RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_te) != 32); if (bbdev == NULL) { rte_bbdev_log(ERR, @@ -4604,87 +3894,87 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) dev_name); return -ENODEV; } - struct acc100_device *d = bbdev->data->dev_private; + struct acc_device *d = bbdev->data->dev_private; /* Store configuration */ - rte_memcpy(&d->acc100_conf, conf, sizeof(d->acc100_conf)); + rte_memcpy(&d->acc_conf, conf, sizeof(d->acc_conf)); - value = acc100_reg_read(d, HwPfPcieGpexBridgeControl); + value = acc_reg_read(d, HwPfPcieGpexBridgeControl); bool firstCfg = (value != ACC100_CFG_PCI_BRIDGE); /* PCIe Bridge configuration */ - acc100_reg_write(d, HwPfPcieGpexBridgeControl, ACC100_CFG_PCI_BRIDGE); + acc_reg_write(d, HwPfPcieGpexBridgeControl, ACC100_CFG_PCI_BRIDGE); for (i = 1; i < 
ACC100_GPEX_AXIMAP_NUM; i++) - acc100_reg_write(d, + acc_reg_write(d, HwPfPcieGpexAxiAddrMappingWindowPexBaseHigh + i * 16, 0); /* Prevent blocking AXI read on BRESP for AXI Write */ address = HwPfPcieGpexAxiPioControl; value = ACC100_CFG_PCI_AXI; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* 5GDL PLL phase shift */ - acc100_reg_write(d, HWPfChaDl5gPllPhshft0, 0x1); + acc_reg_write(d, HWPfChaDl5gPllPhshft0, 0x1); /* Explicitly releasing AXI as this may be stopped after PF FLR/BME */ address = HWPfDmaAxiControl; value = 1; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Enable granular dynamic clock gating */ address = HWPfHiClkGateHystReg; value = ACC100_CLOCK_GATING_EN; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Set default descriptor signature */ address = HWPfDmaDescriptorSignatuture; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Enable the Error Detection in DMA */ value = ACC100_CFG_DMA_ERROR; address = HWPfDmaErrorDetectionEn; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* AXI Cache configuration */ value = ACC100_CFG_AXI_CACHE; address = HWPfDmaAxcacheReg; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Adjust PCIe Lane adaptation */ for (i = 0; i < ACC100_QUAD_NUMS; i++) for (j = 0; j < ACC100_LANES_PER_QUAD; j++) - acc100_reg_write(d, HwPfPcieLnAdaptctrl + i * ACC100_PCIE_QUAD_OFFSET + acc_reg_write(d, HwPfPcieLnAdaptctrl + i * ACC100_PCIE_QUAD_OFFSET + j * ACC100_PCIE_LANE_OFFSET, ACC100_ADAPT); /* Enable PCIe live adaptation */ for (i = 0; i < ACC100_QUAD_NUMS; i++) - acc100_reg_write(d, HwPfPciePcsEqControl + + acc_reg_write(d, HwPfPciePcsEqControl + i * ACC100_PCIE_QUAD_OFFSET, ACC100_PCS_EQ); /* Default DMA Configuration (Qmgr Enabled) */ address = HWPfDmaConfig0Reg; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); 
address = HWPfDmaQmanen; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Default RLIM/ALEN configuration */ address = HWPfDmaConfig1Reg; value = (1 << 31) + (23 << 8) + (1 << 6) + 7; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Configure DMA Qmanager addresses */ address = HWPfDmaQmgrAddrReg; value = HWPfQmgrEgressQueuesTemplate; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Default Fabric Mode */ address = HWPfFabricMode; value = ACC100_FABRIC_MODE; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* ===== Qmgr Configuration ===== */ /* Configuration of the AQueue Depth QMGR_GRP_0_DEPTH_LOG2 for UL */ @@ -4694,42 +3984,42 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) conf->q_dl_5g.num_qgroups; for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { address = HWPfQmgrDepthLog2Grp + - ACC100_BYTES_IN_WORD * qg_idx; + ACC_BYTES_IN_WORD * qg_idx; value = aqDepth(qg_idx, conf); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); address = HWPfQmgrTholdGrp + - ACC100_BYTES_IN_WORD * qg_idx; + ACC_BYTES_IN_WORD * qg_idx; value = (1 << 16) + (1 << (aqDepth(qg_idx, conf) - 1)); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } /* Template Priority in incremental order */ - for (template_idx = 0; template_idx < ACC100_NUM_TMPL; template_idx++) { - address = HWPfQmgrGrpTmplateReg0Indx + ACC100_BYTES_IN_WORD * template_idx; - value = ACC100_TMPL_PRI_0; - acc100_reg_write(d, address, value); - address = HWPfQmgrGrpTmplateReg1Indx + ACC100_BYTES_IN_WORD * template_idx; - value = ACC100_TMPL_PRI_1; - acc100_reg_write(d, address, value); - address = HWPfQmgrGrpTmplateReg2indx + ACC100_BYTES_IN_WORD * template_idx; - value = ACC100_TMPL_PRI_2; - acc100_reg_write(d, address, value); - address = HWPfQmgrGrpTmplateReg3Indx + ACC100_BYTES_IN_WORD * template_idx; - value = ACC100_TMPL_PRI_3; - 
acc100_reg_write(d, address, value); + for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) { + address = HWPfQmgrGrpTmplateReg0Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_0; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg1Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_1; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg2indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_2; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg3Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_3; + acc_reg_write(d, address, value); } address = HWPfQmgrGrpPriority; value = ACC100_CFG_QMGR_HI_P; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Template Configuration */ - for (template_idx = 0; template_idx < ACC100_NUM_TMPL; + for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) { value = 0; address = HWPfQmgrGrpTmplateReg4Indx - + ACC100_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* 4GUL */ int numQgs = conf->q_ul_4g.num_qgroups; @@ -4741,8 +4031,8 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx <= ACC100_SIG_UL_4G_LAST; template_idx++) { address = HWPfQmgrGrpTmplateReg4Indx - + ACC100_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* 5GUL */ numQqsAcc += numQgs; @@ -4756,15 +4046,15 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx++) { /* Check engine power-on status */ address = HwPfFecUl5gIbDebugReg + - ACC100_ENGINE_OFFSET * template_idx; - status = (acc100_reg_read(d, address) >> 4) & 0xF; + ACC_ENGINE_OFFSET * template_idx; + status = (acc_reg_read(d, address) >> 4) & 0xF; address = HWPfQmgrGrpTmplateReg4Indx - + ACC100_BYTES_IN_WORD * 
template_idx; + + ACC_BYTES_IN_WORD * template_idx; if (status == 1) { - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); numEngines++; } else - acc100_reg_write(d, address, 0); + acc_reg_write(d, address, 0); } printf("Number of 5GUL engines %d\n", numEngines); /* 4GDL */ @@ -4777,8 +4067,8 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx <= ACC100_SIG_DL_4G_LAST; template_idx++) { address = HWPfQmgrGrpTmplateReg4Indx - + ACC100_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* 5GDL */ numQqsAcc += numQgs; @@ -4790,8 +4080,8 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx <= ACC100_SIG_DL_5G_LAST; template_idx++) { address = HWPfQmgrGrpTmplateReg4Indx - + ACC100_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* Queue Group Function mapping */ @@ -4802,14 +4092,14 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) acc = accFromQgid(qg_idx, conf); value |= qman_func_id[acc]<<(qg_idx * 4); } - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Configuration of the Arbitration QGroup depth to 1 */ for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { address = HWPfQmgrArbQDepthGrp + - ACC100_BYTES_IN_WORD * qg_idx; + ACC_BYTES_IN_WORD * qg_idx; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } /* Enabling AQueues through the Queue hierarchy*/ @@ -4820,9 +4110,9 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) qg_idx < totalQgs) value = (1 << aqNum(qg_idx, conf)) - 1; address = HWPfQmgrAqEnableVf - + vf_idx * ACC100_BYTES_IN_WORD; + + vf_idx * ACC_BYTES_IN_WORD; value += (qg_idx << 16); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } } @@ -4831,10 +4121,10 @@ static int 
acc100_pci_remove(struct rte_pci_device *pci_dev) for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { for (vf_idx = 0; vf_idx < conf->num_vf_bundles; vf_idx++) { address = HWPfQmgrVfBaseAddr + vf_idx - * ACC100_BYTES_IN_WORD + qg_idx - * ACC100_BYTES_IN_WORD * 64; + * ACC_BYTES_IN_WORD + qg_idx + * ACC_BYTES_IN_WORD * 64; value = aram_address; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Offset ARAM Address for next memory bank * - increment of 4B */ @@ -4852,29 +4142,29 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) /* ==== HI Configuration ==== */ /* No Info Ring/MSI by default */ - acc100_reg_write(d, HWPfHiInfoRingIntWrEnRegPf, 0); - acc100_reg_write(d, HWPfHiInfoRingVf2pfLoWrEnReg, 0); - acc100_reg_write(d, HWPfHiCfgMsiIntWrEnRegPf, 0xFFFFFFFF); - acc100_reg_write(d, HWPfHiCfgMsiVf2pfLoWrEnReg, 0xFFFFFFFF); + acc_reg_write(d, HWPfHiInfoRingIntWrEnRegPf, 0); + acc_reg_write(d, HWPfHiInfoRingVf2pfLoWrEnReg, 0); + acc_reg_write(d, HWPfHiCfgMsiIntWrEnRegPf, 0xFFFFFFFF); + acc_reg_write(d, HWPfHiCfgMsiVf2pfLoWrEnReg, 0xFFFFFFFF); /* Prevent Block on Transmit Error */ address = HWPfHiBlockTransmitOnErrorEn; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Prevents to drop MSI */ address = HWPfHiMsiDropEnableReg; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Set the PF Mode register */ address = HWPfHiPfMode; - value = (conf->pf_mode_en) ? ACC100_PF_VAL : 0; - acc100_reg_write(d, address, value); + value = (conf->pf_mode_en) ? 
ACC_PF_VAL : 0; + acc_reg_write(d, address, value); /* QoS overflow init */ value = 1; address = HWPfQosmonAEvalOverflow0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); address = HWPfQosmonBEvalOverflow0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* HARQ DDR Configuration */ unsigned int ddrSizeInMb = ACC100_HARQ_DDR; @@ -4883,9 +4173,9 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) * 0x10; value = ((vf_idx * (ddrSizeInMb / 64)) << 16) + (ddrSizeInMb - 1); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } - usleep(ACC100_LONG_WAIT); + usleep(ACC_LONG_WAIT); /* Workaround in case some 5GUL engines are in an unexpected state */ if (numEngines < (ACC100_SIG_UL_5G_LAST + 1)) @@ -4893,7 +4183,7 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) uint32_t version = 0; for (i = 0; i < 4; i++) - version += acc100_reg_read(d, + version += acc_reg_read(d, HWPfDdrPhyIdtmFwVersion + 4 * i) << (8 * i); if (version != ACC100_PRQ_DDR_VER) { printf("* Note: Not on DDR PRQ version %8x != %08x\n", @@ -4901,76 +4191,76 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) } else if (firstCfg) { /* ---- DDR configuration at boot up --- */ /* Read Clear Ddr training status */ - acc100_reg_read(d, HWPfChaDdrStDoneStatus); + acc_reg_read(d, HWPfChaDdrStDoneStatus); /* Reset PHY/IDTM/UMMC */ - acc100_reg_write(d, HWPfChaDdrWbRstCfg, 3); - acc100_reg_write(d, HWPfChaDdrApbRstCfg, 2); - acc100_reg_write(d, HWPfChaDdrPhyRstCfg, 2); - acc100_reg_write(d, HWPfChaDdrCpuRstCfg, 3); - acc100_reg_write(d, HWPfChaDdrSifRstCfg, 2); - usleep(ACC100_MS_IN_US); + acc_reg_write(d, HWPfChaDdrWbRstCfg, 3); + acc_reg_write(d, HWPfChaDdrApbRstCfg, 2); + acc_reg_write(d, HWPfChaDdrPhyRstCfg, 2); + acc_reg_write(d, HWPfChaDdrCpuRstCfg, 3); + acc_reg_write(d, HWPfChaDdrSifRstCfg, 2); + usleep(ACC_MS_IN_US); /* Reset WB and APB resets */ - acc100_reg_write(d, HWPfChaDdrWbRstCfg, 2); - 
acc100_reg_write(d, HWPfChaDdrApbRstCfg, 3); + acc_reg_write(d, HWPfChaDdrWbRstCfg, 2); + acc_reg_write(d, HWPfChaDdrApbRstCfg, 3); /* Configure PHY-IDTM */ - acc100_reg_write(d, HWPfDdrPhyIdletimeout, 0x3e8); + acc_reg_write(d, HWPfDdrPhyIdletimeout, 0x3e8); /* IDTM timing registers */ - acc100_reg_write(d, HWPfDdrPhyRdLatency, 0x13); - acc100_reg_write(d, HWPfDdrPhyRdLatencyDbi, 0x15); - acc100_reg_write(d, HWPfDdrPhyWrLatency, 0x10011); + acc_reg_write(d, HWPfDdrPhyRdLatency, 0x13); + acc_reg_write(d, HWPfDdrPhyRdLatencyDbi, 0x15); + acc_reg_write(d, HWPfDdrPhyWrLatency, 0x10011); /* Configure SDRAM MRS registers */ - acc100_reg_write(d, HWPfDdrPhyMr01Dimm, 0x3030b70); - acc100_reg_write(d, HWPfDdrPhyMr01DimmDbi, 0x3030b50); - acc100_reg_write(d, HWPfDdrPhyMr23Dimm, 0x30); - acc100_reg_write(d, HWPfDdrPhyMr67Dimm, 0xc00); - acc100_reg_write(d, HWPfDdrPhyMr45Dimm, 0x4000000); + acc_reg_write(d, HWPfDdrPhyMr01Dimm, 0x3030b70); + acc_reg_write(d, HWPfDdrPhyMr01DimmDbi, 0x3030b50); + acc_reg_write(d, HWPfDdrPhyMr23Dimm, 0x30); + acc_reg_write(d, HWPfDdrPhyMr67Dimm, 0xc00); + acc_reg_write(d, HWPfDdrPhyMr45Dimm, 0x4000000); /* Configure active lanes */ - acc100_reg_write(d, HWPfDdrPhyDqsCountMax, 0x9); - acc100_reg_write(d, HWPfDdrPhyDqsCountNum, 0x9); + acc_reg_write(d, HWPfDdrPhyDqsCountMax, 0x9); + acc_reg_write(d, HWPfDdrPhyDqsCountNum, 0x9); /* Configure WR/RD leveling timing registers */ - acc100_reg_write(d, HWPfDdrPhyWrlvlWwRdlvlRr, 0x101212); + acc_reg_write(d, HWPfDdrPhyWrlvlWwRdlvlRr, 0x101212); /* Configure what trainings to execute */ - acc100_reg_write(d, HWPfDdrPhyTrngType, 0x2d3c); + acc_reg_write(d, HWPfDdrPhyTrngType, 0x2d3c); /* Releasing PHY reset */ - acc100_reg_write(d, HWPfChaDdrPhyRstCfg, 3); + acc_reg_write(d, HWPfChaDdrPhyRstCfg, 3); /* Configure Memory Controller registers */ - acc100_reg_write(d, HWPfDdrMemInitPhyTrng0, 0x3); - acc100_reg_write(d, HWPfDdrBcDram, 0x3c232003); - acc100_reg_write(d, HWPfDdrBcAddrMap, 0x31); + acc_reg_write(d, 
HWPfDdrMemInitPhyTrng0, 0x3); + acc_reg_write(d, HWPfDdrBcDram, 0x3c232003); + acc_reg_write(d, HWPfDdrBcAddrMap, 0x31); /* Configure UMMC BC timing registers */ - acc100_reg_write(d, HWPfDdrBcRef, 0xa22); - acc100_reg_write(d, HWPfDdrBcTim0, 0x4050501); - acc100_reg_write(d, HWPfDdrBcTim1, 0xf0b0476); - acc100_reg_write(d, HWPfDdrBcTim2, 0x103); - acc100_reg_write(d, HWPfDdrBcTim3, 0x144050a1); - acc100_reg_write(d, HWPfDdrBcTim4, 0x23300); - acc100_reg_write(d, HWPfDdrBcTim5, 0x4230276); - acc100_reg_write(d, HWPfDdrBcTim6, 0x857914); - acc100_reg_write(d, HWPfDdrBcTim7, 0x79100232); - acc100_reg_write(d, HWPfDdrBcTim8, 0x100007ce); - acc100_reg_write(d, HWPfDdrBcTim9, 0x50020); - acc100_reg_write(d, HWPfDdrBcTim10, 0x40ee); + acc_reg_write(d, HWPfDdrBcRef, 0xa22); + acc_reg_write(d, HWPfDdrBcTim0, 0x4050501); + acc_reg_write(d, HWPfDdrBcTim1, 0xf0b0476); + acc_reg_write(d, HWPfDdrBcTim2, 0x103); + acc_reg_write(d, HWPfDdrBcTim3, 0x144050a1); + acc_reg_write(d, HWPfDdrBcTim4, 0x23300); + acc_reg_write(d, HWPfDdrBcTim5, 0x4230276); + acc_reg_write(d, HWPfDdrBcTim6, 0x857914); + acc_reg_write(d, HWPfDdrBcTim7, 0x79100232); + acc_reg_write(d, HWPfDdrBcTim8, 0x100007ce); + acc_reg_write(d, HWPfDdrBcTim9, 0x50020); + acc_reg_write(d, HWPfDdrBcTim10, 0x40ee); /* Configure UMMC DFI timing registers */ - acc100_reg_write(d, HWPfDdrDfiInit, 0x5000); - acc100_reg_write(d, HWPfDdrDfiTim0, 0x15030006); - acc100_reg_write(d, HWPfDdrDfiTim1, 0x11305); - acc100_reg_write(d, HWPfDdrDfiPhyUpdEn, 0x1); - acc100_reg_write(d, HWPfDdrUmmcIntEn, 0x1f); + acc_reg_write(d, HWPfDdrDfiInit, 0x5000); + acc_reg_write(d, HWPfDdrDfiTim0, 0x15030006); + acc_reg_write(d, HWPfDdrDfiTim1, 0x11305); + acc_reg_write(d, HWPfDdrDfiPhyUpdEn, 0x1); + acc_reg_write(d, HWPfDdrUmmcIntEn, 0x1f); /* Release IDTM CPU out of reset */ - acc100_reg_write(d, HWPfChaDdrCpuRstCfg, 0x2); + acc_reg_write(d, HWPfChaDdrCpuRstCfg, 0x2); /* Wait PHY-IDTM to finish static training */ for (i = 0; i < 
ACC100_DDR_TRAINING_MAX; i++) { - usleep(ACC100_MS_IN_US); - value = acc100_reg_read(d, + usleep(ACC_MS_IN_US); + value = acc_reg_read(d, HWPfChaDdrStDoneStatus); if (value & 1) break; } printf("DDR Training completed in %d ms", i); /* Enable Memory Controller */ - acc100_reg_write(d, HWPfDdrUmmcCtrl, 0x401); + acc_reg_write(d, HWPfDdrUmmcCtrl, 0x401); /* Release AXI interface reset */ - acc100_reg_write(d, HWPfChaDdrSifRstCfg, 3); + acc_reg_write(d, HWPfChaDdrSifRstCfg, 3); } rte_bbdev_log_debug("PF Tip configuration complete for %s", dev_name); @@ -4980,7 +4270,7 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) /* Initial configuration of a ACC101 device prior to running configure() */ static int -acc101_configure(const char *dev_name, struct rte_acc100_conf *conf) +acc101_configure(const char *dev_name, struct rte_acc_conf *conf) { rte_bbdev_log(INFO, "rte_acc101_configure"); uint32_t value, address, status; @@ -4988,10 +4278,10 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) struct rte_bbdev *bbdev = rte_bbdev_get_named_dev(dev_name); /* Compile time checks */ - RTE_BUILD_BUG_ON(sizeof(struct acc100_dma_req_desc) != 256); - RTE_BUILD_BUG_ON(sizeof(union acc100_dma_desc) != 256); - RTE_BUILD_BUG_ON(sizeof(struct acc100_fcw_td) != 24); - RTE_BUILD_BUG_ON(sizeof(struct acc100_fcw_te) != 32); + RTE_BUILD_BUG_ON(sizeof(struct acc_dma_req_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(union acc_dma_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_td) != 24); + RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_te) != 32); if (bbdev == NULL) { rte_bbdev_log(ERR, @@ -4999,67 +4289,67 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) dev_name); return -ENODEV; } - struct acc100_device *d = bbdev->data->dev_private; + struct acc_device *d = bbdev->data->dev_private; /* Store configuration */ - rte_memcpy(&d->acc100_conf, conf, sizeof(d->acc100_conf)); + rte_memcpy(&d->acc_conf, conf, sizeof(d->acc_conf)); /* PCIe Bridge configuration */ 
- acc100_reg_write(d, HwPfPcieGpexBridgeControl, ACC101_CFG_PCI_BRIDGE); + acc_reg_write(d, HwPfPcieGpexBridgeControl, ACC101_CFG_PCI_BRIDGE); for (i = 1; i < ACC101_GPEX_AXIMAP_NUM; i++) - acc100_reg_write(d, HwPfPcieGpexAxiAddrMappingWindowPexBaseHigh + i * 16, 0); + acc_reg_write(d, HwPfPcieGpexAxiAddrMappingWindowPexBaseHigh + i * 16, 0); /* Prevent blocking AXI read on BRESP for AXI Write */ address = HwPfPcieGpexAxiPioControl; value = ACC101_CFG_PCI_AXI; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Explicitly releasing AXI including a 2ms delay on ACC101 */ usleep(2000); - acc100_reg_write(d, HWPfDmaAxiControl, 1); + acc_reg_write(d, HWPfDmaAxiControl, 1); /* Set the default 5GDL DMA configuration */ - acc100_reg_write(d, HWPfDmaInboundDrainDataSize, ACC101_DMA_INBOUND); + acc_reg_write(d, HWPfDmaInboundDrainDataSize, ACC101_DMA_INBOUND); /* Enable granular dynamic clock gating */ address = HWPfHiClkGateHystReg; value = ACC101_CLOCK_GATING_EN; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Set default descriptor signature */ address = HWPfDmaDescriptorSignatuture; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Enable the Error Detection in DMA */ value = ACC101_CFG_DMA_ERROR; address = HWPfDmaErrorDetectionEn; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* AXI Cache configuration */ value = ACC101_CFG_AXI_CACHE; address = HWPfDmaAxcacheReg; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Default DMA Configuration (Qmgr Enabled) */ address = HWPfDmaConfig0Reg; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); address = HWPfDmaQmanen; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Default RLIM/ALEN configuration */ address = HWPfDmaConfig1Reg; int alen_r = 0xF; int alen_w = 0x7; value = (1 << 31) + (alen_w << 20) + (1 << 
6) + alen_r; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Configure DMA Qmanager addresses */ address = HWPfDmaQmgrAddrReg; value = HWPfQmgrEgressQueuesTemplate; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* ===== Qmgr Configuration ===== */ /* Configuration of the AQueue Depth QMGR_GRP_0_DEPTH_LOG2 for UL */ @@ -5069,43 +4359,43 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) conf->q_dl_5g.num_qgroups; for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { address = HWPfQmgrDepthLog2Grp + - ACC101_BYTES_IN_WORD * qg_idx; + ACC_BYTES_IN_WORD * qg_idx; value = aqDepth(qg_idx, conf); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); address = HWPfQmgrTholdGrp + - ACC101_BYTES_IN_WORD * qg_idx; + ACC_BYTES_IN_WORD * qg_idx; value = (1 << 16) + (1 << (aqDepth(qg_idx, conf) - 1)); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } /* Template Priority in incremental order */ - for (template_idx = 0; template_idx < ACC101_NUM_TMPL; + for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) { - address = HWPfQmgrGrpTmplateReg0Indx + ACC101_BYTES_IN_WORD * template_idx; - value = ACC101_TMPL_PRI_0; - acc100_reg_write(d, address, value); - address = HWPfQmgrGrpTmplateReg1Indx + ACC101_BYTES_IN_WORD * template_idx; - value = ACC101_TMPL_PRI_1; - acc100_reg_write(d, address, value); - address = HWPfQmgrGrpTmplateReg2indx + ACC101_BYTES_IN_WORD * template_idx; - value = ACC101_TMPL_PRI_2; - acc100_reg_write(d, address, value); - address = HWPfQmgrGrpTmplateReg3Indx + ACC101_BYTES_IN_WORD * template_idx; - value = ACC101_TMPL_PRI_3; - acc100_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg0Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_0; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg1Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_1; + acc_reg_write(d, address, value); + 
address = HWPfQmgrGrpTmplateReg2indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_2; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg3Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_3; + acc_reg_write(d, address, value); } address = HWPfQmgrGrpPriority; value = ACC101_CFG_QMGR_HI_P; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Template Configuration */ - for (template_idx = 0; template_idx < ACC101_NUM_TMPL; + for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) { value = 0; address = HWPfQmgrGrpTmplateReg4Indx - + ACC101_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* 4GUL */ int numQgs = conf->q_ul_4g.num_qgroups; @@ -5117,8 +4407,8 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx <= ACC101_SIG_UL_4G_LAST; template_idx++) { address = HWPfQmgrGrpTmplateReg4Indx - + ACC101_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* 5GUL */ numQqsAcc += numQgs; @@ -5132,15 +4422,15 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx++) { /* Check engine power-on status */ address = HwPfFecUl5gIbDebugReg + - ACC101_ENGINE_OFFSET * template_idx; - status = (acc100_reg_read(d, address) >> 4) & 0xF; + ACC_ENGINE_OFFSET * template_idx; + status = (acc_reg_read(d, address) >> 4) & 0xF; address = HWPfQmgrGrpTmplateReg4Indx - + ACC101_BYTES_IN_WORD * template_idx; + + ACC_BYTES_IN_WORD * template_idx; if (status == 1) { - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); numEngines++; } else - acc100_reg_write(d, address, 0); + acc_reg_write(d, address, 0); } printf("Number of 5GUL engines %d\n", numEngines); /* 4GDL */ @@ -5153,8 +4443,8 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx <= 
ACC101_SIG_DL_4G_LAST; template_idx++) { address = HWPfQmgrGrpTmplateReg4Indx - + ACC101_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* 5GDL */ numQqsAcc += numQgs; @@ -5166,8 +4456,8 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) template_idx <= ACC101_SIG_DL_5G_LAST; template_idx++) { address = HWPfQmgrGrpTmplateReg4Indx - + ACC101_BYTES_IN_WORD * template_idx; - acc100_reg_write(d, address, value); + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); } /* Queue Group Function mapping */ @@ -5178,14 +4468,14 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) acc = accFromQgid(qg_idx, conf); value |= qman_func_id[acc]<<(qg_idx * 4); } - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Configuration of the Arbitration QGroup depth to 1 */ for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { address = HWPfQmgrArbQDepthGrp + - ACC101_BYTES_IN_WORD * qg_idx; + ACC_BYTES_IN_WORD * qg_idx; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } /* Enabling AQueues through the Queue hierarchy*/ @@ -5196,9 +4486,9 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) qg_idx < totalQgs) value = (1 << aqNum(qg_idx, conf)) - 1; address = HWPfQmgrAqEnableVf - + vf_idx * ACC101_BYTES_IN_WORD; + + vf_idx * ACC_BYTES_IN_WORD; value += (qg_idx << 16); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } } @@ -5207,10 +4497,10 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { for (vf_idx = 0; vf_idx < conf->num_vf_bundles; vf_idx++) { address = HWPfQmgrVfBaseAddr + vf_idx - * ACC101_BYTES_IN_WORD + qg_idx - * ACC101_BYTES_IN_WORD * 64; + * ACC_BYTES_IN_WORD + qg_idx + * ACC_BYTES_IN_WORD * 64; value = aram_address; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, 
value); /* Offset ARAM Address for next memory bank * - increment of 4B */ @@ -5228,32 +4518,32 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) /* ==== HI Configuration ==== */ /* No Info Ring/MSI by default */ - acc100_reg_write(d, HWPfHiInfoRingIntWrEnRegPf, 0); - acc100_reg_write(d, HWPfHiInfoRingVf2pfLoWrEnReg, 0); - acc100_reg_write(d, HWPfHiCfgMsiIntWrEnRegPf, 0xFFFFFFFF); - acc100_reg_write(d, HWPfHiCfgMsiVf2pfLoWrEnReg, 0xFFFFFFFF); + acc_reg_write(d, HWPfHiInfoRingIntWrEnRegPf, 0); + acc_reg_write(d, HWPfHiInfoRingVf2pfLoWrEnReg, 0); + acc_reg_write(d, HWPfHiCfgMsiIntWrEnRegPf, 0xFFFFFFFF); + acc_reg_write(d, HWPfHiCfgMsiVf2pfLoWrEnReg, 0xFFFFFFFF); /* Prevent Block on Transmit Error */ address = HWPfHiBlockTransmitOnErrorEn; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Prevents to drop MSI */ address = HWPfHiMsiDropEnableReg; value = 0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* Set the PF Mode register */ address = HWPfHiPfMode; - value = (conf->pf_mode_en) ? ACC101_PF_VAL : 0; - acc100_reg_write(d, address, value); + value = (conf->pf_mode_en) ? 
ACC_PF_VAL : 0; + acc_reg_write(d, address, value); /* Explicitly releasing AXI after PF Mode and 2 ms */ usleep(2000); - acc100_reg_write(d, HWPfDmaAxiControl, 1); + acc_reg_write(d, HWPfDmaAxiControl, 1); /* QoS overflow init */ value = 1; address = HWPfQosmonAEvalOverflow0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); address = HWPfQosmonBEvalOverflow0; - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); /* HARQ DDR Configuration */ unsigned int ddrSizeInMb = ACC101_HARQ_DDR; @@ -5262,16 +4552,16 @@ static int acc100_pci_remove(struct rte_pci_device *pci_dev) * 0x10; value = ((vf_idx * (ddrSizeInMb / 64)) << 16) + (ddrSizeInMb - 1); - acc100_reg_write(d, address, value); + acc_reg_write(d, address, value); } - usleep(ACC101_LONG_WAIT); + usleep(ACC_LONG_WAIT); rte_bbdev_log_debug("PF TIP configuration complete for %s", dev_name); return 0; } int -rte_acc10x_configure(const char *dev_name, struct rte_acc100_conf *conf) +rte_acc10x_configure(const char *dev_name, struct rte_acc_conf *conf) { struct rte_bbdev *bbdev = rte_bbdev_get_named_dev(dev_name); if (bbdev == NULL) { diff --git a/drivers/baseband/acc100/rte_acc_common_cfg.h b/drivers/baseband/acc100/rte_acc_common_cfg.h new file mode 100644 index 0000000..8292ef4 --- /dev/null +++ b/drivers/baseband/acc100/rte_acc_common_cfg.h @@ -0,0 +1,101 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Intel Corporation + */ + +#ifndef _RTE_ACC_COMMON_CFG_H_ +#define _RTE_ACC_COMMON_CFG_H_ + +/** + * @file rte_acc100_cfg.h + * + * Functions for configuring ACC100 HW, exposed directly to applications. + * Configuration related to encoding/decoding is done through the + * librte_bbdev library. 
+ * + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + */ + +#include <stdint.h> +#include <stdbool.h> + +#ifdef __cplusplus +extern "C" { +#endif + +/** Number of Virtual Functions ACC300 supports */ +#define RTE_ACC_NUM_VFS 64 + +/** + * Definition of Queue Topology for ACC300 Configuration + * Some level of detail is abstracted out to expose a clean interface + * given that comprehensive flexibility is not required + */ +struct rte_acc_queue_topology { + /** Number of QGroups in incremental order of priority */ + uint16_t num_qgroups; + /** + * All QGroups have the same number of AQs here. + * Note : Could be made a 16-array if more flexibility is really + * required + */ + uint16_t num_aqs_per_groups; + /** + * Depth of the AQs is the same for all QGroups here. Log2 Enum : 2^N + * Note : Could be made a 16-array if more flexibility is really + * required + */ + uint16_t aq_depth_log2; + /** + * Index of the first Queue Group - assuming contiguity + * Initialized as -1 + */ + int8_t first_qgroup_index; +}; + +/** + * Definition of Arbitration related parameters for ACC300 Configuration + */ +struct rte_acc_arbitration { + /** Default Weight for VF Fairness Arbitration */ + uint16_t round_robin_weight; + uint32_t gbr_threshold1; /**< Guaranteed Bitrate Threshold 1 */ + uint32_t gbr_threshold2; /**< Guaranteed Bitrate Threshold 2 */ +}; + +/** + * Structure to pass ACC300 configuration. + * Note: all VF Bundles will have the same configuration. + */ +struct rte_acc_conf { + bool pf_mode_en; /**< 1 if PF is used for dataplane, 0 for VFs */ + /** 1 if input '1' bit is represented by a positive LLR value, 0 if '1' + * bit is represented by a negative value. 
+ */ + bool output_pos_llr_1_bit; + uint16_t num_vf_bundles; /**< Number of VF bundles to setup */ + /** Queue topology for each operation type */ + struct rte_acc_queue_topology q_ul_4g; + struct rte_acc_queue_topology q_dl_4g; + struct rte_acc_queue_topology q_ul_5g; + struct rte_acc_queue_topology q_dl_5g; + struct rte_acc_queue_topology q_fft; + struct rte_acc_queue_topology q_mld; + /** Arbitration configuration for each operation type */ + struct rte_acc_arbitration arb_ul_4g[RTE_ACC_NUM_VFS]; + struct rte_acc_arbitration arb_dl_4g[RTE_ACC_NUM_VFS]; + struct rte_acc_arbitration arb_ul_5g[RTE_ACC_NUM_VFS]; + struct rte_acc_arbitration arb_dl_5g[RTE_ACC_NUM_VFS]; + struct rte_acc_arbitration arb_fft[RTE_ACC_NUM_VFS]; + struct rte_acc_arbitration arb_mld[RTE_ACC_NUM_VFS]; +}; + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_ACC_COMMON_CFG_H_ */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
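For readers following the thread, here is a minimal sketch of how an application might populate the `struct rte_acc_conf` introduced in the patch above. It rests on several assumptions: the struct definitions are trimmed local copies (arbitration arrays omitted) so the snippet builds without DPDK; the helpers `make_topology()`, `fill_example_conf()`, `total_qgroups()` and `qmgr_threshold_value()` are hypothetical names that are not part of the patch; and the queue counts are arbitrary illustration values. The threshold helper mirrors the `(1 << 16) + (1 << (aqDepth(qg_idx, conf) - 1))` expression visible in the configure path, on the assumption that `aqDepth()` returns the `aq_depth_log2` of the matching topology.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Trimmed local copies of the definitions in rte_acc_common_cfg.h from the
 * patch above; a real application would include the driver header instead. */
struct rte_acc_queue_topology {
	uint16_t num_qgroups;
	uint16_t num_aqs_per_groups;
	uint16_t aq_depth_log2;
	int8_t first_qgroup_index;
};

struct rte_acc_conf {
	bool pf_mode_en;
	bool input_pos_llr_1_bit;
	bool output_pos_llr_1_bit;
	uint16_t num_vf_bundles;
	struct rte_acc_queue_topology q_ul_4g;
	struct rte_acc_queue_topology q_dl_4g;
	struct rte_acc_queue_topology q_ul_5g;
	struct rte_acc_queue_topology q_dl_5g;
	struct rte_acc_queue_topology q_fft;
	struct rte_acc_queue_topology q_mld;
};

/* Hypothetical helper: build one topology entry. first_qgroup_index starts
 * at -1, as the header comment says it is initialized. */
static struct rte_acc_queue_topology
make_topology(uint16_t qgroups, uint16_t aqs, uint16_t depth_log2)
{
	struct rte_acc_queue_topology t = {
		.num_qgroups = qgroups,
		.num_aqs_per_groups = aqs,
		.aq_depth_log2 = depth_log2,
		.first_qgroup_index = -1,
	};
	return t;
}

/* Hypothetical helper: fill a configuration with illustration values. */
static void
fill_example_conf(struct rte_acc_conf *conf)
{
	assert(conf != NULL);
	memset(conf, 0, sizeof(*conf));
	conf->pf_mode_en = false;          /* dataplane runs on the VFs */
	conf->input_pos_llr_1_bit = true;
	conf->output_pos_llr_1_bit = true;
	conf->num_vf_bundles = 2;
	conf->q_ul_4g = make_topology(2, 16, 4);
	conf->q_dl_4g = make_topology(2, 16, 4);
	conf->q_ul_5g = make_topology(2, 16, 4);
	conf->q_dl_5g = make_topology(2, 16, 4);
	conf->q_fft = make_topology(0, 0, 0);   /* unused in this sketch */
	conf->q_mld = make_topology(0, 0, 0);   /* unused in this sketch */
}

/* Hypothetical helper: queue groups summed over 4G/5G UL/DL, assumed to be
 * the quantity the configure path accumulates as totalQgs. */
static unsigned int
total_qgroups(const struct rte_acc_conf *conf)
{
	return conf->q_ul_4g.num_qgroups + conf->q_ul_5g.num_qgroups +
		conf->q_dl_4g.num_qgroups + conf->q_dl_5g.num_qgroups;
}

/* Hypothetical helper: threshold register value for a given log2 depth,
 * using unsigned arithmetic to keep the shifts well defined. */
static uint32_t
qmgr_threshold_value(uint16_t depth_log2)
{
	assert(depth_log2 >= 1 && depth_log2 < 16);
	return (UINT32_C(1) << 16) + (UINT32_C(1) << (depth_log2 - 1));
}
```

With these values, an application would call `fill_example_conf(&conf)` and then hand `conf` to the PMD's configure entry point (`rte_acc10x_configure()` in this patch); the two arithmetic helpers are only there to show how the topology fields feed the register writes.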
* Re: [PATCH v2 01/11] baseband/acc100: refactory to segregate common code
From: Bruce Richardson @ 2022-09-12 15:19 UTC
To: Nic Chautru
Cc: dev, thomas, gakhil, hemant.agrawal, maxime.coquelin, trix, mdr, david.marchand, stephen, hernan.vargas

On Sun, Sep 11, 2022 at 06:08:48PM -0700, Nic Chautru wrote:
> Refactoring all shareable common code to be used by future PMD
> (including ACC200 it his serie as well as taking into account
> following PMDs in roadmap) by gathering such structures or inline methods.
> Cleaning up the enum files to remove un-used registers definitions.
> No functionality change.
>

s/refactory/refactor/

Good to see this cleanup to reduce future code duplication.

> Signed-off-by: Nic Chautru <nicolas.chautru@intel.com>
> ---

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* [PATCH v2 02/11] baseband/acc200: introduce PMD for ACC200 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru 2022-09-12 1:08 ` [PATCH v2 01/11] baseband/acc100: refactory to segregate common code Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 15:41 ` Bruce Richardson 2022-09-12 1:08 ` [PATCH v2 03/11] baseband/acc200: add HW register definitions Nic Chautru ` (8 subsequent siblings) 10 siblings, 1 reply; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> This patch introduce stubs for device driver for the ACC200 integrated VRAN accelerator on SPR-EEC Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- MAINTAINERS | 3 + doc/guides/bbdevs/acc200.rst | 244 +++++++++++++++++++++++++++++++ doc/guides/bbdevs/index.rst | 1 + drivers/baseband/acc200/acc200_pmd.h | 32 ++++ drivers/baseband/acc200/meson.build | 6 + drivers/baseband/acc200/rte_acc200_pmd.c | 142 ++++++++++++++++++ drivers/baseband/acc200/version.map | 3 + drivers/baseband/meson.build | 1 + 8 files changed, 432 insertions(+) create mode 100644 doc/guides/bbdevs/acc200.rst create mode 100644 drivers/baseband/acc200/acc200_pmd.h create mode 100644 drivers/baseband/acc200/meson.build create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c create mode 100644 drivers/baseband/acc200/version.map diff --git a/MAINTAINERS b/MAINTAINERS index 32ffdd1..2abaf45 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1340,6 +1340,9 @@ F: drivers/baseband/acc100/ F: doc/guides/bbdevs/acc100.rst F: doc/guides/bbdevs/features/acc100.ini F: doc/guides/bbdevs/features/acc101.ini +F: drivers/baseband/acc200/ +F: doc/guides/bbdevs/acc200.rst +F: doc/guides/bbdevs/features/acc200.ini Null baseband M: Nicolas Chautru <nicolas.chautru@intel.com> diff --git 
a/doc/guides/bbdevs/acc200.rst b/doc/guides/bbdevs/acc200.rst new file mode 100644 index 0000000..c5e1bd7 --- /dev/null +++ b/doc/guides/bbdevs/acc200.rst @@ -0,0 +1,244 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2022 Intel Corporation + +Intel(R) ACC200 vRAN Dedicated Accelerator Poll Mode Driver +=========================================================== + +The Intel® vRAN Dedicated Accelerator ACC200 peripheral enables cost-effective 4G +and 5G next-generation virtualized Radio Access Network (vRAN) solutions integrated on +Sapphire Rapids EEC Intel(R) 7 based Xeon(R) multi-core server processor. + +Features +-------- + +The ACC200 includes a 5G Low Density Parity Check (LDPC) encoder/decoder, rate match/dematch, +Hybrid Automatic Repeat Request (HARQ) with access to DDR memory for buffer management, a 4G +Turbo encoder/decoder, a Fast Fourier Transform (FFT) block providing DFT/iDFT processing offload +for the 5G Sounding Reference Signal (SRS), a Queue Manager (QMGR), and a DMA subsystem. +There is no dedicated on-card memory for HARQ; it uses coherent memory on the CPU side. 
+ +These correspond to the following features exposed by the PMD: + +- LDPC Encode in the Downlink (5GNR) +- LDPC Decode in the Uplink (5GNR) +- Turbo Encode in the Downlink (4G) +- Turbo Decode in the Uplink (4G) +- FFT processing +- SR-IOV with 16 VFs per PF +- Maximum of 256 queues per VF +- MSI + +ACC200 PMD supports the following BBDEV capabilities: + +* For the LDPC encode operation: + - ``RTE_BBDEV_LDPC_CRC_24B_ATTACH`` : set to attach CRC24B to CB(s) + - ``RTE_BBDEV_LDPC_RATE_MATCH`` : if set then do not do Rate Match bypass + - ``RTE_BBDEV_LDPC_INTERLEAVER_BYPASS`` : if set then bypass interleaver + +* For the LDPC decode operation: + - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK`` : check CRC24B from CB(s) + - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP`` : drops CRC24B bits appended while decoding + - ``RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK`` : check CRC24A from CB(s) + - ``RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK`` : check CRC16 from CB(s) + - ``RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE`` : provides an input for HARQ combining + - ``RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE`` : provides an output for HARQ combining + - ``RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE`` : disable early termination + - ``RTE_BBDEV_LDPC_DEC_SCATTER_GATHER`` : supports scatter-gather for input/output data + - ``RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION`` : supports compression of the HARQ input/output + - ``RTE_BBDEV_LDPC_LLR_COMPRESSION`` : supports LLR input compression + +* For the turbo encode operation: + - ``RTE_BBDEV_TURBO_CRC_24B_ATTACH`` : set to attach CRC24B to CB(s) + - ``RTE_BBDEV_TURBO_RATE_MATCH`` : if set then do not do Rate Match bypass + - ``RTE_BBDEV_TURBO_ENC_INTERRUPTS`` : set for encoder dequeue interrupts + - ``RTE_BBDEV_TURBO_RV_INDEX_BYPASS`` : set to bypass RV index + - ``RTE_BBDEV_TURBO_ENC_SCATTER_GATHER`` : supports scatter-gather for input/output data + +* For the turbo decode operation: + - ``RTE_BBDEV_TURBO_CRC_TYPE_24B`` : check CRC24B from CB(s) + - ``RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE`` : 
perform subblock de-interleave + - ``RTE_BBDEV_TURBO_DEC_INTERRUPTS`` : set for decoder dequeue interrupts + - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN`` : set if negative LLR input is supported + - ``RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP`` : keep CRC24B bits appended while decoding + - ``RTE_BBDEV_TURBO_DEC_CRC_24B_DROP`` : option to drop the code block CRC after decoding + - ``RTE_BBDEV_TURBO_EARLY_TERMINATION`` : set early termination feature + - ``RTE_BBDEV_TURBO_DEC_SCATTER_GATHER`` : supports scatter-gather for input/output data + - ``RTE_BBDEV_TURBO_HALF_ITERATION_EVEN`` : set half iteration granularity + - ``RTE_BBDEV_TURBO_SOFT_OUTPUT`` : set the APP LLR soft output + - ``RTE_BBDEV_TURBO_EQUALIZER`` : set the turbo equalizer feature + - ``RTE_BBDEV_TURBO_SOFT_OUT_SATURATE`` : set the soft output saturation + - ``RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH`` : set to run an extra odd iteration after CRC match + - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT`` : set if negative APP LLR output supported + - ``RTE_BBDEV_TURBO_MAP_DEC`` : supports flexible parallel MAP engine decoding + +Installation +------------ + +Section 3 of the DPDK manual provides instructions on installing and compiling DPDK. + +DPDK requires hugepages to be configured as detailed in section 2 of the DPDK manual. +The bbdev test application has been tested with a configuration 40 x 1GB hugepages. The +hugepage configuration of a server may be examined using: + +.. code-block:: console + + grep Huge* /proc/meminfo + + +Initialization +-------------- + +When the device first powers up, its PCI Physical Functions (PF) can be listed through these +commands for ACC200: + +.. code-block:: console + + sudo lspci -vd8086:57c0 + +The physical and virtual functions are compatible with Linux UIO drivers: +``vfio`` and ``igb_uio``. However, in order to work the 5G/4G +FEC device first needs to be bound to one of these linux drivers through DPDK. 
+ + +Bind PF UIO driver(s) +~~~~~~~~~~~~~~~~~~~~~ + +Install the DPDK igb_uio driver, bind it with the PF PCI device ID and use +``lspci`` to confirm the PF device is in use by the ``igb_uio`` DPDK UIO driver. + +The igb_uio driver may be bound to the PF PCI device using one of two methods for ACC200: + + +1. PCI functions (physical or virtual, depending on the use case) can be bound to +the UIO driver by repeating this command for every function. + +.. code-block:: console + + cd <dpdk-top-level-directory> + insmod ./build/kmod/igb_uio.ko + echo "8086 57c0" > /sys/bus/pci/drivers/igb_uio/new_id + lspci -vd8086:57c0 + + +2. Another way to bind the PF to the DPDK UIO driver is to use the ``dpdk-devbind.py`` tool. + +.. code-block:: console + + cd <dpdk-top-level-directory> + ./usertools/dpdk-devbind.py -b igb_uio 0000:f7:00.0 + +where the PCI device ID (example: 0000:f7:00.0) is obtained using ``lspci -vd8086:57c0``. + + +In a similar way, the PF may be bound to vfio-pci like any PCIe device. + + +Enable Virtual Functions +~~~~~~~~~~~~~~~~~~~~~~~~ + +Now, it should be visible in the printouts that the PCI PF is under igb_uio control +"``Kernel driver in use: igb_uio``" + +To show the number of available VFs on the device, read the ``sriov_totalvfs`` file: + +.. code-block:: console + + cat /sys/bus/pci/devices/0000\:<b>\:<d>.<f>/sriov_totalvfs + + where 0000\:<b>\:<d>.<f> is the PCI device ID + + +To enable VFs via igb_uio, echo the number of virtual functions to +enable to the ``max_vfs`` file: + +.. code-block:: console + + echo <num-of-vfs> > /sys/bus/pci/devices/0000\:<b>\:<d>.<f>/max_vfs + + +Afterwards, all VFs must be bound to appropriate UIO drivers as required, in the same +way as was done for the physical function previously. + +Enabling SR-IOV via the vfio driver is much the same, except that the file +name is different: + +.. 
code-block:: console + + echo <num-of-vfs> > /sys/bus/pci/devices/0000\:<b>\:<d>.<f>/sriov_numvfs + + +Configure the VFs through PF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The PCI virtual functions must be configured before working or getting assigned +to VMs/Containers. The configuration involves allocating the number of hardware +queues, priorities, load balance, bandwidth and other settings necessary for the +device to perform FEC functions. + +This configuration needs to be executed at least once after reboot or PCI FLR and can +be achieved by using the functions ``rte_acc200_configure()``, +which sets up the parameters defined in the compatible ``acc200_conf`` structure. + +Test Application +---------------- + +BBDEV provides a test application, ``test-bbdev.py`` and range of test data for testing +the functionality of the device, depending on the device's +capabilities. The test application is located under app->test-bbdev folder and has the +following options: + +.. code-block:: console + + "-p", "--testapp-path": specifies path to the bbdev test app. + "-e", "--eal-params" : EAL arguments which are passed to the test app. + "-t", "--timeout" : Timeout in seconds (default=300). + "-c", "--test-cases" : Defines test cases to run. Run all if not specified. + "-v", "--test-vector" : Test vector path (default=dpdk_path+/app/test-bbdev/test_vectors/bbdev_null.data). + "-n", "--num-ops" : Number of operations to process on device (default=32). + "-b", "--burst-size" : Operations enqueue/dequeue burst size (default=32). + "-s", "--snr" : SNR in dB used when generating LLRs for bler tests. + "-s", "--iter_max" : Number of iterations for LDPC decoder. + "-l", "--num-lcores" : Number of lcores to run (default=16). + "-i", "--init-device" : Initialise PF device with default values. + + +To execute the test application tool using simple decode or encode data, +type one of the following: + +.. 
code-block:: console + + ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_dec_default.data + ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_enc_default.data + + +The test application ``test-bbdev.py``, supports the ability to configure the PF device with +a default set of values, if the "-i" or "- -init-device" option is included. The default values +are defined in test_bbdev_perf.c. + + +Test Vectors +~~~~~~~~~~~~ + +In addition to the simple LDPC decoder and LDPC encoder tests, bbdev also provides +a range of additional tests under the test_vectors folder, which may be useful. The results +of these tests will depend on the device capabilities which may cause some +testcases to be skipped, but no failure should be reported. + + +Alternate Baseband Device configuration tool +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +On top of the embedded configuration feature supported in test-bbdev using "- -init-device" +option mentioned above, there is also a tool available to perform that device configuration +using a companion application. +The ``pf_bb_config`` application notably enables then to run bbdev-test from the VF +and not only limited to the PF as captured above. + +See for more details: https://github.com/intel/pf-bb-config + +Specifically for the BBDEV ACC200 PMD, the command below can be used: + +.. 
code-block:: console + + ./pf_bb_config ACC200 -c ./acc200/acc200_config_vf_5g.cfg + ./test-bbdev.py -e="-c 0xff0 -a${VF_PCI_ADDR}" -c validation -n 64 -b 64 -l 1 -v ./ldpc_dec_default.data diff --git a/doc/guides/bbdevs/index.rst b/doc/guides/bbdevs/index.rst index cedd706..4e9dea8 100644 --- a/doc/guides/bbdevs/index.rst +++ b/doc/guides/bbdevs/index.rst @@ -14,4 +14,5 @@ Baseband Device Drivers fpga_lte_fec fpga_5gnr_fec acc100 + acc200 la12xx diff --git a/drivers/baseband/acc200/acc200_pmd.h b/drivers/baseband/acc200/acc200_pmd.h new file mode 100644 index 0000000..626e9fb --- /dev/null +++ b/drivers/baseband/acc200/acc200_pmd.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Intel Corporation + */ + +#ifndef _RTE_ACC200_PMD_H_ +#define _RTE_ACC200_PMD_H_ + +#include "../acc100/acc_common.h" + +/* Helper macro for logging */ +#define rte_bbdev_log(level, fmt, ...) \ + rte_log(RTE_LOG_ ## level, acc200_logtype, fmt "\n", \ + ##__VA_ARGS__) + +#ifdef RTE_LIBRTE_BBDEV_DEBUG +#define rte_bbdev_log_debug(fmt, ...) \ + rte_bbdev_log(DEBUG, "acc200_pmd: " fmt, \ + ##__VA_ARGS__) +#else +#define rte_bbdev_log_debug(fmt, ...) 
+#endif + +/* ACC200 PF and VF driver names */ +#define ACC200PF_DRIVER_NAME intel_acc200_pf +#define ACC200VF_DRIVER_NAME intel_acc200_vf + +/* ACC200 PCI vendor & device IDs */ +#define RTE_ACC200_VENDOR_ID (0x8086) +#define RTE_ACC200_PF_DEVICE_ID (0x57C0) +#define RTE_ACC200_VF_DEVICE_ID (0x57C1) + +#endif /* _RTE_ACC200_PMD_H_ */ diff --git a/drivers/baseband/acc200/meson.build b/drivers/baseband/acc200/meson.build new file mode 100644 index 0000000..7ec8679 --- /dev/null +++ b/drivers/baseband/acc200/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2022 Intel Corporation + +deps += ['bbdev', 'bus_vdev', 'ring', 'pci', 'bus_pci'] + +sources = files('rte_acc200_pmd.c') diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c new file mode 100644 index 0000000..db8b641 --- /dev/null +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -0,0 +1,142 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Intel Corporation + */ + +#include <unistd.h> + +#include <rte_common.h> +#include <rte_log.h> +#include <rte_dev.h> +#include <rte_malloc.h> +#include <rte_mempool.h> +#include <rte_byteorder.h> +#include <rte_errno.h> +#include <rte_branch_prediction.h> +#include <rte_hexdump.h> +#include <rte_pci.h> +#include <rte_bus_pci.h> +#ifdef RTE_BBDEV_OFFLOAD_COST +#include <rte_cycles.h> +#endif + +#include <rte_bbdev.h> +#include <rte_bbdev_pmd.h> +#include "acc200_pmd.h" + +#ifdef RTE_LIBRTE_BBDEV_DEBUG +RTE_LOG_REGISTER_DEFAULT(acc200_logtype, DEBUG); +#else +RTE_LOG_REGISTER_DEFAULT(acc200_logtype, NOTICE); +#endif + +/* Free memory used for software rings */ +static int +acc200_dev_close(struct rte_bbdev *dev) +{ + RTE_SET_USED(dev); + return 0; +} + + +static const struct rte_bbdev_ops acc200_bbdev_ops = { + .close = acc200_dev_close, +}; + +/* ACC200 PCI PF address map */ +static struct rte_pci_id pci_id_acc200_pf_map[] = { + { + RTE_PCI_DEVICE(RTE_ACC200_VENDOR_ID, 
RTE_ACC200_PF_DEVICE_ID) + }, + {.device_id = 0}, +}; + +/* ACC200 PCI VF address map */ +static struct rte_pci_id pci_id_acc200_vf_map[] = { + { + RTE_PCI_DEVICE(RTE_ACC200_VENDOR_ID, RTE_ACC200_VF_DEVICE_ID) + }, + {.device_id = 0}, +}; + +/* Initialization Function */ +static void +acc200_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) +{ + struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device); + + dev->dev_ops = &acc200_bbdev_ops; + + ((struct acc_device *) dev->data->dev_private)->pf_device = + !strcmp(drv->driver.name, + RTE_STR(ACC200PF_DRIVER_NAME)); + ((struct acc_device *) dev->data->dev_private)->mmio_base = + pci_dev->mem_resource[0].addr; + + rte_bbdev_log_debug("Init device %s [%s] @ vaddr %p paddr %#"PRIx64"", + drv->driver.name, dev->data->name, + (void *)pci_dev->mem_resource[0].addr, + pci_dev->mem_resource[0].phys_addr); +} + +static int acc200_pci_probe(struct rte_pci_driver *pci_drv, + struct rte_pci_device *pci_dev) +{ + struct rte_bbdev *bbdev = NULL; + char dev_name[RTE_BBDEV_NAME_MAX_LEN]; + + if (pci_dev == NULL) { + rte_bbdev_log(ERR, "NULL PCI device"); + return -EINVAL; + } + + rte_pci_device_name(&pci_dev->addr, dev_name, sizeof(dev_name)); + + /* Allocate memory to be used privately by drivers */ + bbdev = rte_bbdev_allocate(pci_dev->device.name); + if (bbdev == NULL) + return -ENODEV; + + /* allocate device private memory */ + bbdev->data->dev_private = rte_zmalloc_socket(dev_name, + sizeof(struct acc_device), RTE_CACHE_LINE_SIZE, + pci_dev->device.numa_node); + + if (bbdev->data->dev_private == NULL) { + rte_bbdev_log(CRIT, + "Allocate of %zu bytes for device \"%s\" failed", + sizeof(struct acc_device), dev_name); + rte_bbdev_release(bbdev); + return -ENOMEM; + } + + /* Fill HW specific part of device structure */ + bbdev->device = &pci_dev->device; + bbdev->intr_handle = pci_dev->intr_handle; + bbdev->data->socket_id = pci_dev->device.numa_node; + + /* Invoke ACC200 device initialization function */ + 
acc200_bbdev_init(bbdev, pci_drv); + + rte_bbdev_log_debug("Initialised bbdev %s (id = %u)", + dev_name, bbdev->data->dev_id); + return 0; +} + +static struct rte_pci_driver acc200_pci_pf_driver = { + .probe = acc200_pci_probe, + .remove = acc_pci_remove, + .id_table = pci_id_acc200_pf_map, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING +}; + +static struct rte_pci_driver acc200_pci_vf_driver = { + .probe = acc200_pci_probe, + .remove = acc_pci_remove, + .id_table = pci_id_acc200_vf_map, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING +}; + +RTE_PMD_REGISTER_PCI(ACC200PF_DRIVER_NAME, acc200_pci_pf_driver); +RTE_PMD_REGISTER_PCI_TABLE(ACC200PF_DRIVER_NAME, pci_id_acc200_pf_map); +RTE_PMD_REGISTER_PCI(ACC200VF_DRIVER_NAME, acc200_pci_vf_driver); +RTE_PMD_REGISTER_PCI_TABLE(ACC200VF_DRIVER_NAME, pci_id_acc200_vf_map); diff --git a/drivers/baseband/acc200/version.map b/drivers/baseband/acc200/version.map new file mode 100644 index 0000000..c2e0723 --- /dev/null +++ b/drivers/baseband/acc200/version.map @@ -0,0 +1,3 @@ +DPDK_22 { + local: *; +}; diff --git a/drivers/baseband/meson.build b/drivers/baseband/meson.build index 686e98b..343f83a 100644 --- a/drivers/baseband/meson.build +++ b/drivers/baseband/meson.build @@ -7,6 +7,7 @@ endif drivers = [ 'acc100', + 'acc200', 'fpga_5gnr_fec', 'fpga_lte_fec', 'la12xx', -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
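Editorial sketch, not part of the patch: acc200_bbdev_init() above decides whether it is running on the PF by comparing the bound driver name with the stringified ACC200PF_DRIVER_NAME token. The fragment below illustrates that idea in isolation. SKETCH_STR and sketch_is_pf_driver are hypothetical names introduced here. The sketch assumes the PCI registration macro stores the stringified token as the driver name, and it re-implements the two-level stringification locally instead of using RTE_STR.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for RTE_STR(): two-level expansion so that the macro
 * argument is expanded before it is turned into a string literal. */
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x) SKETCH_STR_(x)

/* Driver names as bare tokens, copied from acc200_pmd.h in the patch. */
#define ACC200PF_DRIVER_NAME intel_acc200_pf
#define ACC200VF_DRIVER_NAME intel_acc200_vf

/* Hypothetical helper: true when the bound driver is the PF driver. */
static bool
sketch_is_pf_driver(const char *driver_name)
{
	assert(driver_name != NULL);
	return strcmp(driver_name, SKETCH_STR(ACC200PF_DRIVER_NAME)) == 0;
}
```

With these definitions, sketch_is_pf_driver("intel_acc200_pf") is true and sketch_is_pf_driver("intel_acc200_vf") is false, which is the distinction the patch records in the pf_device field.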
* Re: [PATCH v2 02/11] baseband/acc200: introduce PMD for ACC200
From: Bruce Richardson @ 2022-09-12 15:41 UTC
To: Nic Chautru
Cc: dev, thomas, gakhil, hemant.agrawal, maxime.coquelin, trix, mdr, david.marchand, stephen, hernan.vargas

On Sun, Sep 11, 2022 at 06:08:49PM -0700, Nic Chautru wrote:
> From: Nicolas Chautru <nicolas.chautru@intel.com>
>
> This patch introduce stubs for device driver for the ACC200
> integrated VRAN accelerator on SPR-EEC
>
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>  MAINTAINERS                              |   3 +
>  doc/guides/bbdevs/acc200.rst             | 244 +++++++++++++++++++++++++++++++
>  doc/guides/bbdevs/index.rst              |   1 +
>  drivers/baseband/acc200/acc200_pmd.h     |  32 ++++
>  drivers/baseband/acc200/meson.build      |   6 +
>  drivers/baseband/acc200/rte_acc200_pmd.c | 142 ++++++++++++++++++
>  drivers/baseband/acc200/version.map      |   3 +
>  drivers/baseband/meson.build             |   1 +
>  8 files changed, 432 insertions(+)
>  create mode 100644 doc/guides/bbdevs/acc200.rst
>  create mode 100644 drivers/baseband/acc200/acc200_pmd.h
>  create mode 100644 drivers/baseband/acc200/meson.build
>  create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c
>  create mode 100644 drivers/baseband/acc200/version.map
>
<snip>
> diff --git a/drivers/baseband/acc200/meson.build b/drivers/baseband/acc200/meson.build
> new file mode 100644
> index 0000000..7ec8679
> --- /dev/null
> +++ b/drivers/baseband/acc200/meson.build
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2022 Intel Corporation
> +
> +deps += ['bbdev', 'bus_vdev', 'ring', 'pci', 'bus_pci']
> +

Does this really depend on both bus_vdev and bus_pci?

Ideally, I think that drivers/baseband/meson.build should probably have the
line "std_deps = ['bbdev']" to pull in that as a dependency for all baseband
drivers.

Based off some quick testing, I got this driver to build with just
"deps += ['bbdev', 'bus_pci']". Though, again, I think these probably should
be standard deps for all bbdevs.

> +sources = files('rte_acc200_pmd.c')

Given that the driver is using shared headers with the acc100 codebase, you
might want to consider putting in
"includes += include_directories('../acc100')" in the meson.build file. It
saves you having to manually specify the full path to all these shared
headers, and gives you only one place to update things if those headers ever
move elsewhere.

/Bruce
* [PATCH v2 03/11] baseband/acc200: add HW register definitions 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru 2022-09-12 1:08 ` [PATCH v2 01/11] baseband/acc100: refactory to segregate common code Nic Chautru 2022-09-12 1:08 ` [PATCH v2 02/11] baseband/acc200: introduce PMD for ACC200 Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 04/11] baseband/acc200: add info get function Nic Chautru ` (7 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Add registers list and structure to access the device. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/acc200_pf_enum.h | 108 ++++++++++++++++++++ drivers/baseband/acc200/acc200_pmd.h | 163 +++++++++++++++++++++++++++++++ drivers/baseband/acc200/acc200_vf_enum.h | 83 ++++++++++++++++ drivers/baseband/acc200/rte_acc200_pmd.c | 2 + 4 files changed, 356 insertions(+) create mode 100644 drivers/baseband/acc200/acc200_pf_enum.h create mode 100644 drivers/baseband/acc200/acc200_vf_enum.h diff --git a/drivers/baseband/acc200/acc200_pf_enum.h b/drivers/baseband/acc200/acc200_pf_enum.h new file mode 100644 index 0000000..e52d8f5 --- /dev/null +++ b/drivers/baseband/acc200/acc200_pf_enum.h @@ -0,0 +1,108 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#ifndef ACC200_PF_ENUM_H +#define ACC200_PF_ENUM_H + +/* + * ACC200 Register mapping on PF BAR0 + * This is automatically generated from RDL, format may change with new RDL + * Release. 
+ * Variable names are as is + */ +enum { + HWPfQmgrEgressQueuesTemplate = 0x0007FC00, + HWPfQmgrIngressAq = 0x00080000, + HWPfQmgrDepthLog2Grp = 0x00A00200, + HWPfQmgrTholdGrp = 0x00A00300, + HWPfQmgrGrpTmplateReg0Indx = 0x00A00600, + HWPfQmgrGrpTmplateReg1Indx = 0x00A00700, + HWPfQmgrGrpTmplateReg2indx = 0x00A00800, + HWPfQmgrGrpTmplateReg3Indx = 0x00A00900, + HWPfQmgrGrpTmplateReg4Indx = 0x00A00A00, + HWPfQmgrVfBaseAddr = 0x00A01000, + HWPfQmgrArbQDepthGrp = 0x00A02F00, + HWPfQmgrGrpFunction0 = 0x00A02F40, + HWPfQmgrGrpFunction1 = 0x00A02F44, + HWPfQmgrGrpPriority = 0x00A02F48, + HWPfQmgrAqEnableVf = 0x00A10000, + HWPfQmgrRingSizeVf = 0x00A20004, + HWPfQmgrGrpDepthLog20Vf = 0x00A20008, + HWPfQmgrGrpDepthLog21Vf = 0x00A2000C, + HWPfFabricM2iBufferReg = 0x00B30000, + HWPfFabricI2Mdma_weight = 0x00B31044, + HwPfFecUl5gIbDebugReg = 0x00B40200, + HWPfFftConfig0 = 0x00B58004, + HWPfFftRamPageAccess = 0x00B5800C, + HWPfFftRamOff = 0x00B58800, + HWPfDmaConfig0Reg = 0x00B80000, + HWPfDmaConfig1Reg = 0x00B80004, + HWPfDmaQmgrAddrReg = 0x00B80008, + HWPfDmaAxcacheReg = 0x00B80010, + HWPfDmaAxiControl = 0x00B8002C, + HWPfDmaQmanen = 0x00B80040, + HWPfDma4gdlIbThld = 0x00B800CC, + HWPfDmaCfgRrespBresp = 0x00B80814, + HWPfDmaDescriptorSignatuture = 0x00B80868, + HWPfDmaErrorDetectionEn = 0x00B80870, + HWPfDmaFec5GulDescBaseLoRegVf = 0x00B88020, + HWPfDmaFec5GulDescBaseHiRegVf = 0x00B88024, + HWPfDmaFec5GulRespPtrLoRegVf = 0x00B88028, + HWPfDmaFec5GulRespPtrHiRegVf = 0x00B8802C, + HWPfDmaFec5GdlDescBaseLoRegVf = 0x00B88040, + HWPfDmaFec5GdlDescBaseHiRegVf = 0x00B88044, + HWPfDmaFec5GdlRespPtrLoRegVf = 0x00B88048, + HWPfDmaFec5GdlRespPtrHiRegVf = 0x00B8804C, + HWPfDmaFec4GulDescBaseLoRegVf = 0x00B88060, + HWPfDmaFec4GulDescBaseHiRegVf = 0x00B88064, + HWPfDmaFec4GulRespPtrLoRegVf = 0x00B88068, + HWPfDmaFec4GulRespPtrHiRegVf = 0x00B8806C, + HWPfDmaFec4GdlDescBaseLoRegVf = 0x00B88080, + HWPfDmaFec4GdlDescBaseHiRegVf = 0x00B88084, + HWPfDmaFec4GdlRespPtrLoRegVf = 0x00B88088, + 
HWPfDmaFec4GdlRespPtrHiRegVf = 0x00B8808C, + HWPDmaFftDescBaseLoRegVf = 0x00B880A0, + HWPDmaFftDescBaseHiRegVf = 0x00B880A4, + HWPDmaFftRespPtrLoRegVf = 0x00B880A8, + HWPDmaFftRespPtrHiRegVf = 0x00B880AC, + HWPfQosmonAEvalOverflow0 = 0x00B90008, + HWPfPermonACntrlRegVf = 0x00B98000, + HWPfQosmonBEvalOverflow0 = 0x00BA0008, + HWPfPermonBCntrlRegVf = 0x00BA8000, + HWPfPermonCCntrlRegVf = 0x00BB8000, + HWPfHiInfoRingBaseLoRegPf = 0x00C84014, + HWPfHiInfoRingBaseHiRegPf = 0x00C84018, + HWPfHiInfoRingPointerRegPf = 0x00C8401C, + HWPfHiInfoRingIntWrEnRegPf = 0x00C84020, + HWPfHiBlockTransmitOnErrorEn = 0x00C84038, + HWPfHiCfgMsiIntWrEnRegPf = 0x00C84040, + HWPfHiMsixVectorMapperPf = 0x00C84060, + HWPfHiPfMode = 0x00C84108, + HWPfHiClkGateHystReg = 0x00C8410C, + HWPfHiMsiDropEnableReg = 0x00C84114, + HWPfHiSectionPowerGatingReq = 0x00C84128, + HWPfHiSectionPowerGatingAck = 0x00C8412C, +}; + +/* TIP PF Interrupt numbers */ +enum { + ACC200_PF_INT_QMGR_AQ_OVERFLOW = 0, + ACC200_PF_INT_DOORBELL_VF_2_PF = 1, + ACC200_PF_INT_ILLEGAL_FORMAT = 2, + ACC200_PF_INT_QMGR_DISABLED_ACCESS = 3, + ACC200_PF_INT_QMGR_AQ_OVERTHRESHOLD = 4, + ACC200_PF_INT_DMA_DL_DESC_IRQ = 5, + ACC200_PF_INT_DMA_UL_DESC_IRQ = 6, + ACC200_PF_INT_DMA_FFT_DESC_IRQ = 7, + ACC200_PF_INT_DMA_UL5G_DESC_IRQ = 8, + ACC200_PF_INT_DMA_DL5G_DESC_IRQ = 9, + ACC200_PF_INT_DMA_MLD_DESC_IRQ = 10, + ACC200_PF_INT_ARAM_ECC_1BIT_ERR = 11, + ACC200_PF_INT_PARITY_ERR = 12, + ACC200_PF_INT_QMGR_ERR = 13, + ACC200_PF_INT_INT_REQ_OVERFLOW = 14, + ACC200_PF_INT_APB_TIMEOUT = 15, +}; + +#endif /* ACC200_PF_ENUM_H */ diff --git a/drivers/baseband/acc200/acc200_pmd.h b/drivers/baseband/acc200/acc200_pmd.h index 626e9fb..57b7e63 100644 --- a/drivers/baseband/acc200/acc200_pmd.h +++ b/drivers/baseband/acc200/acc200_pmd.h @@ -6,6 +6,8 @@ #define _RTE_ACC200_PMD_H_ #include "../acc100/acc_common.h" +#include "acc200_pf_enum.h" +#include "acc200_vf_enum.h" /* Helper macro for logging */ #define rte_bbdev_log(level, fmt, ...) 
\ @@ -29,4 +31,165 @@ #define RTE_ACC200_PF_DEVICE_ID (0x57C0) #define RTE_ACC200_VF_DEVICE_ID (0x57C1) +#define ACC200_MAX_PF_MSIX (256+32) +#define ACC200_MAX_VF_MSIX (256+7) + +/* Values used in writing to the registers */ +#define ACC200_REG_IRQ_EN_ALL 0x1FF83FF /* Enable all interrupts */ + +/* Number of Virtual Functions ACC200 supports */ +#define ACC200_NUM_VFS 16 +#define ACC200_NUM_QGRPS 16 +#define ACC200_NUM_AQS 16 + +#define ACC200_GRP_ID_SHIFT 10 /* Queue Index Hierarchy */ +#define ACC200_VF_ID_SHIFT 4 /* Queue Index Hierarchy */ +#define ACC200_WORDS_IN_ARAM_SIZE (256 * 1024 / 4) + +/* Mapping of signals for the available engines */ +#define ACC200_SIG_UL_5G 0 +#define ACC200_SIG_UL_5G_LAST 4 +#define ACC200_SIG_DL_5G 10 +#define ACC200_SIG_DL_5G_LAST 11 +#define ACC200_SIG_UL_4G 12 +#define ACC200_SIG_UL_4G_LAST 16 +#define ACC200_SIG_DL_4G 21 +#define ACC200_SIG_DL_4G_LAST 23 +#define ACC200_SIG_FFT 24 +#define ACC200_SIG_FFT_LAST 24 + +#define ACC200_NUM_ACCS 5 /* FIXMEFFT */ + +/* ACC200 Configuration */ +#define ACC200_FABRIC_MODE 0x8000103 +#define ACC200_CFG_DMA_ERROR 0x3DF +#define ACC200_CFG_AXI_CACHE 0x11 +#define ACC200_CFG_QMGR_HI_P 0x0F0F +#define ACC200_RESET_HARD 0x1FF +#define ACC200_ENGINES_MAX 9 +#define ACC200_GPEX_AXIMAP_NUM 17 +#define ACC200_CLOCK_GATING_EN 0x30000 +#define ACC200_FFT_CFG_0 0x2001 +#define ACC200_FFT_RAM_EN 0x80008000 +#define ACC200_FFT_RAM_DIS 0x0 +#define ACC200_FFT_RAM_SIZE 512 +#define ACC200_CLK_EN 0x00010A01 +#define ACC200_CLK_DIS 0x01F10A01 +#define ACC200_PG_MASK_0 0x1F +#define ACC200_PG_MASK_1 0xF +#define ACC200_PG_MASK_2 0x1 +#define ACC200_PG_MASK_3 0x0 +#define ACC200_PG_MASK_FFT 1 +#define ACC200_PG_MASK_4GUL 4 +#define ACC200_PG_MASK_5GUL 8 +#define ACC200_STATUS_WAIT 10 +#define ACC200_STATUS_TO 100 + +struct acc200_registry_addr { + unsigned int dma_ring_dl5g_hi; + unsigned int dma_ring_dl5g_lo; + unsigned int dma_ring_ul5g_hi; + unsigned int dma_ring_ul5g_lo; + unsigned int 
dma_ring_dl4g_hi; + unsigned int dma_ring_dl4g_lo; + unsigned int dma_ring_ul4g_hi; + unsigned int dma_ring_ul4g_lo; + unsigned int dma_ring_fft_hi; + unsigned int dma_ring_fft_lo; + unsigned int ring_size; + unsigned int info_ring_hi; + unsigned int info_ring_lo; + unsigned int info_ring_en; + unsigned int info_ring_ptr; + unsigned int tail_ptrs_dl5g_hi; + unsigned int tail_ptrs_dl5g_lo; + unsigned int tail_ptrs_ul5g_hi; + unsigned int tail_ptrs_ul5g_lo; + unsigned int tail_ptrs_dl4g_hi; + unsigned int tail_ptrs_dl4g_lo; + unsigned int tail_ptrs_ul4g_hi; + unsigned int tail_ptrs_ul4g_lo; + unsigned int tail_ptrs_fft_hi; + unsigned int tail_ptrs_fft_lo; + unsigned int depth_log0_offset; + unsigned int depth_log1_offset; + unsigned int qman_group_func; + unsigned int hi_mode; + unsigned int pmon_ctrl_a; + unsigned int pmon_ctrl_b; + unsigned int pmon_ctrl_c; +}; + +/* Structure holding registry addresses for PF */ +static const struct acc200_registry_addr pf_reg_addr = { + .dma_ring_dl5g_hi = HWPfDmaFec5GdlDescBaseHiRegVf, + .dma_ring_dl5g_lo = HWPfDmaFec5GdlDescBaseLoRegVf, + .dma_ring_ul5g_hi = HWPfDmaFec5GulDescBaseHiRegVf, + .dma_ring_ul5g_lo = HWPfDmaFec5GulDescBaseLoRegVf, + .dma_ring_dl4g_hi = HWPfDmaFec4GdlDescBaseHiRegVf, + .dma_ring_dl4g_lo = HWPfDmaFec4GdlDescBaseLoRegVf, + .dma_ring_ul4g_hi = HWPfDmaFec4GulDescBaseHiRegVf, + .dma_ring_ul4g_lo = HWPfDmaFec4GulDescBaseLoRegVf, + .dma_ring_fft_hi = HWPDmaFftDescBaseHiRegVf, + .dma_ring_fft_lo = HWPDmaFftDescBaseLoRegVf, + .ring_size = HWPfQmgrRingSizeVf, + .info_ring_hi = HWPfHiInfoRingBaseHiRegPf, + .info_ring_lo = HWPfHiInfoRingBaseLoRegPf, + .info_ring_en = HWPfHiInfoRingIntWrEnRegPf, + .info_ring_ptr = HWPfHiInfoRingPointerRegPf, + .tail_ptrs_dl5g_hi = HWPfDmaFec5GdlRespPtrHiRegVf, + .tail_ptrs_dl5g_lo = HWPfDmaFec5GdlRespPtrLoRegVf, + .tail_ptrs_ul5g_hi = HWPfDmaFec5GulRespPtrHiRegVf, + .tail_ptrs_ul5g_lo = HWPfDmaFec5GulRespPtrLoRegVf, + .tail_ptrs_dl4g_hi = HWPfDmaFec4GdlRespPtrHiRegVf, + 
.tail_ptrs_dl4g_lo = HWPfDmaFec4GdlRespPtrLoRegVf, + .tail_ptrs_ul4g_hi = HWPfDmaFec4GulRespPtrHiRegVf, + .tail_ptrs_ul4g_lo = HWPfDmaFec4GulRespPtrLoRegVf, + .tail_ptrs_fft_hi = HWPDmaFftRespPtrHiRegVf, + .tail_ptrs_fft_lo = HWPDmaFftRespPtrLoRegVf, + .depth_log0_offset = HWPfQmgrGrpDepthLog20Vf, + .depth_log1_offset = HWPfQmgrGrpDepthLog21Vf, + .qman_group_func = HWPfQmgrGrpFunction0, + .hi_mode = HWPfHiMsixVectorMapperPf, + .pmon_ctrl_a = HWPfPermonACntrlRegVf, + .pmon_ctrl_b = HWPfPermonBCntrlRegVf, + .pmon_ctrl_c = HWPfPermonCCntrlRegVf, +}; + +/* Structure holding registry addresses for VF */ +static const struct acc200_registry_addr vf_reg_addr = { + .dma_ring_dl5g_hi = HWVfDmaFec5GdlDescBaseHiRegVf, + .dma_ring_dl5g_lo = HWVfDmaFec5GdlDescBaseLoRegVf, + .dma_ring_ul5g_hi = HWVfDmaFec5GulDescBaseHiRegVf, + .dma_ring_ul5g_lo = HWVfDmaFec5GulDescBaseLoRegVf, + .dma_ring_dl4g_hi = HWVfDmaFec4GdlDescBaseHiRegVf, + .dma_ring_dl4g_lo = HWVfDmaFec4GdlDescBaseLoRegVf, + .dma_ring_ul4g_hi = HWVfDmaFec4GulDescBaseHiRegVf, + .dma_ring_ul4g_lo = HWVfDmaFec4GulDescBaseLoRegVf, + .dma_ring_fft_hi = HWVfDmaFftDescBaseHiRegVf, + .dma_ring_fft_lo = HWVfDmaFftDescBaseLoRegVf, + .ring_size = HWVfQmgrRingSizeVf, + .info_ring_hi = HWVfHiInfoRingBaseHiVf, + .info_ring_lo = HWVfHiInfoRingBaseLoVf, + .info_ring_en = HWVfHiInfoRingIntWrEnVf, + .info_ring_ptr = HWVfHiInfoRingPointerVf, + .tail_ptrs_dl5g_hi = HWVfDmaFec5GdlRespPtrHiRegVf, + .tail_ptrs_dl5g_lo = HWVfDmaFec5GdlRespPtrLoRegVf, + .tail_ptrs_ul5g_hi = HWVfDmaFec5GulRespPtrHiRegVf, + .tail_ptrs_ul5g_lo = HWVfDmaFec5GulRespPtrLoRegVf, + .tail_ptrs_dl4g_hi = HWVfDmaFec4GdlRespPtrHiRegVf, + .tail_ptrs_dl4g_lo = HWVfDmaFec4GdlRespPtrLoRegVf, + .tail_ptrs_ul4g_hi = HWVfDmaFec4GulRespPtrHiRegVf, + .tail_ptrs_ul4g_lo = HWVfDmaFec4GulRespPtrLoRegVf, + .tail_ptrs_fft_hi = HWVfDmaFftRespPtrHiRegVf, + .tail_ptrs_fft_lo = HWVfDmaFftRespPtrLoRegVf, + .depth_log0_offset = HWVfQmgrGrpDepthLog20Vf, + .depth_log1_offset = 
HWVfQmgrGrpDepthLog21Vf, + .qman_group_func = HWVfQmgrGrpFunction0Vf, + .hi_mode = HWVfHiMsixVectorMapperVf, + .pmon_ctrl_a = HWVfPmACntrlRegVf, + .pmon_ctrl_b = HWVfPmBCntrlRegVf, + .pmon_ctrl_c = HWVfPmCCntrlRegVf, +}; + #endif /* _RTE_ACC200_PMD_H_ */ diff --git a/drivers/baseband/acc200/acc200_vf_enum.h b/drivers/baseband/acc200/acc200_vf_enum.h new file mode 100644 index 0000000..0d35420 --- /dev/null +++ b/drivers/baseband/acc200/acc200_vf_enum.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#ifndef ACC200_VF_ENUM_H +#define ACC200_VF_ENUM_H + +/* + * ACC200 Register mapping on VF BAR0 + * This is automatically generated from RDL, format may change with new RDL + */ +enum { + HWVfQmgrIngressAq = 0x00000000, + HWVfHiVfToPfDbellVf = 0x00000800, + HWVfHiPfToVfDbellVf = 0x00000808, + HWVfHiInfoRingBaseLoVf = 0x00000810, + HWVfHiInfoRingBaseHiVf = 0x00000814, + HWVfHiInfoRingPointerVf = 0x00000818, + HWVfHiInfoRingIntWrEnVf = 0x00000820, + HWVfHiInfoRingPf2VfWrEnVf = 0x00000824, + HWVfHiMsixVectorMapperVf = 0x00000860, + HWVfDmaFec5GulDescBaseLoRegVf = 0x00000920, + HWVfDmaFec5GulDescBaseHiRegVf = 0x00000924, + HWVfDmaFec5GulRespPtrLoRegVf = 0x00000928, + HWVfDmaFec5GulRespPtrHiRegVf = 0x0000092C, + HWVfDmaFec5GdlDescBaseLoRegVf = 0x00000940, + HWVfDmaFec5GdlDescBaseHiRegVf = 0x00000944, + HWVfDmaFec5GdlRespPtrLoRegVf = 0x00000948, + HWVfDmaFec5GdlRespPtrHiRegVf = 0x0000094C, + HWVfDmaFec4GulDescBaseLoRegVf = 0x00000960, + HWVfDmaFec4GulDescBaseHiRegVf = 0x00000964, + HWVfDmaFec4GulRespPtrLoRegVf = 0x00000968, + HWVfDmaFec4GulRespPtrHiRegVf = 0x0000096C, + HWVfDmaFec4GdlDescBaseLoRegVf = 0x00000980, + HWVfDmaFec4GdlDescBaseHiRegVf = 0x00000984, + HWVfDmaFec4GdlRespPtrLoRegVf = 0x00000988, + HWVfDmaFec4GdlRespPtrHiRegVf = 0x0000098C, + HWVfDmaFftDescBaseLoRegVf = 0x000009A0, + HWVfDmaFftDescBaseHiRegVf = 0x000009A4, + HWVfDmaFftRespPtrLoRegVf = 0x000009A8, + HWVfDmaFftRespPtrHiRegVf = 0x000009AC, + 
HWVfQmgrAqResetVf = 0x00000E00, + HWVfQmgrRingSizeVf = 0x00000E04, + HWVfQmgrGrpDepthLog20Vf = 0x00000E08, + HWVfQmgrGrpDepthLog21Vf = 0x00000E0C, + HWVfQmgrGrpFunction0Vf = 0x00000E10, + HWVfQmgrGrpFunction1Vf = 0x00000E14, + HWVfPmACntrlRegVf = 0x00000F40, + HWVfPmACountVf = 0x00000F48, + HWVfPmAKCntLoVf = 0x00000F50, + HWVfPmAKCntHiVf = 0x00000F54, + HWVfPmADeltaCntLoVf = 0x00000F60, + HWVfPmADeltaCntHiVf = 0x00000F64, + HWVfPmBCntrlRegVf = 0x00000F80, + HWVfPmBCountVf = 0x00000F88, + HWVfPmBKCntLoVf = 0x00000F90, + HWVfPmBKCntHiVf = 0x00000F94, + HWVfPmBDeltaCntLoVf = 0x00000FA0, + HWVfPmBDeltaCntHiVf = 0x00000FA4, + HWVfPmCCntrlRegVf = 0x00000FC0, + HWVfPmCCountVf = 0x00000FC8, + HWVfPmCKCntLoVf = 0x00000FD0, + HWVfPmCKCntHiVf = 0x00000FD4, + HWVfPmCDeltaCntLoVf = 0x00000FE0, + HWVfPmCDeltaCntHiVf = 0x00000FE4 +}; + +/* TIP VF Interrupt numbers */ +enum { + ACC200_VF_INT_QMGR_AQ_OVERFLOW = 0, + ACC200_VF_INT_DOORBELL_PF_2_VF = 1, + ACC200_VF_INT_ILLEGAL_FORMAT = 2, + ACC200_VF_INT_QMGR_DISABLED_ACCESS = 3, + ACC200_VF_INT_QMGR_AQ_OVERTHRESHOLD = 4, + ACC200_VF_INT_DMA_DL_DESC_IRQ = 5, + ACC200_VF_INT_DMA_UL_DESC_IRQ = 6, + ACC200_VF_INT_DMA_FFT_DESC_IRQ = 7, + ACC200_VF_INT_DMA_UL5G_DESC_IRQ = 8, + ACC200_VF_INT_DMA_DL5G_DESC_IRQ = 9, + ACC200_VF_INT_DMA_MLD_DESC_IRQ = 10, +}; + +#endif /* ACC200_VF_ENUM_H */ diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index db8b641..5554488 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -34,6 +34,8 @@ acc200_dev_close(struct rte_bbdev *dev) { RTE_SET_USED(dev); + /* Ensure all in flight HW transactions are completed */ + usleep(ACC_LONG_WAIT); return 0; } -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
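Editorial sketch, not part of the patch: pf_reg_addr and vf_reg_addr above share one struct layout, so code that touches a register can look up the offset through whichever table matches the device type instead of branching on PF/VF at every access. The fragment below shows that pattern with a reduced struct. The sketch_ names are hypothetical, only three of the fields are kept, and the numeric offsets are copied from the PF and VF enums in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced, illustrative version of struct acc200_registry_addr. */
struct sketch_registry_addr {
	uint32_t dma_ring_fft_lo;
	uint32_t ring_size;
	uint32_t info_ring_lo;
};

/* Offsets copied from acc200_pf_enum.h. */
static const struct sketch_registry_addr sketch_pf_reg_addr = {
	.dma_ring_fft_lo = 0x00B880A0, /* HWPDmaFftDescBaseLoRegVf */
	.ring_size = 0x00A20004,       /* HWPfQmgrRingSizeVf */
	.info_ring_lo = 0x00C84014,    /* HWPfHiInfoRingBaseLoRegPf */
};

/* Offsets copied from acc200_vf_enum.h. */
static const struct sketch_registry_addr sketch_vf_reg_addr = {
	.dma_ring_fft_lo = 0x000009A0, /* HWVfDmaFftDescBaseLoRegVf */
	.ring_size = 0x00000E04,       /* HWVfQmgrRingSizeVf */
	.info_ring_lo = 0x00000810,    /* HWVfHiInfoRingBaseLoVf */
};

/* Hypothetical helper: pick the table that matches the device type. */
static const struct sketch_registry_addr *
sketch_reg_table(bool pf_device)
{
	return pf_device ? &sketch_pf_reg_addr : &sketch_vf_reg_addr;
}
```

A caller would then write, for example, assert(sketch_reg_table(true)->ring_size == 0x00A20004); the same field name resolves to 0x00000E04 when the device is a VF.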
* [PATCH v2 04/11] baseband/acc200: add info get function 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (2 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 03/11] baseband/acc200: add HW register definitions Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 05/11] baseband/acc200: add queue configuration Nic Chautru ` (6 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Add support for info_get to allow to query the device. Null capability exposed. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/acc200_pmd.h | 1 + drivers/baseband/acc200/rte_acc200_cfg.h | 27 ++++ drivers/baseband/acc200/rte_acc200_pmd.c | 239 +++++++++++++++++++++++++++++++ 3 files changed, 267 insertions(+) create mode 100644 drivers/baseband/acc200/rte_acc200_cfg.h diff --git a/drivers/baseband/acc200/acc200_pmd.h b/drivers/baseband/acc200/acc200_pmd.h index 57b7e63..ec30e76 100644 --- a/drivers/baseband/acc200/acc200_pmd.h +++ b/drivers/baseband/acc200/acc200_pmd.h @@ -8,6 +8,7 @@ #include "../acc100/acc_common.h" #include "acc200_pf_enum.h" #include "acc200_vf_enum.h" +#include "rte_acc200_cfg.h" /* Helper macro for logging */ #define rte_bbdev_log(level, fmt, ...) \ diff --git a/drivers/baseband/acc200/rte_acc200_cfg.h b/drivers/baseband/acc200/rte_acc200_cfg.h new file mode 100644 index 0000000..9ae96c6 --- /dev/null +++ b/drivers/baseband/acc200/rte_acc200_cfg.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#ifndef _RTE_ACC200_CFG_H_ +#define _RTE_ACC200_CFG_H_ + +/** + * @file rte_acc200_cfg.h + * + * Functions for configuring ACC200 HW, exposed directly to applications. 
+ * Configuration related to encoding/decoding is done through the + * librte_bbdev library. + * + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + */ + +#include <stdint.h> +#include <stdbool.h> +#include "../acc100/rte_acc_common_cfg.h" + +#ifdef __cplusplus +extern "C" { +#endif + +#endif /* _RTE_ACC200_CFG_H_ */ diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 5554488..43415eb 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -29,6 +29,197 @@ RTE_LOG_REGISTER_DEFAULT(acc200_logtype, NOTICE); #endif +/* Calculate the offset of the enqueue register */ +static inline uint32_t +queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id) +{ + if (pf_device) + return ((vf_id << 12) + (qgrp_id << 7) + (aq_id << 3) + + HWPfQmgrIngressAq); + else + return ((qgrp_id << 7) + (aq_id << 3) + + HWVfQmgrIngressAq); +} + +enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, NUM_ACC}; + +/* Return the queue topology for a Queue Group Index */ +static inline void +qtopFromAcc(struct rte_acc_queue_topology **qtop, int acc_enum, + struct rte_acc_conf *acc_conf) +{ + struct rte_acc_queue_topology *p_qtop; + p_qtop = NULL; + switch (acc_enum) { + case UL_4G: + p_qtop = &(acc_conf->q_ul_4g); + break; + case UL_5G: + p_qtop = &(acc_conf->q_ul_5g); + break; + case DL_4G: + p_qtop = &(acc_conf->q_dl_4g); + break; + case DL_5G: + p_qtop = &(acc_conf->q_dl_5g); + break; + case FFT: + p_qtop = &(acc_conf->q_fft); + break; + default: + /* NOTREACHED */ + rte_bbdev_log(ERR, "Unexpected error evaluating qtopFromAcc %d", + acc_enum); + break; + } + *qtop = p_qtop; +} + +static void +initQTop(struct rte_acc_conf *acc_conf) +{ + acc_conf->q_ul_4g.num_aqs_per_groups = 0; + acc_conf->q_ul_4g.num_qgroups = 0; + acc_conf->q_ul_4g.first_qgroup_index = -1; + acc_conf->q_ul_5g.num_aqs_per_groups = 0; + acc_conf->q_ul_5g.num_qgroups = 0; + 
acc_conf->q_ul_5g.first_qgroup_index = -1; + acc_conf->q_dl_4g.num_aqs_per_groups = 0; + acc_conf->q_dl_4g.num_qgroups = 0; + acc_conf->q_dl_4g.first_qgroup_index = -1; + acc_conf->q_dl_5g.num_aqs_per_groups = 0; + acc_conf->q_dl_5g.num_qgroups = 0; + acc_conf->q_dl_5g.first_qgroup_index = -1; + acc_conf->q_fft.num_aqs_per_groups = 0; + acc_conf->q_fft.num_qgroups = 0; + acc_conf->q_fft.first_qgroup_index = -1; +} + +static inline void +updateQtop(uint8_t acc, uint8_t qg, struct rte_acc_conf *acc_conf, + struct acc_device *d) { + uint32_t reg; + struct rte_acc_queue_topology *q_top = NULL; + qtopFromAcc(&q_top, acc, acc_conf); + if (unlikely(q_top == NULL)) + return; + uint16_t aq; + q_top->num_qgroups++; + if (q_top->first_qgroup_index == -1) { + q_top->first_qgroup_index = qg; + /* Can be optimized to assume all are enabled by default */ + reg = acc_reg_read(d, queue_offset(d->pf_device, + 0, qg, ACC200_NUM_AQS - 1)); + if (reg & ACC_QUEUE_ENABLE) { + q_top->num_aqs_per_groups = ACC200_NUM_AQS; + return; + } + q_top->num_aqs_per_groups = 0; + for (aq = 0; aq < ACC200_NUM_AQS; aq++) { + reg = acc_reg_read(d, queue_offset(d->pf_device, + 0, qg, aq)); + if (reg & ACC_QUEUE_ENABLE) + q_top->num_aqs_per_groups++; + } + } +} + +/* Fetch configuration enabled for the PF/VF using MMIO Read (slow) */ +static inline void +fetch_acc200_config(struct rte_bbdev *dev) +{ + struct acc_device *d = dev->data->dev_private; + struct rte_acc_conf *acc_conf = &d->acc_conf; + const struct acc200_registry_addr *reg_addr; + uint8_t acc, qg; + uint32_t reg_aq, reg_len0, reg_len1, reg0, reg1; + uint32_t reg_mode, idx; + + /* No need to retrieve the configuration is already done */ + if (d->configured) + return; + + /* Choose correct registry addresses for the device type */ + if (d->pf_device) + reg_addr = &pf_reg_addr; + else + reg_addr = &vf_reg_addr; + + d->ddr_size = 0; + + /* Single VF Bundle by VF */ + acc_conf->num_vf_bundles = 1; + initQTop(acc_conf); + + struct 
rte_acc_queue_topology *q_top = NULL; + int qman_func_id[ACC200_NUM_ACCS] = {ACC_ACCMAP_0, ACC_ACCMAP_1, + ACC_ACCMAP_2, ACC_ACCMAP_3, ACC_ACCMAP_4}; + reg0 = acc_reg_read(d, reg_addr->qman_group_func); + reg1 = acc_reg_read(d, reg_addr->qman_group_func + 4); + for (qg = 0; qg < ACC200_NUM_QGRPS; qg++) { + reg_aq = acc_reg_read(d, + queue_offset(d->pf_device, 0, qg, 0)); + if (reg_aq & ACC_QUEUE_ENABLE) { + /* printf("Qg enabled %d %x\n", qg, reg_aq); */ + if (qg < ACC_NUM_QGRPS_PER_WORD) + idx = (reg0 >> (qg * 4)) & 0x7; + else + idx = (reg1 >> ((qg - + ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7; + if (idx < ACC200_NUM_ACCS) { + acc = qman_func_id[idx]; + updateQtop(acc, qg, acc_conf, d); + } + } + } + + /* Check the depth of the AQs*/ + reg_len0 = acc_reg_read(d, reg_addr->depth_log0_offset); + reg_len1 = acc_reg_read(d, reg_addr->depth_log1_offset); + for (acc = 0; acc < NUM_ACC; acc++) { + qtopFromAcc(&q_top, acc, acc_conf); + if (q_top->first_qgroup_index < ACC_NUM_QGRPS_PER_WORD) + q_top->aq_depth_log2 = (reg_len0 >> + (q_top->first_qgroup_index * 4)) + & 0xF; + else + q_top->aq_depth_log2 = (reg_len1 >> + ((q_top->first_qgroup_index - + ACC_NUM_QGRPS_PER_WORD) * 4)) + & 0xF; + } + + /* Read PF mode */ + if (d->pf_device) { + reg_mode = acc_reg_read(d, HWPfHiPfMode); + acc_conf->pf_mode_en = (reg_mode == ACC_PF_VAL) ? 1 : 0; + } else { + reg_mode = acc_reg_read(d, reg_addr->hi_mode); + acc_conf->pf_mode_en = reg_mode & 1; + } + + rte_bbdev_log_debug( + "%s Config LLR SIGN IN/OUT %s %s QG %u %u %u %u %u AQ %u %u %u %u %u Len %u %u %u %u %u\n", + (d->pf_device) ? "PF" : "VF", + (acc_conf->input_pos_llr_1_bit) ? "POS" : "NEG", + (acc_conf->output_pos_llr_1_bit) ? 
"POS" : "NEG", + acc_conf->q_ul_4g.num_qgroups, + acc_conf->q_dl_4g.num_qgroups, + acc_conf->q_ul_5g.num_qgroups, + acc_conf->q_dl_5g.num_qgroups, + acc_conf->q_fft.num_qgroups, + acc_conf->q_ul_4g.num_aqs_per_groups, + acc_conf->q_dl_4g.num_aqs_per_groups, + acc_conf->q_ul_5g.num_aqs_per_groups, + acc_conf->q_dl_5g.num_aqs_per_groups, + acc_conf->q_fft.num_aqs_per_groups, + acc_conf->q_ul_4g.aq_depth_log2, + acc_conf->q_dl_4g.aq_depth_log2, + acc_conf->q_ul_5g.aq_depth_log2, + acc_conf->q_dl_5g.aq_depth_log2, + acc_conf->q_fft.aq_depth_log2); +} + /* Free memory used for software rings */ static int acc200_dev_close(struct rte_bbdev *dev) @@ -39,9 +230,57 @@ return 0; } +/* Get ACC200 device info */ +static void +acc200_dev_info_get(struct rte_bbdev *dev, + struct rte_bbdev_driver_info *dev_info) +{ + struct acc_device *d = dev->data->dev_private; + int i; + static const struct rte_bbdev_op_cap bbdev_capabilities[] = { + RTE_BBDEV_END_OF_CAPABILITIES_LIST() + }; + + static struct rte_bbdev_queue_conf default_queue_conf; + default_queue_conf.socket = dev->data->socket_id; + default_queue_conf.queue_size = ACC_MAX_QUEUE_DEPTH; + + dev_info->driver_name = dev->device->driver->name; + + /* Read and save the populated config from ACC200 registers */ + fetch_acc200_config(dev); + + /* Exposed number of queues */ + dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; + dev_info->max_num_queues = 0; + for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_FFT; i++) + 
dev_info->max_num_queues += dev_info->num_queues[i]; + dev_info->queue_size_lim = ACC_MAX_QUEUE_DEPTH; + dev_info->hardware_accelerated = true; + dev_info->max_dl_queue_priority = + d->acc_conf.q_dl_4g.num_qgroups - 1; + dev_info->max_ul_queue_priority = + d->acc_conf.q_ul_4g.num_qgroups - 1; + dev_info->default_queue_conf = default_queue_conf; + dev_info->cpu_flag_reqs = NULL; + dev_info->min_alignment = 1; + dev_info->capabilities = bbdev_capabilities; + dev_info->harq_buffer_size = 0; +} static const struct rte_bbdev_ops acc200_bbdev_ops = { .close = acc200_dev_close, + .info_get = acc200_dev_info_get, }; /* ACC200 PCI PF address map */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
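A note for readers of the register decoding in fetch_acc200_config() above: both the queue group to accelerator mapping and the AQ depth are read as 4-bit fields packed into a pair of 32-bit registers, with the second word used once the group index passes ACC_NUM_QGRPS_PER_WORD. The sketch below isolates that field extraction as plain C. It is not driver code: the sketch_* names are hypothetical, and the value 8 for the number of groups per word is an assumption inferred from the 4-bit field width, not taken from the headers.

```c
#include <stdint.h>

/* Assumption for this sketch: eight 4-bit fields fit in one 32-bit word. */
#define SKETCH_QGRPS_PER_WORD 8

/*
 * Extract the 4-bit field for queue group qg (valid range 0..15) from two
 * register words: reg0 holds groups 0..7, reg1 holds groups 8..15.
 * This mirrors the depth decoding, which masks with 0xF.
 */
static inline uint32_t
sketch_qgrp_nibble(uint32_t reg0, uint32_t reg1, uint8_t qg)
{
	if (qg < SKETCH_QGRPS_PER_WORD)
		return (reg0 >> (qg * 4)) & 0xF;
	return (reg1 >> ((qg - SKETCH_QGRPS_PER_WORD) * 4)) & 0xF;
}

/*
 * Accelerator function index for queue group qg. The patch masks the same
 * field with 0x7, so only the low three bits are kept here.
 */
static inline uint32_t
sketch_qgrp_func_idx(uint32_t reg0, uint32_t reg1, uint8_t qg)
{
	return sketch_qgrp_nibble(reg0, reg1, qg) & 0x7;
}
```

In the patch the resulting index is additionally checked against ACC200_NUM_ACCS before it is used to look up qman_func_id[], so a field value outside the known accelerators is simply skipped.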
* [PATCH v2 05/11] baseband/acc200: add queue configuration 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (3 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 04/11] baseband/acc200: add info get function Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 06/11] baseband/acc200: add LDPC processing functions Nic Chautru ` (5 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Adding function to create and configure queues for the device. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 373 ++++++++++++++++++++++++++++++- 1 file changed, 372 insertions(+), 1 deletion(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 43415eb..225bab9 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -220,16 +220,383 @@ acc_conf->q_fft.aq_depth_log2); } +/* Allocate 64MB memory used for all software rings */ +static int +acc200_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id) +{ + uint32_t phys_low, phys_high, value; + struct acc_device *d = dev->data->dev_private; + const struct acc200_registry_addr *reg_addr; + + if (d->pf_device && !d->acc_conf.pf_mode_en) { + rte_bbdev_log(NOTICE, + "%s has PF mode disabled. This PF can't be used.", + dev->data->name); + return -ENODEV; + } + if (!d->pf_device && d->acc_conf.pf_mode_en) { + rte_bbdev_log(NOTICE, + "%s has PF mode enabled.
This VF can't be used.", + dev->data->name); + return -ENODEV; + } + + alloc_sw_rings_min_mem(dev, d, num_queues, socket_id); + + /* If minimal memory space approach failed, then allocate + * the 2 * 64MB block for the sw rings + */ + if (d->sw_rings == NULL) + alloc_2x64mb_sw_rings_mem(dev, d, socket_id); + + if (d->sw_rings == NULL) { + rte_bbdev_log(NOTICE, + "Failure allocating sw_rings memory"); + return -ENODEV; + } + + /* Configure ACC200 with the base address for DMA descriptor rings + * Same descriptor rings used for UL and DL DMA Engines + * Note : Assuming only VF0 bundle is used for PF mode + */ + phys_high = (uint32_t)(d->sw_rings_iova >> 32); + phys_low = (uint32_t)(d->sw_rings_iova & ~(ACC_SIZE_64MBYTE-1)); + + /* Choose correct registry addresses for the device type */ + if (d->pf_device) + reg_addr = &pf_reg_addr; + else + reg_addr = &vf_reg_addr; + + /* Read the populated cfg from ACC200 registers */ + fetch_acc200_config(dev); + + /* Start Pmon */ + for (value = 0; value <= 2; value++) { + acc_reg_write(d, reg_addr->pmon_ctrl_a, value); + acc_reg_write(d, reg_addr->pmon_ctrl_b, value); + acc_reg_write(d, reg_addr->pmon_ctrl_c, value); + } + + /* Release AXI from PF */ + if (d->pf_device) + acc_reg_write(d, HWPfDmaAxiControl, 1); + + acc_reg_write(d, reg_addr->dma_ring_ul5g_hi, phys_high); + acc_reg_write(d, reg_addr->dma_ring_ul5g_lo, phys_low); + acc_reg_write(d, reg_addr->dma_ring_dl5g_hi, phys_high); + acc_reg_write(d, reg_addr->dma_ring_dl5g_lo, phys_low); + acc_reg_write(d, reg_addr->dma_ring_ul4g_hi, phys_high); + acc_reg_write(d, reg_addr->dma_ring_ul4g_lo, phys_low); + acc_reg_write(d, reg_addr->dma_ring_dl4g_hi, phys_high); + acc_reg_write(d, reg_addr->dma_ring_dl4g_lo, phys_low); + acc_reg_write(d, reg_addr->dma_ring_fft_hi, phys_high); + acc_reg_write(d, reg_addr->dma_ring_fft_lo, phys_low); + /* + * Configure Ring Size to the max queue ring size + * (used for wrapping purpose) + */ + value = log2_basic(d->sw_ring_size / 64); + 
acc_reg_write(d, reg_addr->ring_size, value); + + /* Configure tail pointer for use when SDONE enabled */ + if (d->tail_ptrs == NULL) + d->tail_ptrs = rte_zmalloc_socket( + dev->device->driver->name, + ACC200_NUM_QGRPS * ACC200_NUM_AQS * sizeof(uint32_t), + RTE_CACHE_LINE_SIZE, socket_id); + if (d->tail_ptrs == NULL) { + rte_bbdev_log(ERR, "Failed to allocate tail ptr for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + rte_free(d->sw_rings); + return -ENOMEM; + } + d->tail_ptr_iova = rte_malloc_virt2iova(d->tail_ptrs); + + phys_high = (uint32_t)(d->tail_ptr_iova >> 32); + phys_low = (uint32_t)(d->tail_ptr_iova); + acc_reg_write(d, reg_addr->tail_ptrs_ul5g_hi, phys_high); + acc_reg_write(d, reg_addr->tail_ptrs_ul5g_lo, phys_low); + acc_reg_write(d, reg_addr->tail_ptrs_dl5g_hi, phys_high); + acc_reg_write(d, reg_addr->tail_ptrs_dl5g_lo, phys_low); + acc_reg_write(d, reg_addr->tail_ptrs_ul4g_hi, phys_high); + acc_reg_write(d, reg_addr->tail_ptrs_ul4g_lo, phys_low); + acc_reg_write(d, reg_addr->tail_ptrs_dl4g_hi, phys_high); + acc_reg_write(d, reg_addr->tail_ptrs_dl4g_lo, phys_low); + acc_reg_write(d, reg_addr->tail_ptrs_fft_hi, phys_high); + acc_reg_write(d, reg_addr->tail_ptrs_fft_lo, phys_low); + + if (d->harq_layout == NULL) + d->harq_layout = rte_zmalloc_socket("HARQ Layout", + ACC_HARQ_LAYOUT * sizeof(*d->harq_layout), + RTE_CACHE_LINE_SIZE, dev->data->socket_id); + if (d->harq_layout == NULL) { + rte_bbdev_log(ERR, "Failed to allocate harq_layout for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + rte_free(d->sw_rings); + return -ENOMEM; + } + + /* Mark as configured properly */ + d->configured = true; + + rte_bbdev_log_debug( + "ACC200 (%s) configured sw_rings = %p, sw_rings_iova = %#" + PRIx64, dev->data->name, d->sw_rings, d->sw_rings_iova); + + return 0; +} + /* Free memory used for software rings */ static int acc200_dev_close(struct rte_bbdev *dev) { - RTE_SET_USED(dev); + struct acc_device *d = dev->data->dev_private; + if 
(d->sw_rings_base != NULL) { + rte_free(d->tail_ptrs); + rte_free(d->sw_rings_base); + rte_free(d->harq_layout); + d->sw_rings_base = NULL; + d->tail_ptrs = NULL; + d->harq_layout = NULL; + } /* Ensure all in flight HW transactions are completed */ usleep(ACC_LONG_WAIT); return 0; } +/** + * Report an ACC200 queue index which is free + * Return 0 to 16k for a valid queue_idx or -1 when no queue is available + * Note : Only supporting VF0 Bundle for PF mode + */ +static int +acc200_find_free_queue_idx(struct rte_bbdev *dev, + const struct rte_bbdev_queue_conf *conf) +{ + struct acc_device *d = dev->data->dev_private; + int op_2_acc[6] = {0, UL_4G, DL_4G, UL_5G, DL_5G, FFT}; + int acc = op_2_acc[conf->op_type]; + struct rte_acc_queue_topology *qtop = NULL; + + qtopFromAcc(&qtop, acc, &(d->acc_conf)); + if (qtop == NULL) + return -1; + /* Identify matching QGroup Index which are sorted in priority order */ + uint16_t group_idx = qtop->first_qgroup_index; + group_idx += conf->priority; + if (group_idx >= ACC200_NUM_QGRPS || + conf->priority >= qtop->num_qgroups) { + rte_bbdev_log(INFO, "Invalid Priority on %s, priority %u", + dev->data->name, conf->priority); + return -1; + } + /* Find a free AQ_idx */ + uint16_t aq_idx; + for (aq_idx = 0; aq_idx < qtop->num_aqs_per_groups; aq_idx++) { + if (((d->q_assigned_bit_map[group_idx] >> aq_idx) & 0x1) == 0) { + /* Mark the Queue as assigned */ + d->q_assigned_bit_map[group_idx] |= (1 << aq_idx); + /* Report the AQ Index */ + return (group_idx << ACC200_GRP_ID_SHIFT) + aq_idx; + } + } + rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u", + dev->data->name, conf->priority); + return -1; +} + +/* Setup ACC200 queue */ +static int +acc200_queue_setup(struct rte_bbdev *dev, uint16_t queue_id, + const struct rte_bbdev_queue_conf *conf) +{ + struct acc_device *d = dev->data->dev_private; + struct acc_queue *q; + int16_t q_idx; + + if (d == NULL) { + rte_bbdev_log(ERR, "Undefined device"); + return -ENODEV; + } + /*
Allocate the queue data structure. */ + q = rte_zmalloc_socket(dev->device->driver->name, sizeof(*q), + RTE_CACHE_LINE_SIZE, conf->socket); + if (q == NULL) { + rte_bbdev_log(ERR, "Failed to allocate queue memory"); + return -ENOMEM; + } + + q->d = d; + q->ring_addr = RTE_PTR_ADD(d->sw_rings, (d->sw_ring_size * queue_id)); + q->ring_addr_iova = d->sw_rings_iova + (d->sw_ring_size * queue_id); + + /* Prepare the Ring with default descriptor format */ + union acc_dma_desc *desc = NULL; + unsigned int desc_idx, b_idx; + int fcw_len = (conf->op_type == RTE_BBDEV_OP_LDPC_ENC ? + ACC_FCW_LE_BLEN : (conf->op_type == RTE_BBDEV_OP_TURBO_DEC ? + ACC_FCW_TD_BLEN : (conf->op_type == RTE_BBDEV_OP_LDPC_DEC ? + ACC_FCW_LD_BLEN : ACC_FCW_FFT_BLEN))); + + for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) { + desc = q->ring_addr + desc_idx; + desc->req.word0 = ACC_DMA_DESC_TYPE; + desc->req.word1 = 0; /**< Timestamp */ + desc->req.word2 = 0; + desc->req.word3 = 0; + uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; + desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; + desc->req.data_ptrs[0].blen = fcw_len; + desc->req.data_ptrs[0].blkid = ACC_DMA_BLKID_FCW; + desc->req.data_ptrs[0].last = 0; + desc->req.data_ptrs[0].dma_ext = 0; + for (b_idx = 1; b_idx < ACC_DMA_MAX_NUM_POINTERS - 1; + b_idx++) { + desc->req.data_ptrs[b_idx].blkid = ACC_DMA_BLKID_IN; + desc->req.data_ptrs[b_idx].last = 1; + desc->req.data_ptrs[b_idx].dma_ext = 0; + b_idx++; + desc->req.data_ptrs[b_idx].blkid = + ACC_DMA_BLKID_OUT_ENC; + desc->req.data_ptrs[b_idx].last = 1; + desc->req.data_ptrs[b_idx].dma_ext = 0; + } + /* Preset some fields of LDPC FCW */ + desc->req.fcw_ld.FCWversion = ACC_FCW_VER; + desc->req.fcw_ld.gain_i = 1; + desc->req.fcw_ld.gain_h = 1; + } + + q->lb_in = rte_zmalloc_socket(dev->device->driver->name, + RTE_CACHE_LINE_SIZE, + RTE_CACHE_LINE_SIZE, conf->socket); + if (q->lb_in == NULL) { + rte_bbdev_log(ERR, "Failed to allocate lb_in memory"); + 
rte_free(q); + return -ENOMEM; + } + q->lb_in_addr_iova = rte_malloc_virt2iova(q->lb_in); + q->lb_out = rte_zmalloc_socket(dev->device->driver->name, + RTE_CACHE_LINE_SIZE, + RTE_CACHE_LINE_SIZE, conf->socket); + if (q->lb_out == NULL) { + rte_bbdev_log(ERR, "Failed to allocate lb_out memory"); + rte_free(q->lb_in); + rte_free(q); + return -ENOMEM; + } + q->lb_out_addr_iova = rte_malloc_virt2iova(q->lb_out); + q->companion_ring_addr = rte_zmalloc_socket(dev->device->driver->name, + d->sw_ring_max_depth * sizeof(*q->companion_ring_addr), + RTE_CACHE_LINE_SIZE, conf->socket); + if (q->companion_ring_addr == NULL) { + rte_bbdev_log(ERR, "Failed to allocate companion_ring memory"); + rte_free(q->lb_in); + rte_free(q->lb_out); + rte_free(q); + return -ENOMEM; + } + + /* + * Software queue ring wraps synchronously with the HW when it reaches + * the boundary of the maximum allocated queue size, no matter what the + * sw queue size is. This wrapping is guarded by setting the wrap_mask + * to represent the maximum queue size as allocated at the time when + * the device has been setup (in configure()). + * + * The queue depth is set to the queue size value (conf->queue_size). + * This limits the occupancy of the queue at any point of time, so that + * the queue does not get swamped with enqueue requests. 
+ */ + q->sw_ring_depth = conf->queue_size; + q->sw_ring_wrap_mask = d->sw_ring_max_depth - 1; + + q->op_type = conf->op_type; + + q_idx = acc200_find_free_queue_idx(dev, conf); + if (q_idx == -1) { + rte_free(q->companion_ring_addr); + rte_free(q->lb_in); + rte_free(q->lb_out); + rte_free(q); + return -1; + } + + q->qgrp_id = (q_idx >> ACC200_GRP_ID_SHIFT) & 0xF; + q->vf_id = (q_idx >> ACC200_VF_ID_SHIFT) & 0x3F; + q->aq_id = q_idx & 0xF; + q->aq_depth = 0; + if (conf->op_type == RTE_BBDEV_OP_TURBO_DEC) + q->aq_depth = (1 << d->acc_conf.q_ul_4g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_TURBO_ENC) + q->aq_depth = (1 << d->acc_conf.q_dl_4g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_LDPC_DEC) + q->aq_depth = (1 << d->acc_conf.q_ul_5g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_LDPC_ENC) + q->aq_depth = (1 << d->acc_conf.q_dl_5g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_FFT) + q->aq_depth = (1 << d->acc_conf.q_fft.aq_depth_log2); + + q->mmio_reg_enqueue = RTE_PTR_ADD(d->mmio_base, + queue_offset(d->pf_device, + q->vf_id, q->qgrp_id, q->aq_id)); + + rte_bbdev_log_debug( + "Setup dev%u q%u: qgrp_id=%u, vf_id=%u, aq_id=%u, aq_depth=%u, mmio_reg_enqueue=%p base %p\n", + dev->data->dev_id, queue_id, q->qgrp_id, q->vf_id, + q->aq_id, q->aq_depth, q->mmio_reg_enqueue, + d->mmio_base); + + dev->data->queues[queue_id].queue_private = q; + return 0; +} + + +static int +acc_queue_stop(struct rte_bbdev *dev, uint16_t queue_id) +{ + struct acc_queue *q; + q = dev->data->queues[queue_id].queue_private; + rte_bbdev_log(INFO, "Queue Stop %d H/T/D %d %d %x OpType %d", + queue_id, q->sw_ring_head, q->sw_ring_tail, + q->sw_ring_depth, q->op_type); + /* ignore all operations in flight and clear counters */ + q->sw_ring_tail = q->sw_ring_head; + q->aq_enqueued = 0; + q->aq_dequeued = 0; + dev->data->queues[queue_id].queue_stats.enqueued_count = 0; + dev->data->queues[queue_id].queue_stats.dequeued_count = 0; + 
dev->data->queues[queue_id].queue_stats.enqueue_err_count = 0; + dev->data->queues[queue_id].queue_stats.dequeue_err_count = 0; + dev->data->queues[queue_id].queue_stats.enqueue_warn_count = 0; + dev->data->queues[queue_id].queue_stats.dequeue_warn_count = 0; + return 0; +} + +/* Release ACC200 queue */ +static int +acc200_queue_release(struct rte_bbdev *dev, uint16_t q_id) +{ + struct acc_device *d = dev->data->dev_private; + struct acc_queue *q = dev->data->queues[q_id].queue_private; + + if (q != NULL) { + /* Mark the Queue as un-assigned */ + d->q_assigned_bit_map[q->qgrp_id] &= (0xFFFFFFFF - + (1 << q->aq_id)); + rte_free(q->companion_ring_addr); + rte_free(q->lb_in); + rte_free(q->lb_out); + rte_free(q); + dev->data->queues[q_id].queue_private = NULL; + } + + return 0; +} + /* Get ACC200 device info */ static void acc200_dev_info_get(struct rte_bbdev *dev, @@ -279,8 +646,12 @@ } static const struct rte_bbdev_ops acc200_bbdev_ops = { + .setup_queues = acc200_setup_queues, .close = acc200_dev_close, .info_get = acc200_dev_info_get, + .queue_setup = acc200_queue_setup, + .queue_release = acc200_queue_release, + .queue_stop = acc_queue_stop, }; /* ACC200 PCI PF address map */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
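A note on the queue bookkeeping in the patch above: acc200_find_free_queue_idx() claims the first clear bit in the per-group bit map and returns the group and AQ packed into one integer, and acc200_queue_release() clears that bit again. The sketch below shows the same claim/release cycle as standalone C. It is an illustration only: the sketch_* names are hypothetical, and the shift of 10, the 16 groups, and the 32-bit map width are assumptions made for the example rather than values confirmed by this thread.

```c
#include <stdint.h>

/* Assumed values for this sketch only; the real ACC200_* constants are in the driver headers. */
#define SKETCH_NUM_QGRPS 16
#define SKETCH_GRP_ID_SHIFT 10

/*
 * Claim the first free atomic queue in group_idx.
 * Returns the packed queue index (group in the high bits, AQ in the low
 * four bits) or -1 when the group is invalid or fully assigned.
 */
static inline int
sketch_claim_queue(uint32_t *bit_map, uint16_t group_idx, uint16_t num_aqs)
{
	uint16_t aq_idx;

	if (group_idx >= SKETCH_NUM_QGRPS || num_aqs > 16)
		return -1;
	for (aq_idx = 0; aq_idx < num_aqs; aq_idx++) {
		if (((bit_map[group_idx] >> aq_idx) & 0x1) == 0) {
			/* Mark the queue as assigned and report its index */
			bit_map[group_idx] |= (UINT32_C(1) << aq_idx);
			return (group_idx << SKETCH_GRP_ID_SHIFT) + aq_idx;
		}
	}
	return -1;
}

/* Release a queue previously returned by sketch_claim_queue(). */
static inline void
sketch_release_queue(uint32_t *bit_map, int q_idx)
{
	uint16_t group_idx = (q_idx >> SKETCH_GRP_ID_SHIFT) & 0xF;
	uint16_t aq_idx = q_idx & 0xF;

	/* Clear only the bit owned by this queue */
	bit_map[group_idx] &= ~(UINT32_C(1) << aq_idx);
}
```

The 0xF masks in the release helper match how acc200_queue_setup() unpacks qgrp_id and aq_id from q_idx. Clearing with a bitwise NOT is equivalent, for a 32-bit map, to the `0xFFFFFFFF - (1 << q->aq_id)` form used in the patch.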
* [PATCH v2 06/11] baseband/acc200: add LDPC processing functions 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (4 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 05/11] baseband/acc200: add queue configuration Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 07/11] baseband/acc200: add LTE " Nic Chautru ` (4 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Adding LDPC encode and decode processing functions. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 1526 +++++++++++++++++++++++++++++- 1 file changed, 1522 insertions(+), 4 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 225bab9..def5ed7 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -554,15 +554,50 @@ return 0; } +static inline void +acc200_print_op(struct rte_bbdev_dec_op *op, enum rte_bbdev_op_type op_type, + uint16_t index) +{ + if (op == NULL) + return; + if (op_type == RTE_BBDEV_OP_LDPC_DEC) + rte_bbdev_log(INFO, + " Op 5GUL %d %d %d %d %d %d %d %d %d %d %d %d", + index, + op->ldpc_dec.basegraph, op->ldpc_dec.z_c, + op->ldpc_dec.n_cb, op->ldpc_dec.q_m, + op->ldpc_dec.n_filler, op->ldpc_dec.cb_params.e, + op->ldpc_dec.op_flags, op->ldpc_dec.rv_index, + op->ldpc_dec.iter_max, op->ldpc_dec.iter_count, + op->ldpc_dec.harq_combined_input.length + ); + else if (op_type == RTE_BBDEV_OP_LDPC_ENC) { + struct rte_bbdev_enc_op *op_dl = (struct rte_bbdev_enc_op *) op; + rte_bbdev_log(INFO, + " Op 5GDL %d %d %d %d %d %d %d %d %d", + index, + op_dl->ldpc_enc.basegraph, op_dl->ldpc_enc.z_c, + op_dl->ldpc_enc.n_cb, 
op_dl->ldpc_enc.q_m, + op_dl->ldpc_enc.n_filler, op_dl->ldpc_enc.cb_params.e, + op_dl->ldpc_enc.op_flags, op_dl->ldpc_enc.rv_index + ); + } +} static int acc_queue_stop(struct rte_bbdev *dev, uint16_t queue_id) { struct acc_queue *q; + struct rte_bbdev_dec_op *op; + uint16_t i; q = dev->data->queues[queue_id].queue_private; rte_bbdev_log(INFO, "Queue Stop %d H/T/D %d %d %x OpType %d", queue_id, q->sw_ring_head, q->sw_ring_tail, q->sw_ring_depth, q->op_type); + for (i = 0; i < q->sw_ring_depth; ++i) { + op = (q->ring_addr + i)->req.op_addr; + acc200_print_op(op, q->op_type, i); + } /* ignore all operations in flight and clear counters */ q->sw_ring_tail = q->sw_ring_head; q->aq_enqueued = 0; @@ -605,6 +640,43 @@ struct acc_device *d = dev->data->dev_private; int i; static const struct rte_bbdev_op_cap bbdev_capabilities[] = { + { + .type = RTE_BBDEV_OP_LDPC_ENC, + .cap.ldpc_enc = { + .capability_flags = + RTE_BBDEV_LDPC_RATE_MATCH | + RTE_BBDEV_LDPC_CRC_24B_ATTACH | + RTE_BBDEV_LDPC_INTERLEAVER_BYPASS, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + } + }, + { + .type = RTE_BBDEV_OP_LDPC_DEC, + .cap.ldpc_dec = { + .capability_flags = + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP | + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK | + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE | + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE | + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE | + RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS | + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER | + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION | + RTE_BBDEV_LDPC_LLR_COMPRESSION, + .llr_size = 8, + .llr_decimals = 1, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_hard_out = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_soft_out = 0, + } + }, RTE_BBDEV_END_OF_CAPABILITIES_LIST() }; @@ -621,13 +693,15 @@ dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0; 
dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0; - dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = 0; - dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_aqs_per_groups * + d->acc_conf.q_ul_5g.num_qgroups; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_aqs_per_groups * + d->acc_conf.q_dl_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = 0; dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; dev_info->max_num_queues = 0; for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_FFT; i++) @@ -670,6 +744,1446 @@ {.device_id = 0}, }; +/* Fill in a frame control word for LDPC decoding. */ +static inline void +acc200_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw, + union acc_harq_layout_data *harq_layout) +{ + uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset; + uint32_t harq_index; + uint32_t l; + + fcw->qm = op->ldpc_dec.q_m; + fcw->nfiller = op->ldpc_dec.n_filler; + fcw->BG = (op->ldpc_dec.basegraph - 1); + fcw->Zc = op->ldpc_dec.z_c; + fcw->ncb = op->ldpc_dec.n_cb; + fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_dec.basegraph, + op->ldpc_dec.rv_index); + if (op->ldpc_dec.code_block_mode == RTE_BBDEV_CODE_BLOCK) + fcw->rm_e = op->ldpc_dec.cb_params.e; + else + fcw->rm_e = (op->ldpc_dec.tb_params.r < + op->ldpc_dec.tb_params.cab) ? 
+ op->ldpc_dec.tb_params.ea : + op->ldpc_dec.tb_params.eb; + + if (unlikely(check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE) && + (op->ldpc_dec.harq_combined_input.length == 0))) { + rte_bbdev_log(WARNING, "Null HARQ input size provided"); + /* Disable HARQ input in that case to carry forward */ + op->ldpc_dec.op_flags ^= RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE; + } + if (unlikely(fcw->rm_e == 0)) { + rte_bbdev_log(WARNING, "Null E input provided"); + fcw->rm_e = 2; + } + + fcw->hcin_en = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE); + fcw->hcout_en = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE); + fcw->crc_select = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK); + fcw->bypass_dec = 0; + fcw->bypass_intlv = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS); + if (op->ldpc_dec.q_m == 1) { + fcw->bypass_intlv = 1; + fcw->qm = 2; + } + fcw->hcin_decomp_mode = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION); + fcw->hcout_comp_mode = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION); + fcw->llr_pack_mode = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_LLR_COMPRESSION); + harq_index = hq_index(op->ldpc_dec.harq_combined_output.offset); + + if (fcw->hcin_en > 0) { + harq_in_length = op->ldpc_dec.harq_combined_input.length; + if (fcw->hcin_decomp_mode > 0) + harq_in_length = harq_in_length * 8 / 6; + harq_in_length = RTE_MIN(harq_in_length, op->ldpc_dec.n_cb + - op->ldpc_dec.n_filler); + harq_in_length = RTE_ALIGN_CEIL(harq_in_length, 64); + fcw->hcin_size0 = harq_in_length; + fcw->hcin_offset = 0; + fcw->hcin_size1 = 0; + } else { + fcw->hcin_size0 = 0; + fcw->hcin_offset = 0; + fcw->hcin_size1 = 0; + } + + fcw->itmax = op->ldpc_dec.iter_max; + fcw->itstop = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE); + fcw->cnu_algo = ACC_ALGO_MSA; + fcw->synd_precoder = fcw->itstop; + /* + 
* These are all implicitly set + * fcw->synd_post = 0; + * fcw->so_en = 0; + * fcw->so_bypass_rm = 0; + * fcw->so_bypass_intlv = 0; + * fcw->dec_convllr = 0; + * fcw->hcout_convllr = 0; + * fcw->hcout_size1 = 0; + * fcw->so_it = 0; + * fcw->hcout_offset = 0; + * fcw->negstop_th = 0; + * fcw->negstop_it = 0; + * fcw->negstop_en = 0; + * fcw->gain_i = 1; + * fcw->gain_h = 1; + */ + if (fcw->hcout_en > 0) { + parity_offset = (op->ldpc_dec.basegraph == 1 ? 20 : 8) + * op->ldpc_dec.z_c - op->ldpc_dec.n_filler; + k0_p = (fcw->k0 > parity_offset) ? + fcw->k0 - op->ldpc_dec.n_filler : fcw->k0; + ncb_p = fcw->ncb - op->ldpc_dec.n_filler; + l = k0_p + fcw->rm_e; + harq_out_length = (uint16_t) fcw->hcin_size0; + harq_out_length = RTE_MIN(RTE_MAX(harq_out_length, l), ncb_p); + harq_out_length = RTE_ALIGN_CEIL(harq_out_length, 64); + fcw->hcout_size0 = harq_out_length; + fcw->hcout_size1 = 0; + fcw->hcout_offset = 0; + harq_layout[harq_index].offset = fcw->hcout_offset; + harq_layout[harq_index].size0 = fcw->hcout_size0; + } else { + fcw->hcout_size0 = 0; + fcw->hcout_size1 = 0; + fcw->hcout_offset = 0; + } + + fcw->tb_crc_select = 0; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) + fcw->tb_crc_select = 2; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK)) + fcw->tb_crc_select = 1; +} + +static inline int +acc200_dma_desc_le_fill(struct rte_bbdev_enc_op *op, + struct acc_dma_req_desc *desc, struct rte_mbuf **input, + struct rte_mbuf *output, uint32_t *in_offset, + uint32_t *out_offset, uint32_t *out_length, + uint32_t *mbuf_total_left, uint32_t *seg_total_left) +{ + int next_triplet = 1; /* FCW already done */ + uint16_t K, in_length_in_bits, in_length_in_bytes; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + + acc_header_init(desc); + K = (enc->basegraph == 1 ? 
22 : 10) * enc->z_c; + in_length_in_bits = K - enc->n_filler; + if ((enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) || + (enc->op_flags & RTE_BBDEV_LDPC_CRC_24B_ATTACH)) + in_length_in_bits -= 24; + in_length_in_bytes = in_length_in_bits >> 3; + + if (unlikely((*mbuf_total_left == 0) || + (*mbuf_total_left < in_length_in_bytes))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, in_length_in_bytes); + return -1; + } + + next_triplet = acc_dma_fill_blk_type_in(desc, input, in_offset, + in_length_in_bytes, + seg_total_left, next_triplet, + check_bit(op->ldpc_enc.op_flags, + RTE_BBDEV_LDPC_ENC_SCATTER_GATHER)); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= in_length_in_bytes; + + /* Set output length */ + /* Integer round up division by 8 */ + *out_length = (enc->cb_params.e + 7) >> 3; + + next_triplet = acc_dma_fill_blk_type(desc, output, *out_offset, + *out_length, next_triplet, ACC_DMA_BLKID_OUT_ENC); + op->ldpc_enc.output.length += *out_length; + *out_offset += *out_length; + desc->data_ptrs[next_triplet - 1].last = 1; + desc->data_ptrs[next_triplet - 1].dma_ext = 0; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline int +acc200_dma_desc_ld_fill(struct rte_bbdev_dec_op *op, + struct acc_dma_req_desc *desc, + struct rte_mbuf **input, struct rte_mbuf *h_output, + uint32_t *in_offset, uint32_t *h_out_offset, + uint32_t *h_out_length, uint32_t *mbuf_total_left, + uint32_t *seg_total_left, + struct acc_fcw_ld *fcw) +{ + struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec; + int next_triplet = 1; /* FCW already done */ + uint32_t input_length; + uint16_t output_length, crc24_overlap = 0; + uint16_t sys_cols, K, h_p_size, h_np_size; + bool 
h_comp = check_bit(dec->op_flags, + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION); + + acc_header_init(desc); + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP)) + crc24_overlap = 24; + + /* Compute some LDPC BG lengths */ + input_length = fcw->rm_e; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_LLR_COMPRESSION)) + input_length = (input_length * 3 + 3) / 4; + sys_cols = (dec->basegraph == 1) ? 22 : 10; + K = sys_cols * dec->z_c; + output_length = K - dec->n_filler - crc24_overlap; + + if (unlikely((*mbuf_total_left == 0) || + (*mbuf_total_left < input_length))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, input_length); + return -1; + } + + next_triplet = acc_dma_fill_blk_type_in(desc, input, + in_offset, input_length, + seg_total_left, next_triplet, + check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER)); + + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) { + if (op->ldpc_dec.harq_combined_input.data == 0) { + rte_bbdev_log(ERR, "HARQ input is not defined"); + return -1; + } + h_p_size = fcw->hcin_size0 + fcw->hcin_size1; + if (h_comp) + h_p_size = (h_p_size * 3 + 3) / 4; + if (op->ldpc_dec.harq_combined_input.data == 0) { + rte_bbdev_log(ERR, "HARQ input is not defined"); + return -1; + } + acc_dma_fill_blk_type( + desc, + op->ldpc_dec.harq_combined_input.data, + op->ldpc_dec.harq_combined_input.offset, + h_p_size, + next_triplet, + ACC_DMA_BLKID_IN_HARQ); + next_triplet++; + } + + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= input_length; + + next_triplet = acc_dma_fill_blk_type(desc, h_output, + *h_out_offset, output_length >> 3, next_triplet, + ACC_DMA_BLKID_OUT_HARD); + + if 
(check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) { + if (op->ldpc_dec.harq_combined_output.data == 0) { + rte_bbdev_log(ERR, "HARQ output is not defined"); + return -1; + } + + /* Pruned size of the HARQ */ + h_p_size = fcw->hcout_size0 + fcw->hcout_size1; + /* Non-Pruned size of the HARQ */ + h_np_size = fcw->hcout_offset > 0 ? + fcw->hcout_offset + fcw->hcout_size1 : + h_p_size; + if (h_comp) { + h_np_size = (h_np_size * 3 + 3) / 4; + h_p_size = (h_p_size * 3 + 3) / 4; + } + dec->harq_combined_output.length = h_np_size; + acc_dma_fill_blk_type( + desc, + dec->harq_combined_output.data, + dec->harq_combined_output.offset, + h_p_size, + next_triplet, + ACC_DMA_BLKID_OUT_HARQ); + + next_triplet++; + } + + *h_out_length = output_length >> 3; + dec->hard_output.length += *h_out_length; + *h_out_offset += *h_out_length; + desc->data_ptrs[next_triplet - 1].last = 1; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline void +acc200_dma_desc_ld_update(struct rte_bbdev_dec_op *op, + struct acc_dma_req_desc *desc, + struct rte_mbuf *input, struct rte_mbuf *h_output, + uint32_t *in_offset, uint32_t *h_out_offset, + uint32_t *h_out_length, + union acc_harq_layout_data *harq_layout) +{ + int next_triplet = 1; /* FCW already done */ + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(input, *in_offset); + next_triplet++; + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) { + struct rte_bbdev_op_data hi = op->ldpc_dec.harq_combined_input; + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(hi.data, hi.offset); + next_triplet++; + } + + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(h_output, *h_out_offset); + *h_out_length = desc->data_ptrs[next_triplet].blen; + next_triplet++; + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) { + /* Adjust based on previous operation */ + struct 
rte_bbdev_dec_op *prev_op = desc->op_addr; + op->ldpc_dec.harq_combined_output.length = + prev_op->ldpc_dec.harq_combined_output.length; + uint32_t harq_idx = hq_index( + op->ldpc_dec.harq_combined_output.offset); + uint32_t prev_harq_idx = hq_index( + prev_op->ldpc_dec.harq_combined_output.offset); + harq_layout[harq_idx].val = harq_layout[prev_harq_idx].val; + struct rte_bbdev_op_data ho = + op->ldpc_dec.harq_combined_output; + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(ho.data, ho.offset); + next_triplet++; + } + + op->ldpc_dec.hard_output.length += *h_out_length; + desc->op_addr = op; +} + +/* Enqueue one encode operations for ACC200 device in CB mode + * multiplexed on the same descriptor + */ +static inline int +enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ops, + uint16_t total_enqueued_descs, int16_t num) +{ + union acc_dma_desc *desc = NULL; + uint32_t out_length; + struct rte_mbuf *output_head, *output; + int i, next_triplet; + uint16_t in_length_in_bytes; + struct rte_bbdev_op_ldpc_enc *enc = &ops[0]->ldpc_enc; + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc_fcw_le_fill(ops[0], &desc->req.fcw_le, num, 0); + + /** This could be done at polling */ + acc_header_init(&desc->req); + desc->req.numCBs = num; + + in_length_in_bytes = ops[0]->ldpc_enc.input.data->data_len; + out_length = (enc->cb_params.e + 7) >> 3; + desc->req.m2dlen = 1 + num; + desc->req.d2mlen = num; + next_triplet = 1; + + for (i = 0; i < num; i++) { + desc->req.data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(ops[i]->ldpc_enc.input.data, 0); + desc->req.data_ptrs[next_triplet].blen = in_length_in_bytes; + next_triplet++; + desc->req.data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset( + ops[i]->ldpc_enc.output.data, 0); + desc->req.data_ptrs[next_triplet].blen = out_length; + next_triplet++; + ops[i]->ldpc_enc.output.length = out_length; + 
output_head = output = ops[i]->ldpc_enc.output.data; + mbuf_append(output_head, output, out_length); + output->data_len = out_length; + } + + desc->req.op_addr = ops[0]; + /* Keep track of pointers even when multiplexed in single descriptor */ + struct acc_ptrs *context_ptrs = q->companion_ring_addr + desc_idx; + for (i = 0; i < num; i++) + context_ptrs->ptr[i].op_addr = ops[i]; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_le, + sizeof(desc->req.fcw_le) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + /* One CB (one op) was successfully prepared to enqueue */ + return num; +} + +/* Enqueue one encode operations for ACC200 device for a partial TB + * all codes blocks have same configuration multiplexed on the same descriptor + */ +static inline void +enqueue_ldpc_enc_part_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op, + uint16_t total_enqueued_descs, int16_t num_cbs, uint32_t e, + uint16_t in_len_B, uint32_t out_len_B, uint32_t *in_offset, + uint32_t *out_offset) +{ + + union acc_dma_desc *desc = NULL; + struct rte_mbuf *output_head, *output; + int i, next_triplet; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc_fcw_le_fill(op, &desc->req.fcw_le, num_cbs, e); + + /** This could be done at polling */ + acc_header_init(&desc->req); + desc->req.numCBs = num_cbs; + + desc->req.m2dlen = 1 + num_cbs; + desc->req.d2mlen = num_cbs; + next_triplet = 1; + + for (i = 0; i < num_cbs; i++) { + desc->req.data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(enc->input.data, + *in_offset); + *in_offset += in_len_B; + desc->req.data_ptrs[next_triplet].blen = in_len_B; + next_triplet++; + desc->req.data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset( + enc->output.data, *out_offset); + *out_offset += out_len_B; + desc->req.data_ptrs[next_triplet].blen = out_len_B; + 
next_triplet++; + enc->output.length += out_len_B; + output_head = output = enc->output.data; + mbuf_append(output_head, output, out_len_B); + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_le, + sizeof(desc->req.fcw_le) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + +} + +/* Enqueue one encode operations for ACC200 device in CB mode */ +static inline int +enqueue_ldpc_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op, + uint16_t total_enqueued_cbs) +{ + union acc_dma_desc *desc = NULL; + int ret; + uint32_t in_offset, out_offset, out_length, mbuf_total_left, + seg_total_left; + struct rte_mbuf *input, *output_head, *output; + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc_fcw_le_fill(op, &desc->req.fcw_le, 1, 0); + + input = op->ldpc_enc.input.data; + output_head = output = op->ldpc_enc.output.data; + in_offset = op->ldpc_enc.input.offset; + out_offset = op->ldpc_enc.output.offset; + out_length = 0; + mbuf_total_left = op->ldpc_enc.input.length; + seg_total_left = rte_pktmbuf_data_len(op->ldpc_enc.input.data) + - in_offset; + + ret = acc200_dma_desc_le_fill(op, &desc->req, &input, output, + &in_offset, &out_offset, &out_length, &mbuf_total_left, + &seg_total_left); + + if (unlikely(ret < 0)) + return ret; + + mbuf_append(output_head, output, out_length); + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_le, + sizeof(desc->req.fcw_le) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); + + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + /* One CB (one op) was successfully prepared to enqueue */ + return 1; +} + +/* Enqueue one encode operations for ACC200 device in TB mode. 
+ * returns the number of descs used + */ +static inline int +enqueue_ldpc_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op, + uint16_t enq_descs, uint8_t cbs_in_tb) +{ + uint8_t num_a, num_b; + uint16_t desc_idx; + uint8_t r = op->ldpc_enc.tb_params.r; + uint8_t cab = op->ldpc_enc.tb_params.cab; + union acc_dma_desc *desc; + uint16_t init_enq_descs = enq_descs; + uint16_t input_len_B = ((op->ldpc_enc.basegraph == 1 ? 22 : 10) * + op->ldpc_enc.z_c) >> 3; + if (check_bit(op->ldpc_enc.op_flags, RTE_BBDEV_LDPC_CRC_24B_ATTACH)) + input_len_B -= 3; + + if (r < cab) { + num_a = cab - r; + num_b = cbs_in_tb - cab; + } else { + num_a = 0; + num_b = cbs_in_tb - r; + } + uint32_t in_offset = 0, out_offset = 0; + + while (num_a > 0) { + uint32_t e = op->ldpc_enc.tb_params.ea; + uint32_t out_len_B = (e + 7) >> 3; + uint8_t enq = RTE_MIN(num_a, ACC_MUX_5GDL_DESC); + num_a -= enq; + enqueue_ldpc_enc_part_tb(q, op, enq_descs, enq, e, input_len_B, + out_len_B, &in_offset, &out_offset); + enq_descs++; + } + while (num_b > 0) { + uint32_t e = op->ldpc_enc.tb_params.eb; + uint32_t out_len_B = (e + 7) >> 3; + uint8_t enq = RTE_MIN(num_b, ACC_MUX_5GDL_DESC); + num_b -= enq; + enqueue_ldpc_enc_part_tb(q, op, enq_descs, enq, e, input_len_B, + out_len_B, &in_offset, &out_offset); + enq_descs++; + } + + uint16_t return_descs = enq_descs - init_enq_descs; + /* Keep total number of CBs in first TB */ + desc_idx = ((q->sw_ring_head + init_enq_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + desc->req.cbs_in_tb = return_descs; /** Actual number of descriptors */ + desc->req.op_addr = op; + + /* Set SDone on last CB descriptor for TB mode. 
*/ + desc_idx = ((q->sw_ring_head + enq_descs - 1) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + desc->req.op_addr = op; + return return_descs; +} + +/** Enqueue one decode operations for ACC200 device in CB mode */ +static inline int +enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs, bool same_op) +{ + int ret, hq_len; + if (op->ldpc_dec.cb_params.e == 0) + return -EINVAL; + + union acc_dma_desc *desc; + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + struct rte_mbuf *input, *h_output_head, *h_output; + uint32_t in_offset, h_out_offset, mbuf_total_left, h_out_length = 0; + input = op->ldpc_dec.input.data; + h_output_head = h_output = op->ldpc_dec.hard_output.data; + in_offset = op->ldpc_dec.input.offset; + h_out_offset = op->ldpc_dec.hard_output.offset; + mbuf_total_left = op->ldpc_dec.input.length; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(input == NULL)) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + return -EFAULT; + } +#endif + union acc_harq_layout_data *harq_layout = q->d->harq_layout; + + if (same_op) { + union acc_dma_desc *prev_desc; + desc_idx = ((q->sw_ring_head + total_enqueued_cbs - 1) + & q->sw_ring_wrap_mask); + prev_desc = q->ring_addr + desc_idx; + uint8_t *prev_ptr = (uint8_t *) prev_desc; + uint8_t *new_ptr = (uint8_t *) desc; + /* Copy first 4 words and BDESCs */ + rte_memcpy(new_ptr, prev_ptr, ACC_5GUL_SIZE_0); + rte_memcpy(new_ptr + ACC_5GUL_OFFSET_0, + prev_ptr + ACC_5GUL_OFFSET_0, + ACC_5GUL_SIZE_1); + desc->req.op_addr = prev_desc->req.op_addr; + /* Copy FCW */ + rte_memcpy(new_ptr + ACC_DESC_FCW_OFFSET, + prev_ptr + ACC_DESC_FCW_OFFSET, + ACC_FCW_LD_BLEN); + acc200_dma_desc_ld_update(op, &desc->req, input, h_output, + &in_offset, &h_out_offset, + &h_out_length, harq_layout); + } else { + struct acc_fcw_ld *fcw; + 
uint32_t seg_total_left; + fcw = &desc->req.fcw_ld; + acc200_fcw_ld_fill(op, fcw, harq_layout); + + /* Special handling when using mbuf or not */ + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER)) + seg_total_left = rte_pktmbuf_data_len(input) + - in_offset; + else + seg_total_left = fcw->rm_e; + + ret = acc200_dma_desc_ld_fill(op, &desc->req, &input, h_output, + &in_offset, &h_out_offset, + &h_out_length, &mbuf_total_left, + &seg_total_left, fcw); + if (unlikely(ret < 0)) + return ret; + } + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + if (op->ldpc_dec.harq_combined_output.length > 0) { + /* Push the HARQ output into host memory */ + struct rte_mbuf *hq_output_head, *hq_output; + hq_output_head = op->ldpc_dec.harq_combined_output.data; + hq_output = op->ldpc_dec.harq_combined_output.data; + hq_len = op->ldpc_dec.harq_combined_output.length; + if (unlikely(!mbuf_append(hq_output_head, hq_output, + hq_len))) { + rte_bbdev_log(ERR, "HARQ output mbuf issue %d %d\n", + hq_output->buf_len, + hq_len); + return -1; + } + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_ld, + sizeof(desc->req.fcw_ld) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + /* One CB (one op) was successfully prepared to enqueue */ + return 1; +} + + +/* Enqueue one decode operations for ACC200 device in TB mode */ +static inline int +enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) +{ + union acc_dma_desc *desc = NULL; + union acc_dma_desc *desc_first = NULL; + int ret; + uint8_t r, c; + uint32_t in_offset, h_out_offset, + h_out_length, mbuf_total_left, seg_total_left; + struct rte_mbuf *input, *h_output_head, *h_output; + uint16_t current_enqueued_cbs = 0; + uint16_t sys_cols, trail_len = 0; + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + 
desc_idx; + desc_first = desc; + uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; + union acc_harq_layout_data *harq_layout = q->d->harq_layout; + acc200_fcw_ld_fill(op, &desc->req.fcw_ld, harq_layout); + + input = op->ldpc_dec.input.data; + h_output_head = h_output = op->ldpc_dec.hard_output.data; + in_offset = op->ldpc_dec.input.offset; + h_out_offset = op->ldpc_dec.hard_output.offset; + h_out_length = 0; + mbuf_total_left = op->ldpc_dec.input.length; + c = op->ldpc_dec.tb_params.c; + r = op->ldpc_dec.tb_params.r; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) { + sys_cols = (op->ldpc_dec.basegraph == 1) ? 22 : 10; + trail_len = sys_cols * op->ldpc_dec.z_c - + op->ldpc_dec.n_filler - 24; + } + + while (mbuf_total_left > 0 && r < c) { + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER)) + seg_total_left = rte_pktmbuf_data_len(input) + - in_offset; + else + seg_total_left = op->ldpc_dec.input.length; + /* Set up DMA descriptor */ + desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; + desc->req.data_ptrs[0].address = q->ring_addr_iova + + fcw_offset; + desc->req.data_ptrs[0].blen = ACC_FCW_LD_BLEN; + rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld, + ACC_FCW_LD_BLEN); + desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len; + + ret = acc200_dma_desc_ld_fill(op, &desc->req, &input, + h_output, &in_offset, &h_out_offset, + &h_out_length, + &mbuf_total_left, &seg_total_left, + &desc->req.fcw_ld); + + if (unlikely(ret < 0)) + return ret; + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + + /* Set total number of CBs in TB */ + desc->req.cbs_in_tb = cbs_in_tb; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_td, + sizeof(desc->req.fcw_td) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + if 
(check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER) + && (seg_total_left == 0)) { + /* Go to the next mbuf */ + input = input->next; + in_offset = 0; + h_output = h_output->next; + h_out_offset = 0; + } + total_enqueued_cbs++; + current_enqueued_cbs++; + r++; + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + /* Set SDone on last CB descriptor for TB mode */ + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + return current_enqueued_cbs; +} + +/** Enqueue encode operations for ACC200 device in CB mode. */ +static inline uint16_t +acc200_enqueue_ldpc_enc_cb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i = 0; + union acc_dma_desc *desc; + int ret, desc_idx = 0; + int16_t enq, left = num; + + while (left > 0) { + if (unlikely(avail < 1)) { + acc_enqueue_ring_full(q_data); + break; + } + avail--; + enq = RTE_MIN(left, ACC_MUX_5GDL_DESC); + enq = check_mux(&ops[i], enq); + if (enq > 1) { + ret = enqueue_ldpc_enc_n_op_cb(q, &ops[i], + desc_idx, enq); + if (ret < 0) { + acc_enqueue_invalid(q_data); + break; + } + i += enq; + } else { + ret = enqueue_ldpc_enc_one_op_cb(q, ops[i], desc_idx); + if (ret < 0) { + acc_enqueue_invalid(q_data); + break; + } + i++; + } + desc_idx++; + left = num - i; + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + desc_idx - 1) + & q->sw_ring_wrap_mask); + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + acc_dma_enqueue(q, desc_idx, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + +/* Enqueue LDPC encode operations for ACC200 device in TB mode. 
*/ +static uint16_t +acc200_enqueue_ldpc_enc_tb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i, enqueued_descs = 0; + uint8_t cbs_in_tb; + int descs_used; + + for (i = 0; i < num; ++i) { + cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc); + /* Check if there are available space for further processing */ + if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) { + acc_enqueue_ring_full(q_data); + break; + } + + descs_used = enqueue_ldpc_enc_one_op_tb(q, ops[i], + enqueued_descs, cbs_in_tb); + if (descs_used < 0) { + acc_enqueue_invalid(q_data); + break; + } + enqueued_descs += descs_used; + avail -= descs_used; + } + if (unlikely(enqueued_descs == 0)) + return 0; /* Nothing to enqueue */ + + acc_dma_enqueue(q, enqueued_descs, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + +/* Check room in AQ for the enqueues batches into Qmgr */ +static int32_t +acc200_aq_avail(struct rte_bbdev_queue_data *q_data, uint16_t num_ops) +{ + struct acc_queue *q = q_data->queue_private; + int32_t aq_avail = q->aq_depth - + ((q->aq_enqueued - q->aq_dequeued + + ACC_MAX_QUEUE_DEPTH) % ACC_MAX_QUEUE_DEPTH) + - (num_ops >> 7); + if (aq_avail <= 0) + acc_enqueue_queue_full(q_data); + return aq_avail; +} + +/* Enqueue encode operations for ACC200 device. 
*/ +static uint16_t +acc200_enqueue_ldpc_enc(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t aq_avail = acc_ring_avail_enq(q); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + if (ops[0]->ldpc_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + return acc200_enqueue_ldpc_enc_tb(q_data, ops, num); + else + return acc200_enqueue_ldpc_enc_cb(q_data, ops, num); +} + +/* Enqueue decode operations for ACC200 device in TB mode */ +static uint16_t +acc200_enqueue_ldpc_dec_tb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i, enqueued_cbs = 0; + uint8_t cbs_in_tb; + int ret; + + for (i = 0; i < num; ++i) { + cbs_in_tb = get_num_cbs_in_tb_ldpc_dec(&ops[i]->ldpc_dec); + /* Check if there are available space for further processing */ + if (unlikely((avail - cbs_in_tb < 0) || + (cbs_in_tb == 0))) + break; + avail -= cbs_in_tb; + + ret = enqueue_ldpc_dec_one_op_tb(q, ops[i], + enqueued_cbs, cbs_in_tb); + if (ret <= 0) + break; + enqueued_cbs += ret; + } + + acc_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + return i; +} + +/* Enqueue decode operations for ACC200 device in CB mode */ +static uint16_t +acc200_enqueue_ldpc_dec_cb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i; + union acc_dma_desc *desc; + int ret; + bool same_op = false; + for (i = 0; i < num; ++i) { + /* Check if there are available space for further processing */ + if (unlikely(avail < 1)) { + acc_enqueue_ring_full(q_data); + break; + } + avail -= 1; +#ifdef ACC200_DESC_OPTIMIZATION + if (i > 0) + same_op = 
cmp_ldpc_dec_op(&ops[i-1]); +#endif + rte_bbdev_log(INFO, "Op %d %d %d %d %d %d %d %d %d %d %d %d\n", + i, ops[i]->ldpc_dec.op_flags, ops[i]->ldpc_dec.rv_index, + ops[i]->ldpc_dec.iter_max, ops[i]->ldpc_dec.iter_count, + ops[i]->ldpc_dec.basegraph, ops[i]->ldpc_dec.z_c, + ops[i]->ldpc_dec.n_cb, ops[i]->ldpc_dec.q_m, + ops[i]->ldpc_dec.n_filler, ops[i]->ldpc_dec.cb_params.e, + same_op); + ret = enqueue_ldpc_dec_one_op_cb(q, ops[i], i, same_op); + if (ret < 0) { + acc_enqueue_invalid(q_data); + break; + } + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + i - 1) + & q->sw_ring_wrap_mask); + + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + acc_dma_enqueue(q, i, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + return i; +} + +/* Enqueue decode operations for ACC200 device. 
*/ +static uint16_t +acc200_enqueue_ldpc_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + int32_t aq_avail = acc200_aq_avail(q_data, num); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + if (ops[0]->ldpc_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + return acc200_enqueue_ldpc_dec_tb(q_data, ops, num); + else + return acc200_enqueue_ldpc_dec_cb(q_data, ops, num); +} + + +/* Dequeue one encode operations from ACC200 device in CB mode + */ +static inline int +dequeue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op, + uint16_t *dequeued_ops, uint32_t *aq_dequeued, + uint16_t *dequeued_descs) +{ + union acc_dma_desc *desc, atom_desc; + union acc_dma_rsp_desc rsp; + struct rte_bbdev_enc_op *op; + int i; + int desc_idx = ((q->sw_ring_tail + *dequeued_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x", desc, rsp.val); + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + + op->status |= ((rsp.input_err) + ? (1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ? 
(1 << RTE_BBDEV_DRV_ERROR) : 0); + + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; /*Reserved bits */ + desc->rsp.add_info_1 = 0; /*Reserved bits */ + + ref_op[0] = op; + struct acc_ptrs *context_ptrs = q->companion_ring_addr + desc_idx; + for (i = 1 ; i < desc->req.numCBs; i++) + ref_op[i] = context_ptrs->ptr[i].op_addr; + + /* One op was successfully dequeued */ + (*dequeued_descs)++; + *dequeued_ops += desc->req.numCBs; + return desc->req.numCBs; +} + +/* Dequeue one LDPC encode operations from ACC200 device in TB mode + * That operation may cover multiple descriptors + */ +static inline int +dequeue_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op, + uint16_t *dequeued_ops, uint32_t *aq_dequeued, + uint16_t *dequeued_descs) +{ + union acc_dma_desc *desc, *last_desc, atom_desc; + union acc_dma_rsp_desc rsp; + struct rte_bbdev_enc_op *op; + uint8_t i = 0; + uint16_t current_dequeued_descs = 0, descs_in_tb; + + desc = q->ring_addr + ((q->sw_ring_tail + *dequeued_descs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC_FDONE)) + return -1; + + /* Get number of CBs in dequeued TB */ + descs_in_tb = desc->req.cbs_in_tb; + /* Get last CB */ + last_desc = q->ring_addr + ((q->sw_ring_tail + + *dequeued_descs + descs_in_tb - 1) + & q->sw_ring_wrap_mask); + /* Check if last CB in TB is ready to dequeue (and thus + * the whole TB) - checking sdone bit. If not return. 
+ */ + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, + __ATOMIC_RELAXED); + if (!(atom_desc.rsp.val & ACC_SDONE)) + return -1; + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + + while (i < descs_in_tb) { + desc = q->ring_addr + ((q->sw_ring_tail + + *dequeued_descs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x", desc, + rsp.val); + + op->status |= ((rsp.input_err) + ? (1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + (*dequeued_descs)++; + current_dequeued_descs++; + i++; + } + + *ref_op = op; + (*dequeued_ops)++; + return current_dequeued_descs; +} + +/* Dequeue one decode operation from ACC200 device in CB mode */ +static inline int +dequeue_dec_one_op_cb(struct rte_bbdev_queue_data *q_data, + struct acc_queue *q, struct rte_bbdev_dec_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc_dma_desc *desc, atom_desc; + union acc_dma_rsp_desc rsp; + struct rte_bbdev_dec_op *op; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x\n", desc, rsp.val); + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + op->status |= ((rsp.input_err) + ? 
(1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + if (op->status != 0) { + /* These errors are not expected */ + q_data->queue_stats.dequeue_err_count++; + } + + /* CRC invalid if error exists */ + if (!op->status) + op->status |= rsp.crc_status << RTE_BBDEV_CRC_ERROR; + op->turbo_dec.iter_count = (uint8_t) rsp.iter_cnt; + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + *ref_op = op; + + /* One CB (op) was successfully dequeued */ + return 1; +} + +/* Dequeue one decode operations from ACC200 device in CB mode */ +static inline int +dequeue_ldpc_dec_one_op_cb(struct rte_bbdev_queue_data *q_data, + struct acc_queue *q, struct rte_bbdev_dec_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc_dma_desc *desc, atom_desc; + union acc_dma_rsp_desc rsp; + struct rte_bbdev_dec_op *op; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. 
desc %p: %x %x %x\n", desc, + rsp.val, desc->rsp.add_info_0, + desc->rsp.add_info_1); + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR; + op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR; + op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR; + if (op->status != 0) + q_data->queue_stats.dequeue_err_count++; + + op->status |= rsp.crc_status << RTE_BBDEV_CRC_ERROR; + if (op->ldpc_dec.hard_output.length > 0 && !rsp.synd_ok) + op->status |= 1 << RTE_BBDEV_SYNDROME_ERROR; + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK) || + check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK)) { + if (desc->rsp.add_info_1 != 0) + op->status |= 1 << RTE_BBDEV_CRC_ERROR; + } + + op->ldpc_dec.iter_count = (uint8_t) rsp.iter_cnt; + + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + + desc->rsp.val = ACC_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + + *ref_op = op; + + /* One CB (op) was successfully dequeued */ + return 1; +} + +/* Dequeue one decode operations from ACC200 device in TB mode. 
*/ +static inline int +dequeue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc_dma_desc *desc, *last_desc, atom_desc; + union acc_dma_rsp_desc rsp; + struct rte_bbdev_dec_op *op; + uint8_t cbs_in_tb = 1, cb_idx = 0; + uint32_t tb_crc_check = 0; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC_FDONE)) + return -1; + + /* Dequeue */ + op = desc->req.op_addr; + + /* Get number of CBs in dequeued TB */ + cbs_in_tb = desc->req.cbs_in_tb; + /* Get last CB */ + last_desc = q->ring_addr + ((q->sw_ring_tail + + dequeued_cbs + cbs_in_tb - 1) + & q->sw_ring_wrap_mask); + /* Check if last CB in TB is ready to dequeue (and thus + * the whole TB) - checking sdone bit. If not return. + */ + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, + __ATOMIC_RELAXED); + if (!(atom_desc.rsp.val & ACC_SDONE)) + return -1; + + /* Clearing status, it will be set based on response */ + op->status = 0; + + /* Read remaining CBs if exists */ + while (cb_idx < cbs_in_tb) { + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x %x %x", desc, + rsp.val, desc->rsp.add_info_0, + desc->rsp.add_info_1); + + op->status |= ((rsp.input_err) + ? (1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ? 
(1 << RTE_BBDEV_DRV_ERROR) : 0); + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) + tb_crc_check ^= desc->rsp.add_info_1; + + /* CRC invalid if error exists */ + if (!op->status) + op->status |= rsp.crc_status << RTE_BBDEV_CRC_ERROR; + op->turbo_dec.iter_count = RTE_MAX((uint8_t) rsp.iter_cnt, + op->turbo_dec.iter_count); + + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + dequeued_cbs++; + cb_idx++; + } + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) { + rte_bbdev_log_debug("TB-CRC Check %x\n", tb_crc_check); + if (tb_crc_check > 0) + op->status |= 1 << RTE_BBDEV_CRC_ERROR; + } + + *ref_op = op; + + return cb_idx; +} + +/* Dequeue LDPC encode operations from ACC200 device. */ +static uint16_t +acc200_dequeue_ldpc_enc(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + uint32_t avail = acc_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i, dequeued_ops = 0, dequeued_descs = 0; + int ret; + struct rte_bbdev_enc_op *op; + if (avail == 0) + return 0; + op = (q->ring_addr + (q->sw_ring_tail & + q->sw_ring_wrap_mask))->req.op_addr; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == NULL || q == NULL || op == NULL)) + return 0; +#endif + int cbm = op->ldpc_enc.code_block_mode; + + for (i = 0; i < avail; i++) { + if (cbm == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_enc_one_op_tb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + else + ret = dequeue_enc_one_op_cb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + if (ret < 0) + break; + if (dequeued_ops >= num) + break; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_descs; + + /* Update enqueue 
stats */ + q_data->queue_stats.dequeued_count += dequeued_ops; + + return dequeued_ops; +} + +/* Dequeue decode operations from ACC200 device. */ +static uint16_t +acc200_dequeue_ldpc_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + uint16_t dequeue_num; + uint32_t avail = acc_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i; + uint16_t dequeued_cbs = 0; + struct rte_bbdev_dec_op *op; + int ret; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == NULL || q == NULL)) + return 0; +#endif + + dequeue_num = RTE_MIN(avail, num); + + for (i = 0; i < dequeue_num; ++i) { + op = (q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask))->req.op_addr; + if (op->ldpc_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_dec_one_op_tb(q, &ops[i], dequeued_cbs, + &aq_dequeued); + else + ret = dequeue_ldpc_dec_one_op_cb( + q_data, q, &ops[i], dequeued_cbs, + &aq_dequeued); + + if (ret <= 0) + break; + dequeued_cbs += ret; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_cbs; + + /* Update dequeue stats */ + q_data->queue_stats.dequeued_count += i; + + return i; +} + /* Initialization Function */ static void acc200_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) @@ -677,6 +2191,10 @@ struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device); dev->dev_ops = &acc200_bbdev_ops; + dev->enqueue_ldpc_enc_ops = acc200_enqueue_ldpc_enc; + dev->enqueue_ldpc_dec_ops = acc200_enqueue_ldpc_dec; + dev->dequeue_ldpc_enc_ops = acc200_dequeue_ldpc_enc; + dev->dequeue_ldpc_dec_ops = acc200_dequeue_ldpc_dec; ((struct acc_device *) dev->data->dev_private)->pf_device = !strcmp(drv->driver.name, -- 1.8.3.1
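The enqueue helpers in the patch above rely on three small pieces of arithmetic: wrapping a descriptor index with sw_ring_wrap_mask, shrinking a length by 3/4 when 6-bit LLR or HARQ compression is enabled, and splitting a TB into descriptors that each carry a bounded number of CBs. The sketch below restates that arithmetic outside the driver, as an illustration only. The sketch_* names and the mux parameter are hypothetical and not part of the patch; mux stands in for ACC_MUX_5GDL_DESC, whose value is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Ring slot of the n-th descriptor past the head.
 * Assumes wrap_mask is of the form 2^k - 1, as a power-of-two ring implies.
 */
static inline uint16_t
sketch_ring_idx(uint16_t head, uint16_t n, uint16_t wrap_mask)
{
	assert(((uint32_t)wrap_mask & ((uint32_t)wrap_mask + 1)) == 0);
	return (uint16_t)((head + n) & wrap_mask);
}

/* Length after packing to 3/4 of the original size, rounded up.
 * Mirrors the (len * 3 + 3) / 4 expression used in the patch.
 */
static inline uint32_t
sketch_compressed_len(uint32_t len)
{
	return (len * 3 + 3) / 4;
}

/* Number of descriptors used for one TB when up to `mux` CBs share a
 * descriptor. CBs sized with Ea and CBs sized with Eb are counted
 * separately and never share a descriptor, as in the TB enqueue path.
 */
static inline uint16_t
sketch_tb_descs(uint8_t r, uint8_t cab, uint8_t cbs_in_tb, uint8_t mux)
{
	uint8_t num_a, num_b;

	assert(mux > 0);
	if (r < cab) {
		num_a = cab - r;
		num_b = cbs_in_tb - cab;
	} else {
		num_a = 0;
		num_b = cbs_in_tb - r;
	}
	/* Round each group up to whole descriptors. */
	return (uint16_t)((num_a + mux - 1) / mux + (num_b + mux - 1) / mux);
}
```

With an assumed mux of 6, a TB with r = 0, cab = 8 and 10 CBs would take two descriptors for the Ea group and one for the Eb group. Keeping the two groups apart matches the patch, where a single e value is written into the FCW per descriptor.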
* [PATCH v2 07/11] baseband/acc200: add LTE processing functions 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (5 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 06/11] baseband/acc200: add LDPC processing functions Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 08/11] baseband/acc200: add support for FFT operations Nic Chautru ` (3 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Add functions and capability for 4G FEC Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 894 ++++++++++++++++++++++++++++++- 1 file changed, 874 insertions(+), 20 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index def5ed7..9c3388f 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -641,6 +641,46 @@ int i; static const struct rte_bbdev_op_cap bbdev_capabilities[] = { { + .type = RTE_BBDEV_OP_TURBO_DEC, + .cap.turbo_dec = { + .capability_flags = + RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE | + RTE_BBDEV_TURBO_CRC_TYPE_24B | + RTE_BBDEV_TURBO_EQUALIZER | + RTE_BBDEV_TURBO_SOFT_OUT_SATURATE | + RTE_BBDEV_TURBO_HALF_ITERATION_EVEN | + RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH | + RTE_BBDEV_TURBO_SOFT_OUTPUT | + RTE_BBDEV_TURBO_EARLY_TERMINATION | + RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN | + RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT | + RTE_BBDEV_TURBO_MAP_DEC | + RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP | + RTE_BBDEV_TURBO_DEC_SCATTER_GATHER, + .max_llr_modulus = INT8_MAX, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_hard_out = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_soft_out = + 
RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + } + }, + { + .type = RTE_BBDEV_OP_TURBO_ENC, + .cap.turbo_enc = { + .capability_flags = + RTE_BBDEV_TURBO_CRC_24B_ATTACH | + RTE_BBDEV_TURBO_RV_INDEX_BYPASS | + RTE_BBDEV_TURBO_RATE_MATCH | + RTE_BBDEV_TURBO_ENC_SCATTER_GATHER, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + } + }, + { .type = RTE_BBDEV_OP_LDPC_ENC, .cap.ldpc_enc = { .capability_flags = @@ -691,15 +731,17 @@ /* Exposed number of queues */ dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; - dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0; - dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = d->acc_conf.q_ul_4g.num_aqs_per_groups * + d->acc_conf.q_ul_4g.num_qgroups; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = d->acc_conf.q_dl_4g.num_aqs_per_groups * + d->acc_conf.q_dl_4g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_aqs_per_groups * d->acc_conf.q_ul_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_aqs_per_groups * d->acc_conf.q_dl_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc_conf.q_ul_4g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc_conf.q_dl_4g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; @@ -744,6 +786,70 @@ {.device_id = 0}, }; +/* Fill in a frame control word for turbo decoding. 
*/ +static inline void +acc200_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc_fcw_td *fcw) +{ + fcw->fcw_ver = 1; + fcw->num_maps = ACC_FCW_TD_AUTOMAP; + fcw->bypass_sb_deint = !check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE); + if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + /* FIXME for TB block */ + fcw->k_pos = op->turbo_dec.tb_params.k_pos; + fcw->k_neg = op->turbo_dec.tb_params.k_neg; + } else { + fcw->k_pos = op->turbo_dec.cb_params.k; + fcw->k_neg = op->turbo_dec.cb_params.k; + } + fcw->c = 1; + fcw->c_neg = 1; + if (check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT)) { + fcw->soft_output_en = 1; + fcw->sw_soft_out_dis = 0; + fcw->sw_et_cont = check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH); + fcw->sw_soft_out_saturation = check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SOFT_OUT_SATURATE); + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_EQUALIZER)) { + fcw->bypass_teq = 0; + fcw->ea = op->turbo_dec.cb_params.e; + fcw->eb = op->turbo_dec.cb_params.e; + if (op->turbo_dec.rv_index == 0) + fcw->k0_start_col = ACC_FCW_TD_RVIDX_0; + else if (op->turbo_dec.rv_index == 1) + fcw->k0_start_col = ACC_FCW_TD_RVIDX_1; + else if (op->turbo_dec.rv_index == 2) + fcw->k0_start_col = ACC_FCW_TD_RVIDX_2; + else + fcw->k0_start_col = ACC_FCW_TD_RVIDX_3; + } else { + fcw->bypass_teq = 1; + fcw->eb = 64; /* avoid undefined value */ + } + } else { + fcw->soft_output_en = 0; + fcw->sw_soft_out_dis = 1; + fcw->bypass_teq = 0; + } + + fcw->code_block_mode = 1; /* FIXME */ + fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_CRC_TYPE_24B); + + fcw->ext_td_cold_reg_en = 1; + fcw->raw_decoder_input_on = 0; + fcw->max_iter = RTE_MAX((uint8_t) op->turbo_dec.iter_max, 2); + fcw->min_iter = 2; + fcw->half_iter_on = !check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_HALF_ITERATION_EVEN); + + fcw->early_stop_en = check_bit(op->turbo_dec.op_flags, + 
RTE_BBDEV_TURBO_EARLY_TERMINATION) & !fcw->soft_output_en; + fcw->ext_scale = 0xF; +} + /* Fill in a frame control word for LDPC decoding. */ static inline void acc200_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw, @@ -870,6 +976,89 @@ } static inline int +acc200_dma_desc_te_fill(struct rte_bbdev_enc_op *op, + struct acc_dma_req_desc *desc, struct rte_mbuf **input, + struct rte_mbuf *output, uint32_t *in_offset, + uint32_t *out_offset, uint32_t *out_length, + uint32_t *mbuf_total_left, uint32_t *seg_total_left, uint8_t r) +{ + int next_triplet = 1; /* FCW already done */ + uint32_t e, ea, eb, length; + uint16_t k, k_neg, k_pos; + uint8_t cab, c_neg; + + desc->word0 = ACC_DMA_DESC_TYPE; + desc->word1 = 0; /**< Timestamp could be disabled */ + desc->word2 = 0; + desc->word3 = 0; + desc->numCBs = 1; + + if (op->turbo_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + ea = op->turbo_enc.tb_params.ea; + eb = op->turbo_enc.tb_params.eb; + cab = op->turbo_enc.tb_params.cab; + k_neg = op->turbo_enc.tb_params.k_neg; + k_pos = op->turbo_enc.tb_params.k_pos; + c_neg = op->turbo_enc.tb_params.c_neg; + e = (r < cab) ? ea : eb; + k = (r < c_neg) ? 
k_neg : k_pos; + } else { + e = op->turbo_enc.cb_params.e; + k = op->turbo_enc.cb_params.k; + } + + if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_CRC_24B_ATTACH)) + length = (k - 24) >> 3; + else + length = k >> 3; + + if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < length))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, length); + return -1; + } + + next_triplet = acc_dma_fill_blk_type_in(desc, input, in_offset, + length, seg_total_left, next_triplet, + check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_ENC_SCATTER_GATHER)); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= length; + + /* Set output length */ + if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_RATE_MATCH)) + /* Integer round up division by 8 */ + *out_length = (e + 7) >> 3; + else + *out_length = (k >> 3) * 3 + 2; + + next_triplet = acc_dma_fill_blk_type(desc, output, *out_offset, + *out_length, next_triplet, ACC_DMA_BLKID_OUT_ENC); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + op->turbo_enc.output.length += *out_length; + *out_offset += *out_length; + desc->data_ptrs[next_triplet - 1].last = 1; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline int acc200_dma_desc_le_fill(struct rte_bbdev_enc_op *op, struct acc_dma_req_desc *desc, struct rte_mbuf **input, struct rte_mbuf *output, uint32_t *in_offset, @@ -929,6 +1118,122 @@ } static inline int +acc200_dma_desc_td_fill(struct rte_bbdev_dec_op *op, + struct acc_dma_req_desc *desc, struct rte_mbuf **input, + struct rte_mbuf *h_output, struct rte_mbuf *s_output, + uint32_t *in_offset, 
uint32_t *h_out_offset, + uint32_t *s_out_offset, uint32_t *h_out_length, + uint32_t *s_out_length, uint32_t *mbuf_total_left, + uint32_t *seg_total_left, uint8_t r) +{ + int next_triplet = 1; /* FCW already done */ + uint16_t k; + uint16_t crc24_overlap = 0; + uint32_t e, kw; + + desc->word0 = ACC_DMA_DESC_TYPE; + desc->word1 = 0; /**< Timestamp could be disabled */ + desc->word2 = 0; + desc->word3 = 0; + desc->numCBs = 1; + + if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + k = (r < op->turbo_dec.tb_params.c_neg) + ? op->turbo_dec.tb_params.k_neg + : op->turbo_dec.tb_params.k_pos; + e = (r < op->turbo_dec.tb_params.cab) + ? op->turbo_dec.tb_params.ea + : op->turbo_dec.tb_params.eb; + } else { + k = op->turbo_dec.cb_params.k; + e = op->turbo_dec.cb_params.e; + } + + if ((op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + && !check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP)) + crc24_overlap = 24; + + /* Calculates circular buffer size. + * According to 3gpp 36.212 section 5.1.4.2 + * Kw = 3 * Kpi, + * where: + * Kpi = nCol * nRow + * where nCol is 32 and nRow can be calculated from: + * D =< nCol * nRow + * where D is the size of each output from turbo encoder block (k + 4). 
+ */ + kw = RTE_ALIGN_CEIL(k + 4, 32) * 3; + + if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < kw))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, kw); + return -1; + } + + next_triplet = acc_dma_fill_blk_type_in(desc, input, in_offset, kw, + seg_total_left, next_triplet, + check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_DEC_SCATTER_GATHER)); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= kw; + *h_out_length = ((k - crc24_overlap) >> 3); + next_triplet = acc_dma_fill_blk_type( + desc, h_output, *h_out_offset, + *h_out_length, next_triplet, ACC_DMA_BLKID_OUT_HARD); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + + op->turbo_dec.hard_output.length += *h_out_length; + *h_out_offset += *h_out_length; + + /* Soft output */ + if (check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT)) { + if (op->turbo_dec.soft_output.data == 0) { + rte_bbdev_log(ERR, "Soft output is not defined"); + return -1; + } + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_EQUALIZER)) + *s_out_length = e; + else + *s_out_length = (k * 3) + 12; + + next_triplet = acc_dma_fill_blk_type(desc, s_output, + *s_out_offset, *s_out_length, next_triplet, + ACC_DMA_BLKID_OUT_SOFT); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + + op->turbo_dec.soft_output.length += *s_out_length; + *s_out_offset += *s_out_length; + } + + desc->data_ptrs[next_triplet - 1].last = 1; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline int 
acc200_dma_desc_ld_fill(struct rte_bbdev_dec_op *op, struct acc_dma_req_desc *desc, struct rte_mbuf **input, struct rte_mbuf *h_output, @@ -1100,6 +1405,51 @@ desc->op_addr = op; } +/* Enqueue one encode operations for ACC200 device in CB mode */ +static inline int +enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op, + uint16_t total_enqueued_cbs) +{ + union acc_dma_desc *desc = NULL; + int ret; + uint32_t in_offset, out_offset, out_length, mbuf_total_left, + seg_total_left; + struct rte_mbuf *input, *output_head, *output; + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc_fcw_te_fill(op, &desc->req.fcw_te); + + input = op->turbo_enc.input.data; + output_head = output = op->turbo_enc.output.data; + in_offset = op->turbo_enc.input.offset; + out_offset = op->turbo_enc.output.offset; + out_length = 0; + mbuf_total_left = op->turbo_enc.input.length; + seg_total_left = rte_pktmbuf_data_len(op->turbo_enc.input.data) + - in_offset; + + ret = acc200_dma_desc_te_fill(op, &desc->req, &input, output, + &in_offset, &out_offset, &out_length, &mbuf_total_left, + &seg_total_left, 0); + + if (unlikely(ret < 0)) + return ret; + + mbuf_append(output_head, output, out_length); + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_te, + sizeof(desc->req.fcw_te) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + /* One CB (one op) was successfully prepared to enqueue */ + return 1; +} + /* Enqueue one encode operations for ACC200 device in CB mode * multiplexed on the same descriptor */ @@ -1262,6 +1612,84 @@ return 1; } + +/* Enqueue one encode operations for ACC200 device in TB mode. 
*/ +static inline int +enqueue_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op, + uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) +{ + union acc_dma_desc *desc = NULL; + int ret; + uint8_t r, c; + uint32_t in_offset, out_offset, out_length, mbuf_total_left, + seg_total_left; + struct rte_mbuf *input, *output_head, *output; + uint16_t current_enqueued_cbs = 0; + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; + acc_fcw_te_fill(op, &desc->req.fcw_te); + + input = op->turbo_enc.input.data; + output_head = output = op->turbo_enc.output.data; + in_offset = op->turbo_enc.input.offset; + out_offset = op->turbo_enc.output.offset; + out_length = 0; + mbuf_total_left = op->turbo_enc.input.length; + + c = op->turbo_enc.tb_params.c; + r = op->turbo_enc.tb_params.r; + + while (mbuf_total_left > 0 && r < c) { + seg_total_left = rte_pktmbuf_data_len(input) - in_offset; + /* Set up DMA descriptor */ + desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; + desc->req.data_ptrs[0].blen = ACC_FCW_TE_BLEN; + + ret = acc200_dma_desc_te_fill(op, &desc->req, &input, output, + &in_offset, &out_offset, &out_length, + &mbuf_total_left, &seg_total_left, r); + if (unlikely(ret < 0)) + return ret; + mbuf_append(output_head, output, out_length); + + /* Set total number of CBs in TB */ + desc->req.cbs_in_tb = cbs_in_tb; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_te, + sizeof(desc->req.fcw_te) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + if (seg_total_left == 0) { + /* Go to the next mbuf */ + input = input->next; + in_offset = 0; + output = output->next; + out_offset = 0; + } + + total_enqueued_cbs++; + current_enqueued_cbs++; + r++; + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if 
(check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + + /* Set SDone on last CB descriptor for TB mode. */ + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + return current_enqueued_cbs; +} + /* Enqueue one encode operations for ACC200 device in TB mode. * returns the number of descs used */ @@ -1328,6 +1756,69 @@ /** Enqueue one decode operations for ACC200 device in CB mode */ static inline int +enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs) +{ + union acc_dma_desc *desc = NULL; + int ret; + uint32_t in_offset, h_out_offset, s_out_offset, s_out_length, + h_out_length, mbuf_total_left, seg_total_left; + struct rte_mbuf *input, *h_output_head, *h_output, + *s_output_head, *s_output; + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc200_fcw_td_fill(op, &desc->req.fcw_td); + + input = op->turbo_dec.input.data; + h_output_head = h_output = op->turbo_dec.hard_output.data; + s_output_head = s_output = op->turbo_dec.soft_output.data; + in_offset = op->turbo_dec.input.offset; + h_out_offset = op->turbo_dec.hard_output.offset; + s_out_offset = op->turbo_dec.soft_output.offset; + h_out_length = s_out_length = 0; + mbuf_total_left = op->turbo_dec.input.length; + seg_total_left = rte_pktmbuf_data_len(input) - in_offset; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(input == NULL)) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + return -EFAULT; + } +#endif + + /* Set up DMA descriptor */ + desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + + ret = acc200_dma_desc_td_fill(op, &desc->req, &input, h_output, + s_output, &in_offset, &h_out_offset, &s_out_offset, + &h_out_length, &s_out_length, &mbuf_total_left, + &seg_total_left, 0); + + if (unlikely(ret < 0)) + return ret; + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + + /* Soft 
output */ + if (check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT)) + mbuf_append(s_output_head, s_output, s_out_length); + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_td, + sizeof(desc->req.fcw_td)); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + /* One CB (one op) was successfully prepared to enqueue */ + return 1; +} + +/** Enqueue one decode operations for ACC200 device in CB mode */ +static inline int enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op, uint16_t total_enqueued_cbs, bool same_op) { @@ -1525,10 +2016,147 @@ return current_enqueued_cbs; } -/** Enqueue encode operations for ACC200 device in CB mode. */ -static inline uint16_t -acc200_enqueue_ldpc_enc_cb(struct rte_bbdev_queue_data *q_data, - struct rte_bbdev_enc_op **ops, uint16_t num) +/* Enqueue one decode operations for ACC200 device in TB mode */ +static inline int +enqueue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) +{ + union acc_dma_desc *desc = NULL; + int ret; + uint8_t r, c; + uint32_t in_offset, h_out_offset, s_out_offset, s_out_length, + h_out_length, mbuf_total_left, seg_total_left; + struct rte_mbuf *input, *h_output_head, *h_output, + *s_output_head, *s_output; + uint16_t current_enqueued_cbs = 0; + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + uint64_t fcw_offset = (desc_idx << 8) + ACC_DESC_FCW_OFFSET; + acc200_fcw_td_fill(op, &desc->req.fcw_td); + + input = op->turbo_dec.input.data; + h_output_head = h_output = op->turbo_dec.hard_output.data; + s_output_head = s_output = op->turbo_dec.soft_output.data; + in_offset = op->turbo_dec.input.offset; + h_out_offset = op->turbo_dec.hard_output.offset; + s_out_offset = op->turbo_dec.soft_output.offset; + h_out_length = s_out_length = 0; + mbuf_total_left = op->turbo_dec.input.length; + c = 
op->turbo_dec.tb_params.c; + r = op->turbo_dec.tb_params.r; + + while (mbuf_total_left > 0 && r < c) { + + seg_total_left = rte_pktmbuf_data_len(input) - in_offset; + + /* Set up DMA descriptor */ + desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; + desc->req.data_ptrs[0].blen = ACC_FCW_TD_BLEN; + ret = acc200_dma_desc_td_fill(op, &desc->req, &input, + h_output, s_output, &in_offset, &h_out_offset, + &s_out_offset, &h_out_length, &s_out_length, + &mbuf_total_left, &seg_total_left, r); + + if (unlikely(ret < 0)) + return ret; + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + + /* Soft output */ + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SOFT_OUTPUT)) + mbuf_append(s_output_head, s_output, s_out_length); + + /* Set total number of CBs in TB */ + desc->req.cbs_in_tb = cbs_in_tb; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_td, + sizeof(desc->req.fcw_td) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + if (seg_total_left == 0) { + /* Go to the next mbuf */ + input = input->next; + in_offset = 0; + h_output = h_output->next; + h_out_offset = 0; + + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SOFT_OUTPUT)) { + s_output = s_output->next; + s_out_offset = 0; + } + } + + total_enqueued_cbs++; + current_enqueued_cbs++; + r++; + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + /* Set SDone on last CB descriptor for TB mode */ + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + return current_enqueued_cbs; +} + +/* Enqueue encode operations for ACC200 device in CB mode. 
*/ +static uint16_t +acc200_enqueue_enc_cb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i; + union acc_dma_desc *desc; + int ret; + + for (i = 0; i < num; ++i) { + /* Check if there are available space for further processing */ + if (unlikely(avail - 1 < 0)) { + acc_enqueue_ring_full(q_data); + break; + } + avail -= 1; + + ret = enqueue_enc_one_op_cb(q, ops[i], i); + if (ret < 0) { + acc_enqueue_invalid(q_data); + break; + } + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + i - 1) + & q->sw_ring_wrap_mask); + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + acc_dma_enqueue(q, i, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + return i; +} + +/** Enqueue encode operations for ACC200 device in CB mode. */ +static inline uint16_t +acc200_enqueue_ldpc_enc_cb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) { struct acc_queue *q = q_data->queue_private; int32_t avail = acc_ring_avail_enq(q); @@ -1583,6 +2211,45 @@ return i; } +/* Enqueue encode operations for ACC200 device in TB mode. 
*/ +static uint16_t +acc200_enqueue_enc_tb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i, enqueued_cbs = 0; + uint8_t cbs_in_tb; + int ret; + + for (i = 0; i < num; ++i) { + cbs_in_tb = get_num_cbs_in_tb_enc(&ops[i]->turbo_enc); + /* Check if there are available space for further processing */ + if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) { + acc_enqueue_ring_full(q_data); + break; + } + avail -= cbs_in_tb; + + ret = enqueue_enc_one_op_tb(q, ops[i], enqueued_cbs, cbs_in_tb); + if (ret <= 0) { + acc_enqueue_invalid(q_data); + break; + } + enqueued_cbs += ret; + } + if (unlikely(enqueued_cbs == 0)) + return 0; /* Nothing to enqueue */ + + acc_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + /* Enqueue LDPC encode operations for ACC200 device in TB mode. */ static uint16_t acc200_enqueue_ldpc_enc_tb(struct rte_bbdev_queue_data *q_data, @@ -1623,18 +2290,18 @@ return i; } -/* Check room in AQ for the enqueues batches into Qmgr */ -static int32_t -acc200_aq_avail(struct rte_bbdev_queue_data *q_data, uint16_t num_ops) +/* Enqueue encode operations for ACC200 device. 
*/ +static uint16_t +acc200_enqueue_enc(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) { - struct acc_queue *q = q_data->queue_private; - int32_t aq_avail = q->aq_depth - - ((q->aq_enqueued - q->aq_dequeued + - ACC_MAX_QUEUE_DEPTH) % ACC_MAX_QUEUE_DEPTH) - - (num_ops >> 7); - if (aq_avail <= 0) - acc_enqueue_queue_full(q_data); - return aq_avail; + int32_t aq_avail = acc_aq_avail(q_data, num); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + if (ops[0]->turbo_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + return acc200_enqueue_enc_tb(q_data, ops, num); + else + return acc200_enqueue_enc_cb(q_data, ops, num); } /* Enqueue encode operations for ACC200 device. */ @@ -1652,6 +2319,47 @@ return acc200_enqueue_ldpc_enc_cb(q_data, ops, num); } + +/* Enqueue decode operations for ACC200 device in CB mode */ +static uint16_t +acc200_enqueue_dec_cb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i; + union acc_dma_desc *desc; + int ret; + + for (i = 0; i < num; ++i) { + /* Check if there are available space for further processing */ + if (unlikely(avail - 1 < 0)) + break; + avail -= 1; + + ret = enqueue_dec_one_op_cb(q, ops[i], i); + if (ret < 0) + break; + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + i - 1) + & q->sw_ring_wrap_mask); + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + acc_dma_enqueue(q, i, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + /* Enqueue decode operations for ACC200 device in TB mode */ static uint16_t acc200_enqueue_ldpc_dec_tb(struct rte_bbdev_queue_data *q_data, @@ -1740,12 +2448,64 @@ return i; } + +/* 
Enqueue decode operations for ACC200 device in TB mode */ +static uint16_t +acc200_enqueue_dec_tb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i, enqueued_cbs = 0; + uint8_t cbs_in_tb; + int ret; + + for (i = 0; i < num; ++i) { + cbs_in_tb = get_num_cbs_in_tb_dec(&ops[i]->turbo_dec); + /* Check if there are available space for further processing */ + if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) { + acc_enqueue_ring_full(q_data); + break; + } + avail -= cbs_in_tb; + + ret = enqueue_dec_one_op_tb(q, ops[i], enqueued_cbs, cbs_in_tb); + if (ret <= 0) { + acc_enqueue_invalid(q_data); + break; + } + enqueued_cbs += ret; + } + + acc_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + +/* Enqueue decode operations for ACC200 device. */ +static uint16_t +acc200_enqueue_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + int32_t aq_avail = acc_aq_avail(q_data, num); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + if (ops[0]->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + return acc200_enqueue_dec_tb(q_data, ops, num); + else + return acc200_enqueue_dec_cb(q_data, ops, num); +} + /* Enqueue decode operations for ACC200 device. */ static uint16_t acc200_enqueue_ldpc_dec(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { - int32_t aq_avail = acc200_aq_avail(q_data, num); + int32_t aq_avail = acc_aq_avail(q_data, num); if (unlikely((aq_avail <= 0) || (num == 0))) return 0; if (ops[0]->ldpc_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) @@ -2093,6 +2853,51 @@ return cb_idx; } +/* Dequeue encode operations from ACC200 device. 
*/ +static uint16_t +acc200_dequeue_enc(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + uint32_t avail = acc_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i, dequeued_ops = 0, dequeued_descs = 0; + int ret; + struct rte_bbdev_enc_op *op; + if (avail == 0) + return 0; + op = (q->ring_addr + (q->sw_ring_tail & + q->sw_ring_wrap_mask))->req.op_addr; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == NULL || q == NULL || op == NULL)) + return 0; +#endif + int cbm = op->turbo_enc.code_block_mode; + + for (i = 0; i < num; i++) { + if (cbm == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_enc_one_op_tb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + else + ret = dequeue_enc_one_op_cb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + if (ret < 0) + break; + if (dequeued_ops >= num) + break; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_descs; + + /* Update enqueue stats */ + q_data->queue_stats.dequeued_count += dequeued_ops; + + return dequeued_ops; +} + /* Dequeue LDPC encode operations from ACC200 device. */ static uint16_t acc200_dequeue_ldpc_enc(struct rte_bbdev_queue_data *q_data, @@ -2140,6 +2945,51 @@ /* Dequeue decode operations from ACC200 device. */ static uint16_t +acc200_dequeue_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + uint16_t dequeue_num; + uint32_t avail = acc_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i; + uint16_t dequeued_cbs = 0; + struct rte_bbdev_dec_op *op; + int ret; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == 0 && q == NULL)) + return 0; +#endif + + dequeue_num = (avail < num) ? 
avail : num; + + for (i = 0; i < dequeue_num; ++i) { + op = (q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask))->req.op_addr; + if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_dec_one_op_tb(q, &ops[i], dequeued_cbs, + &aq_dequeued); + else + ret = dequeue_dec_one_op_cb(q_data, q, &ops[i], + dequeued_cbs, &aq_dequeued); + + if (ret <= 0) + break; + dequeued_cbs += ret; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_cbs; + + /* Update dequeue stats */ + q_data->queue_stats.dequeued_count += i; + + return i; +} + +/* Dequeue decode operations from ACC200 device. */ +static uint16_t acc200_dequeue_ldpc_dec(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { @@ -2191,6 +3041,10 @@ struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device); dev->dev_ops = &acc200_bbdev_ops; + dev->enqueue_enc_ops = acc200_enqueue_enc; + dev->enqueue_dec_ops = acc200_enqueue_dec; + dev->dequeue_enc_ops = acc200_dequeue_enc; + dev->dequeue_dec_ops = acc200_dequeue_dec; dev->enqueue_ldpc_enc_ops = acc200_enqueue_ldpc_enc; dev->enqueue_ldpc_dec_ops = acc200_enqueue_ldpc_dec; dev->dequeue_ldpc_enc_ops = acc200_dequeue_ldpc_enc; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
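Editorial note on the turbo decode sizing in the patch above: acc200_dma_desc_td_fill computes the circular buffer size as kw = RTE_ALIGN_CEIL(k + 4, 32) * 3 and the hard output length as (k - crc24_overlap) >> 3. The following is a minimal sketch of that arithmetic outside the driver. The demo_* names are hypothetical, and demo_align_ceil is a local helper standing in for DPDK's RTE_ALIGN_CEIL so that the example builds without DPDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of mul (mul must be non-zero).
 * Local stand-in for RTE_ALIGN_CEIL, used here as an assumption.
 */
static inline uint32_t
demo_align_ceil(uint32_t v, uint32_t mul)
{
	return ((v + mul - 1) / mul) * mul;
}

/* Circular buffer size Kw = 3 * Kpi, where Kpi is the smallest
 * multiple of 32 that holds D = k + 4 (32 columns times nRow rows).
 */
static inline uint32_t
demo_turbo_kw(uint16_t k)
{
	return demo_align_ceil((uint32_t)k + 4, 32) * 3;
}

/* Hard output length in bytes. In the patch the 24 CRC bits are
 * dropped only in transport block mode when the KEEP flag is not
 * set; here that decision is passed in as drop_crc24.
 */
static inline uint32_t
demo_turbo_hard_out_len(uint16_t k, int drop_crc24)
{
	uint16_t crc24_overlap = drop_crc24 ? 24 : 0;

	assert(k >= crc24_overlap);
	return (uint32_t)(k - crc24_overlap) >> 3;
}
```

For the largest LTE code block size, k = 6144, this gives D = 6148, Kpi = 6176 and Kw = 18528, which is the minimum input length the descriptor fill routine checks mbuf_total_left against.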
* [PATCH v2 08/11] baseband/acc200: add support for FFT operations 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (6 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 07/11] baseband/acc200: add LTE " Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 09/11] baseband/acc200: support interrupt Nic Chautru ` (2 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Add functions and capability for FFT processing Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 251 ++++++++++++++++++++++++++++++- 1 file changed, 249 insertions(+), 2 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 9c3388f..483dce8 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -717,6 +717,21 @@ .num_buffers_soft_out = 0, } }, + { + .type = RTE_BBDEV_OP_FFT, + .cap.fft = { + .capability_flags = + RTE_BBDEV_FFT_WINDOWING | + RTE_BBDEV_FFT_CS_ADJUSTMENT | + RTE_BBDEV_FFT_DFT_BYPASS | + RTE_BBDEV_FFT_IDFT_BYPASS | + RTE_BBDEV_FFT_WINDOWING_BYPASS, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + } + }, RTE_BBDEV_END_OF_CAPABILITIES_LIST() }; @@ -739,12 +754,13 @@ d->acc_conf.q_ul_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_aqs_per_groups * d->acc_conf.q_dl_5g.num_qgroups; - dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; + dev_info->num_queues[RTE_BBDEV_OP_FFT] = d->acc_conf.q_fft.num_aqs_per_groups * + d->acc_conf.q_fft.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc_conf.q_ul_4g.num_qgroups; 
dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc_conf.q_dl_4g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_qgroups; - dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_FFT] = d->acc_conf.q_fft.num_qgroups; dev_info->max_num_queues = 0; for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_FFT; i++) dev_info->max_num_queues += dev_info->num_queues[i]; @@ -3034,6 +3050,235 @@ return i; } +/* Fill in a frame control word for FFT processing. */ +static inline void +acc200_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft *fcw) +{ + fcw->in_frame_size = op->fft.input_sequence_size; + fcw->leading_pad_size = op->fft.input_leading_padding; + fcw->out_frame_size = op->fft.output_sequence_size; + fcw->leading_depad_size = op->fft.output_leading_depadding; + fcw->cs_window_sel = op->fft.window_index[0] + + (op->fft.window_index[1] << 8) + + (op->fft.window_index[2] << 16) + + (op->fft.window_index[3] << 24); + fcw->cs_window_sel2 = op->fft.window_index[4] + + (op->fft.window_index[5] << 8); + fcw->cs_enable_bmap = op->fft.cs_bitmap; + fcw->num_antennas = op->fft.num_antennas_log2; + fcw->idft_size = op->fft.idft_log2; + fcw->dft_size = op->fft.dft_log2; + fcw->cs_offset = op->fft.cs_time_adjustment; + fcw->idft_shift = op->fft.idft_shift; + fcw->dft_shift = op->fft.dft_shift; + fcw->cs_multiplier = op->fft.ncs_reciprocal; + if (check_bit(op->fft.op_flags, + RTE_BBDEV_FFT_IDFT_BYPASS)) { + if (check_bit(op->fft.op_flags, + RTE_BBDEV_FFT_WINDOWING_BYPASS)) + fcw->bypass = 2; + else + fcw->bypass = 1; + } else if (check_bit(op->fft.op_flags, + RTE_BBDEV_FFT_DFT_BYPASS)) + fcw->bypass = 3; + else + fcw->bypass = 0; +} + +static inline int +acc200_dma_desc_fft_fill(struct rte_bbdev_fft_op *op, + struct acc_dma_req_desc *desc, + struct rte_mbuf *input, struct rte_mbuf *output, + uint32_t *in_offset, uint32_t 
*out_offset) +{ + /* FCW already done */ + acc_header_init(desc); + desc->data_ptrs[1].address = + rte_pktmbuf_iova_offset(input, *in_offset); + desc->data_ptrs[1].blen = op->fft.input_sequence_size * 4; + desc->data_ptrs[1].blkid = ACC_DMA_BLKID_IN; + desc->data_ptrs[1].last = 1; + desc->data_ptrs[1].dma_ext = 0; + desc->data_ptrs[2].address = + rte_pktmbuf_iova_offset(output, *out_offset); + desc->data_ptrs[2].blen = op->fft.output_sequence_size * 4; + desc->data_ptrs[2].blkid = ACC_DMA_BLKID_OUT_HARD; + desc->data_ptrs[2].last = 1; + desc->data_ptrs[2].dma_ext = 0; + desc->m2dlen = 2; + desc->d2mlen = 1; + desc->ib_ant_offset = op->fft.input_sequence_size; + desc->num_ant = op->fft.num_antennas_log2 - 3; + int num_cs = 0, i; + for (i = 0; i < 12; i++) + if (check_bit(op->fft.cs_bitmap, 1 << i)) + num_cs++; + desc->num_cs = num_cs; + desc->ob_cyc_offset = op->fft.output_sequence_size; + desc->ob_ant_offset = op->fft.output_sequence_size * num_cs; + desc->op_addr = op; + return 0; +} + + +/** Enqueue one FFT operation for ACC200 device*/ +static inline int +enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op, + uint16_t total_enqueued_cbs) +{ + union acc_dma_desc *desc; + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + struct rte_mbuf *input, *output; + uint32_t in_offset, out_offset; + input = op->fft.base_input.data; + output = op->fft.base_output.data; + in_offset = op->fft.base_input.offset; + out_offset = op->fft.base_output.offset; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(input == NULL)) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + return -EFAULT; + } +#endif + struct acc_fcw_fft *fcw; + fcw = &desc->req.fcw_fft; + acc200_fcw_fft_fill(op, fcw); + acc200_dma_desc_fft_fill(op, &desc->req, input, output, + &in_offset, &out_offset); +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_fft, + sizeof(desc->req.fcw_fft)); + rte_memdump(stderr, 
"Req Desc.", desc, sizeof(*desc)); +#endif + return 1; +} + +/* Enqueue decode operations for ACC200 device. */ +static uint16_t +acc200_enqueue_fft(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_fft_op **ops, uint16_t num) +{ + int32_t aq_avail = acc_aq_avail(q_data, num); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + struct acc_queue *q = q_data->queue_private; + int32_t avail = acc_ring_avail_enq(q); + uint16_t i; + union acc_dma_desc *desc; + int ret; + for (i = 0; i < num; ++i) { + /* Check if there are available space for further processing */ + if (unlikely(avail < 1)) + break; + avail -= 1; + ret = enqueue_fft_one_op(q, ops[i], i); + if (ret < 0) + break; + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + i - 1) + & q->sw_ring_wrap_mask); + + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + acc_dma_enqueue(q, i, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + return i; +} + + +/* Dequeue one FFT operations from ACC200 device */ +static inline int +dequeue_fft_one_op(struct rte_bbdev_queue_data *q_data, + struct acc_queue *q, struct rte_bbdev_fft_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc_dma_desc *desc, atom_desc; + union acc_dma_rsp_desc rsp; + struct rte_bbdev_fft_op *op; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "Resp", &desc->rsp.val, + sizeof(desc->rsp.val)); +#endif + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; 
+ op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR; + op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR; + op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR; + if (op->status != 0) + q_data->queue_stats.dequeue_err_count++; + + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + *ref_op = op; + /* One CB (op) was successfully dequeued */ + return 1; +} + + +/* Dequeue FFT operations from ACC200 device. */ +static uint16_t +acc200_dequeue_fft(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_fft_op **ops, uint16_t num) +{ + struct acc_queue *q = q_data->queue_private; + uint16_t dequeue_num, i, dequeued_cbs = 0; + uint32_t avail = acc_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + int ret; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == NULL || q == NULL)) + return 0; +#endif + + dequeue_num = RTE_MIN(avail, num); + + for (i = 0; i < dequeue_num; ++i) { + ret = dequeue_fft_one_op( + q_data, q, &ops[i], dequeued_cbs, + &aq_dequeued); + if (ret <= 0) + break; + dequeued_cbs += ret; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_cbs; + /* Update dequeue stats */ + q_data->queue_stats.dequeued_count += i; + return i; +} + /* Initialization Function */ static void acc200_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) @@ -3049,6 +3294,8 @@ dev->enqueue_ldpc_dec_ops = acc200_enqueue_ldpc_dec; dev->dequeue_ldpc_enc_ops = acc200_dequeue_ldpc_enc; dev->dequeue_ldpc_dec_ops = acc200_dequeue_ldpc_dec; + dev->enqueue_fft_ops = acc200_enqueue_fft; + dev->dequeue_fft_ops = acc200_dequeue_fft; ((struct acc_device *) dev->data->dev_private)->pf_device = !strcmp(drv->driver.name, -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
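For readers following the FFT frame control word logic in the patch above, here is a minimal standalone sketch of three small computations it performs: selecting the bypass mode from the operation flags, packing window indexes into a 32-bit selector, and counting enabled cyclic shifts in the 12-bit bitmap. The SKETCH_* flag values and the sketch_* function names are hypothetical and exist only for this illustration; the real flag values come from the RTE_BBDEV_FFT_* definitions in the bbdev API, and the sketch assumes each window index fits in 8 bits.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. The real values are
 * defined by the bbdev API (RTE_BBDEV_FFT_*), not here.
 */
#define SKETCH_FFT_DFT_BYPASS       (1u << 0)
#define SKETCH_FFT_IDFT_BYPASS      (1u << 1)
#define SKETCH_FFT_WINDOWING_BYPASS (1u << 2)

/* Same decision order as the bypass selection in acc200_fcw_fft_fill():
 * IDFT bypass is checked first, then windowing bypass refines it,
 * and DFT bypass is only considered when IDFT bypass is not set.
 */
static inline uint8_t
sketch_fft_bypass(uint32_t op_flags)
{
	if (op_flags & SKETCH_FFT_IDFT_BYPASS)
		return (op_flags & SKETCH_FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (op_flags & SKETCH_FFT_DFT_BYPASS)
		return 3;
	return 0;
}

/* Pack four window indexes into one word, index 0 in the low byte.
 * The casts keep the shifts in unsigned 32-bit arithmetic.
 */
static inline uint32_t
sketch_pack_window_sel(const uint8_t idx[4])
{
	return (uint32_t)idx[0] |
		((uint32_t)idx[1] << 8) |
		((uint32_t)idx[2] << 16) |
		((uint32_t)idx[3] << 24);
}

/* Count the cyclic shifts enabled in the low 12 bits of the bitmap. */
static inline int
sketch_count_cs(uint16_t cs_bitmap)
{
	int i, n = 0;

	for (i = 0; i < 12; i++)
		if (cs_bitmap & (1u << i))
			n++;
	return n;
}
```

The count returned by the last helper corresponds to the num_cs value that the descriptor fill uses to scale the output antenna offset; bits above bit 11 are ignored, as in the patch.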
* [PATCH v2 09/11] baseband/acc200: support interrupt 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (7 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 08/11] baseband/acc200: add support for FFT operations Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 10/11] baseband/acc200: add device status and vf2pf comms Nic Chautru 2022-09-12 1:08 ` [PATCH v2 11/11] baseband/acc200: add PF configure companion function Nic Chautru 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Add the capability flags and functions for MSI/MSI-X interrupts and the underlying information ring. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 346 ++++++++++++++++++++++++++++++- 1 file changed, 344 insertions(+), 2 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 483dce8..8fe5704 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -220,6 +220,193 @@ acc_conf->q_fft.aq_depth_log2); } +/* Check the Info Ring for unexpected (non DMA descriptor) interrupt causes and log them */ +static inline void +acc200_check_ir(struct acc_device *acc200_dev) +{ + volatile union acc_info_ring_data *ring_data; + uint16_t info_ring_head = acc200_dev->info_ring_head; + if (acc200_dev->info_ring == NULL) + return; + + ring_data = acc200_dev->info_ring + (acc200_dev->info_ring_head & + ACC_INFO_RING_MASK); + + while (ring_data->valid) { + if ((ring_data->int_nb < ACC200_PF_INT_DMA_DL_DESC_IRQ) || ( + ring_data->int_nb > + ACC200_PF_INT_DMA_DL5G_DESC_IRQ)) { + rte_bbdev_log(WARNING, "InfoRing: ITR:%d Info:0x%x", + ring_data->int_nb, ring_data->detailed_info); + 
/* Initialize Info Ring entry and move forward */ + ring_data->val = 0; + } + info_ring_head++; + ring_data = acc200_dev->info_ring + + (info_ring_head & ACC_INFO_RING_MASK); + } +} + +/* Checks PF Info Ring to find the interrupt cause and handles it accordingly */ +static inline void +acc200_pf_interrupt_handler(struct rte_bbdev *dev) +{ + struct acc_device *acc200_dev = dev->data->dev_private; + volatile union acc_info_ring_data *ring_data; + struct acc_deq_intr_details deq_intr_det; + + ring_data = acc200_dev->info_ring + (acc200_dev->info_ring_head & + ACC_INFO_RING_MASK); + + while (ring_data->valid) { + + rte_bbdev_log_debug( + "ACC200 PF Interrupt received, Info Ring data: 0x%x -> %d", + ring_data->val, ring_data->int_nb); + + switch (ring_data->int_nb) { + case ACC200_PF_INT_DMA_DL_DESC_IRQ: + case ACC200_PF_INT_DMA_UL_DESC_IRQ: + case ACC200_PF_INT_DMA_FFT_DESC_IRQ: + case ACC200_PF_INT_DMA_UL5G_DESC_IRQ: + case ACC200_PF_INT_DMA_DL5G_DESC_IRQ: + deq_intr_det.queue_id = get_queue_id_from_ring_info( + dev->data, *ring_data); + if (deq_intr_det.queue_id == UINT16_MAX) { + rte_bbdev_log(ERR, + "Couldn't find queue: aq_id: %u, qg_id: %u, vf_id: %u", + ring_data->aq_id, + ring_data->qg_id, + ring_data->vf_id); + return; + } + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_DEQUEUE, &deq_intr_det); + break; + default: + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_ERROR, NULL); + break; + } + + /* Initialize Info Ring entry and move forward */ + ring_data->val = 0; + ++acc200_dev->info_ring_head; + ring_data = acc200_dev->info_ring + + (acc200_dev->info_ring_head & ACC_INFO_RING_MASK); + } +} + +/* Checks VF Info Ring to find the interrupt cause and handles it accordingly */ +static inline void +acc200_vf_interrupt_handler(struct rte_bbdev *dev) +{ + struct acc_device *acc200_dev = dev->data->dev_private; + volatile union acc_info_ring_data *ring_data; + struct acc_deq_intr_details deq_intr_det; + + ring_data = acc200_dev->info_ring + 
(acc200_dev->info_ring_head & ACC_INFO_RING_MASK); + + while (ring_data->valid) { + + rte_bbdev_log_debug( + "ACC200 VF Interrupt received, Info Ring data: 0x%x\n", + ring_data->val); + + switch (ring_data->int_nb) { + case ACC200_VF_INT_DMA_DL_DESC_IRQ: + case ACC200_VF_INT_DMA_UL_DESC_IRQ: + case ACC200_VF_INT_DMA_FFT_DESC_IRQ: + case ACC200_VF_INT_DMA_UL5G_DESC_IRQ: + case ACC200_VF_INT_DMA_DL5G_DESC_IRQ: + /* VFs are not aware of their vf_id - it's set to 0 in + * queue structures. + */ + ring_data->vf_id = 0; + deq_intr_det.queue_id = get_queue_id_from_ring_info( + dev->data, *ring_data); + if (deq_intr_det.queue_id == UINT16_MAX) { + rte_bbdev_log(ERR, + "Couldn't find queue: aq_id: %u, qg_id: %u", + ring_data->aq_id, + ring_data->qg_id); + return; + } + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_DEQUEUE, &deq_intr_det); + break; + default: + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_ERROR, NULL); + break; + } + + /* Initialize Info Ring entry and move forward */ + ring_data->valid = 0; + ++acc200_dev->info_ring_head; + ring_data = acc200_dev->info_ring + (acc200_dev->info_ring_head + & ACC_INFO_RING_MASK); + } +} + +/* Interrupt handler triggered by ACC200 dev for handling specific interrupt */ +static void +acc200_dev_interrupt_handler(void *cb_arg) +{ + struct rte_bbdev *dev = cb_arg; + struct acc_device *acc200_dev = dev->data->dev_private; + + /* Read info ring */ + if (acc200_dev->pf_device) + acc200_pf_interrupt_handler(dev); + else + acc200_vf_interrupt_handler(dev); +} + +/* Allocate and setup inforing */ +static int +allocate_info_ring(struct rte_bbdev *dev) +{ + struct acc_device *d = dev->data->dev_private; + const struct acc200_registry_addr *reg_addr; + rte_iova_t info_ring_iova; + uint32_t phys_low, phys_high; + + if (d->info_ring != NULL) + return 0; /* Already configured */ + + /* Choose correct registry addresses for the device type */ + if (d->pf_device) + reg_addr = &pf_reg_addr; + else + reg_addr = &vf_reg_addr; + 
/* Allocate InfoRing */ + if (d->info_ring == NULL) + d->info_ring = rte_zmalloc_socket("Info Ring", + ACC_INFO_RING_NUM_ENTRIES * + sizeof(*d->info_ring), RTE_CACHE_LINE_SIZE, + dev->data->socket_id); + if (d->info_ring == NULL) { + rte_bbdev_log(ERR, + "Failed to allocate Info Ring for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + return -ENOMEM; + } + info_ring_iova = rte_malloc_virt2iova(d->info_ring); + + /* Setup Info Ring */ + phys_high = (uint32_t)(info_ring_iova >> 32); + phys_low = (uint32_t)(info_ring_iova); + acc_reg_write(d, reg_addr->info_ring_hi, phys_high); + acc_reg_write(d, reg_addr->info_ring_lo, phys_low); + acc_reg_write(d, reg_addr->info_ring_en, ACC200_REG_IRQ_EN_ALL); + d->info_ring_head = (acc_reg_read(d, reg_addr->info_ring_ptr) & + 0xFFF) / sizeof(union acc_info_ring_data); + return 0; +} + + /* Allocate 64MB memory used for all software rings */ static int acc200_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id) @@ -227,6 +414,7 @@ uint32_t phys_low, phys_high, value; struct acc_device *d = dev->data->dev_private; const struct acc200_registry_addr *reg_addr; + int ret; if (d->pf_device && !d->acc_conf.pf_mode_en) { rte_bbdev_log(NOTICE, @@ -327,6 +515,14 @@ acc_reg_write(d, reg_addr->tail_ptrs_fft_hi, phys_high); acc_reg_write(d, reg_addr->tail_ptrs_fft_lo, phys_low); + ret = allocate_info_ring(dev); + if (ret < 0) { + rte_bbdev_log(ERR, "Failed to allocate info_ring for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + /* Continue */ + } + if (d->harq_layout == NULL) d->harq_layout = rte_zmalloc_socket("HARQ Layout", ACC_HARQ_LAYOUT * sizeof(*d->harq_layout), @@ -349,17 +545,121 @@ return 0; } +static int +acc200_intr_enable(struct rte_bbdev *dev) +{ + int ret; + struct acc_device *d = dev->data->dev_private; + /* + * MSI/MSI-X are supported + * Option controlled by vfio-intr through EAL parameter + */ + if (rte_intr_type_get(dev->intr_handle) == RTE_INTR_HANDLE_VFIO_MSI) { + + ret = 
allocate_info_ring(dev); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't allocate info ring for device: %s", + dev->data->name); + return ret; + } + ret = rte_intr_enable(dev->intr_handle); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't enable interrupts for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + ret = rte_intr_callback_register(dev->intr_handle, + acc200_dev_interrupt_handler, dev); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't register interrupt callback for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + + return 0; + } else if (rte_intr_type_get(dev->intr_handle) == RTE_INTR_HANDLE_VFIO_MSIX) { + + ret = allocate_info_ring(dev); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't allocate info ring for device: %s", + dev->data->name); + return ret; + } + + int i, max_queues; + struct acc_device *acc200_dev = dev->data->dev_private; + + if (acc200_dev->pf_device) + max_queues = ACC200_MAX_PF_MSIX; + else + max_queues = ACC200_MAX_VF_MSIX; + + if (rte_intr_efd_enable(dev->intr_handle, max_queues)) { + rte_bbdev_log(ERR, "Failed to create fds for %u queues", + dev->data->num_queues); + return -1; + } + + for (i = 0; i < max_queues; ++i) { + if (rte_intr_efds_index_set(dev->intr_handle, i, + rte_intr_fd_get(dev->intr_handle))) + return -rte_errno; + } + + if (rte_intr_vec_list_alloc(dev->intr_handle, "intr_vec", + dev->data->num_queues)) { + rte_bbdev_log(ERR, "Failed to allocate %u vectors", + dev->data->num_queues); + return -ENOMEM; + } + + ret = rte_intr_enable(dev->intr_handle); + + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't enable interrupts for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + ret = rte_intr_callback_register(dev->intr_handle, + acc200_dev_interrupt_handler, dev); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't register interrupt callback for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + + return 
0; + } + + rte_bbdev_log(ERR, "ACC200 (%s) supports only VFIO MSI/MSI-X interrupts\n", + dev->data->name); + return -ENOTSUP; +} + /* Free memory used for software rings */ static int acc200_dev_close(struct rte_bbdev *dev) { struct acc_device *d = dev->data->dev_private; + acc200_check_ir(d); if (d->sw_rings_base != NULL) { rte_free(d->tail_ptrs); + rte_free(d->info_ring); rte_free(d->sw_rings_base); rte_free(d->harq_layout); d->sw_rings_base = NULL; d->tail_ptrs = NULL; + d->info_ring = NULL; d->harq_layout = NULL; } /* Ensure all in flight HW transactions are completed */ @@ -652,6 +952,7 @@ RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH | RTE_BBDEV_TURBO_SOFT_OUTPUT | RTE_BBDEV_TURBO_EARLY_TERMINATION | + RTE_BBDEV_TURBO_DEC_INTERRUPTS | RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN | RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT | RTE_BBDEV_TURBO_MAP_DEC | @@ -673,6 +974,7 @@ RTE_BBDEV_TURBO_CRC_24B_ATTACH | RTE_BBDEV_TURBO_RV_INDEX_BYPASS | RTE_BBDEV_TURBO_RATE_MATCH | + RTE_BBDEV_TURBO_ENC_INTERRUPTS | RTE_BBDEV_TURBO_ENC_SCATTER_GATHER, .num_buffers_src = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, @@ -686,7 +988,8 @@ .capability_flags = RTE_BBDEV_LDPC_RATE_MATCH | RTE_BBDEV_LDPC_CRC_24B_ATTACH | - RTE_BBDEV_LDPC_INTERLEAVER_BYPASS, + RTE_BBDEV_LDPC_INTERLEAVER_BYPASS | + RTE_BBDEV_LDPC_ENC_INTERRUPTS, .num_buffers_src = RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, .num_buffers_dst = @@ -707,7 +1010,8 @@ RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS | RTE_BBDEV_LDPC_DEC_SCATTER_GATHER | RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION | - RTE_BBDEV_LDPC_LLR_COMPRESSION, + RTE_BBDEV_LDPC_LLR_COMPRESSION | + RTE_BBDEV_LDPC_DEC_INTERRUPTS, .llr_size = 8, .llr_decimals = 1, .num_buffers_src = @@ -775,15 +1079,46 @@ dev_info->min_alignment = 1; dev_info->capabilities = bbdev_capabilities; dev_info->harq_buffer_size = 0; + + acc200_check_ir(d); +} + +static int +acc200_queue_intr_enable(struct rte_bbdev *dev, uint16_t queue_id) +{ + struct acc_queue *q = dev->data->queues[queue_id].queue_private; + + if (rte_intr_type_get(dev->intr_handle) 
!= RTE_INTR_HANDLE_VFIO_MSI && + rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSIX) + return -ENOTSUP; + + q->irq_enable = 1; + return 0; +} + +static int +acc200_queue_intr_disable(struct rte_bbdev *dev, uint16_t queue_id) +{ + struct acc_queue *q = dev->data->queues[queue_id].queue_private; + + if (rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSI && + rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSIX) + return -ENOTSUP; + + q->irq_enable = 0; + return 0; } static const struct rte_bbdev_ops acc200_bbdev_ops = { .setup_queues = acc200_setup_queues, + .intr_enable = acc200_intr_enable, .close = acc200_dev_close, .info_get = acc200_dev_info_get, .queue_setup = acc200_queue_setup, .queue_release = acc200_queue_release, .queue_stop = acc_queue_stop, + .queue_intr_enable = acc200_queue_intr_enable, + .queue_intr_disable = acc200_queue_intr_disable }; /* ACC200 PCI PF address map */ @@ -2694,6 +3029,7 @@ if (op->status != 0) { /* These errors are not expected */ q_data->queue_stats.dequeue_err_count++; + acc200_check_ir(q->d); } /* CRC invalid if error exists */ @@ -2763,6 +3099,9 @@ op->ldpc_dec.iter_count = (uint8_t) rsp.iter_cnt; + if (op->status & (1 << RTE_BBDEV_DRV_ERROR)) + acc200_check_ir(q->d); + /* Check if this is the last desc in batch (Atomic Queue) */ if (desc->req.last_desc_in_batch) { (*aq_dequeued)++; @@ -3232,6 +3571,9 @@ if (op->status != 0) q_data->queue_stats.dequeue_err_count++; + if (op->status & (1 << RTE_BBDEV_DRV_ERROR)) + acc200_check_ir(q->d); + /* Check if this is the last desc in batch (Atomic Queue) */ if (desc->req.last_desc_in_batch) { (*aq_dequeued)++; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
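The PF and VF interrupt handlers in the patch above share one pattern: start at the software head of a power-of-two info ring, handle each entry whose valid flag is set, clear it, and advance the head with a mask so that it wraps. A minimal sketch of that pattern follows, under stated assumptions: the struct layout, the ring size and every sketch_* name are hypothetical and are not the driver's definitions (the driver uses union acc_info_ring_data and ACC_INFO_RING_MASK).

```c
#include <stdint.h>

/* Hypothetical ring size for the sketch; it must be a power of two so
 * that masking the free-running head yields a valid index.
 */
#define SKETCH_RING_ENTRIES 8u
#define SKETCH_RING_MASK    (SKETCH_RING_ENTRIES - 1u)

/* Hypothetical entry layout, much simpler than the hardware format. */
struct sketch_ring_entry {
	uint32_t valid;  /* set by the producer, cleared by the consumer */
	uint32_t int_nb; /* interrupt cause number */
};

/* Consume every valid entry starting at head, clearing each one so the
 * slot can be reused. Returns the new head and reports how many entries
 * were handled.
 */
static inline uint16_t
sketch_drain_ring(struct sketch_ring_entry *ring, uint16_t head,
		unsigned int *handled)
{
	struct sketch_ring_entry *e = &ring[head & SKETCH_RING_MASK];
	unsigned int n = 0;

	while (e->valid) {
		/* A real handler would dispatch on e->int_nb here. */
		e->valid = 0;
		e->int_nb = 0;
		n++;
		head++;
		e = &ring[head & SKETCH_RING_MASK];
	}
	*handled = n;
	return head;
}

/* Usage example: mark 'pending' consecutive entries valid from
 * start_head, drain them, and return the number handled (or 0 if the
 * head did not advance by exactly that amount).
 */
static inline unsigned int
sketch_ring_demo(uint16_t start_head, unsigned int pending)
{
	struct sketch_ring_entry ring[SKETCH_RING_ENTRIES] = { { 0, 0 } };
	unsigned int i, handled = 0;
	uint16_t head;

	for (i = 0; i < pending && i < SKETCH_RING_ENTRIES; i++) {
		ring[(start_head + i) & SKETCH_RING_MASK].valid = 1;
		ring[(start_head + i) & SKETCH_RING_MASK].int_nb = i;
	}
	head = sketch_drain_ring(ring, start_head, &handled);
	return ((uint16_t)(head - start_head) == handled) ? handled : 0;
}
```

Because every consumed entry is cleared, the loop stops after at most one full pass over the ring even when all entries were valid on entry. The 16-bit head is allowed to run freely and wrap; only the masked value is used as an index.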
* [PATCH v2 10/11] baseband/acc200: add device status and vf2pf comms 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (8 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 09/11] baseband/acc200: support interrupt Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 2022-09-12 1:08 ` [PATCH v2 11/11] baseband/acc200: add PF configure companion function Nic Chautru 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Add support for exposing the device status, as seen from the host, through vf2pf mailbox communication. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 61 +++++++++++++++++++++++--------- 1 file changed, 44 insertions(+), 17 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 8fe5704..a969679 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -201,23 +201,47 @@ rte_bbdev_log_debug( "%s Config LLR SIGN IN/OUT %s %s QG %u %u %u %u %u AQ %u %u %u %u %u Len %u %u %u %u %u\n", (d->pf_device) ? "PF" : "VF", - (acc_conf->input_pos_llr_1_bit) ? "POS" : "NEG", - (acc_conf->output_pos_llr_1_bit) ? 
"POS" : "NEG", - acc_conf->q_ul_4g.num_qgroups, - acc_conf->q_dl_4g.num_qgroups, - acc_conf->q_ul_5g.num_qgroups, - acc_conf->q_dl_5g.num_qgroups, - acc_conf->q_fft.num_qgroups, - acc_conf->q_ul_4g.num_aqs_per_groups, - acc_conf->q_dl_4g.num_aqs_per_groups, - acc_conf->q_ul_5g.num_aqs_per_groups, - acc_conf->q_dl_5g.num_aqs_per_groups, - acc_conf->q_fft.num_aqs_per_groups, - acc_conf->q_ul_4g.aq_depth_log2, - acc_conf->q_dl_4g.aq_depth_log2, - acc_conf->q_ul_5g.aq_depth_log2, - acc_conf->q_dl_5g.aq_depth_log2, - acc_conf->q_fft.aq_depth_log2); + (acc200_conf->input_pos_llr_1_bit) ? "POS" : "NEG", + (acc200_conf->output_pos_llr_1_bit) ? "POS" : "NEG", + acc200_conf->q_ul_4g.num_qgroups, + acc200_conf->q_dl_4g.num_qgroups, + acc200_conf->q_ul_5g.num_qgroups, + acc200_conf->q_dl_5g.num_qgroups, + acc200_conf->q_fft.num_qgroups, + acc200_conf->q_ul_4g.num_aqs_per_groups, + acc200_conf->q_dl_4g.num_aqs_per_groups, + acc200_conf->q_ul_5g.num_aqs_per_groups, + acc200_conf->q_dl_5g.num_aqs_per_groups, + acc200_conf->q_fft.num_aqs_per_groups, + acc200_conf->q_ul_4g.aq_depth_log2, + acc200_conf->q_dl_4g.aq_depth_log2, + acc200_conf->q_ul_5g.aq_depth_log2, + acc200_conf->q_dl_5g.aq_depth_log2, + acc200_conf->q_fft.aq_depth_log2); +} + +static inline void +acc200_vf2pf(struct acc_device *d, unsigned int payload) +{ + acc_reg_write(d, HWVfHiVfToPfDbellVf, payload); +} + +/* Request device status information */ +static inline uint32_t +acc200_device_status(struct rte_bbdev *dev) +{ + struct acc_device *d = dev->data->dev_private; + uint32_t reg, time_out = 0; + if (d->pf_device) + return RTE_BBDEV_DEV_NOT_SUPPORTED; + acc200_vf2pf(d, ACC_VF2PF_STATUS_REQUEST); + reg = acc_reg_read(d, HWVfHiPfToVfDbellVf); + while ((time_out < ACC200_STATUS_TO) && (reg == RTE_BBDEV_DEV_NOSTATUS)) { + usleep(ACC200_STATUS_WAIT); /*< Wait or VF->PF->VF Comms */ + reg = acc_reg_read(d, HWVfHiPfToVfDbellVf); + time_out++; + } + return reg; } /* Checks PF Info Ring to find the interrupt cause and 
handles it accordingly */ @@ -537,6 +561,7 @@ /* Mark as configured properly */ d->configured = true; + acc200_vf2pf(d, ACC_VF2PF_USING_VF); rte_bbdev_log_debug( "ACC200 (%s) configured sw_rings = %p, sw_rings_iova = %#" @@ -1047,6 +1072,8 @@ /* Read and save the populated config from ACC200 registers */ fetch_acc200_config(dev); + /* Check the status of device */ + dev_info->device_status = acc200_device_status(dev); /* Exposed number of queues */ dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
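The device status request in the patch above is a bounded poll: the VF rings the VF-to-PF doorbell, then re-reads the PF-to-VF doorbell until a status arrives or a retry budget runs out. The sketch below isolates that loop so it can be exercised without hardware. It is an illustration under assumptions, not the driver code: the register read is abstracted behind a hypothetical callback, SKETCH_NOSTATUS stands in for the "no status yet" value, the fake register and all sketch_* names are invented here, and the sleep between polls is left as a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "no answer yet" value for the sketch. */
#define SKETCH_NOSTATUS 0u

/* Hypothetical register read callback, standing in for a doorbell read. */
typedef uint32_t (*sketch_reg_read_t)(void *ctx);

/* Read once, then poll up to max_polls more times while no status has
 * been posted. Returns the last value read, which is still
 * SKETCH_NOSTATUS when the budget is exhausted.
 */
static inline uint32_t
sketch_poll_status(sketch_reg_read_t read_reg, void *ctx,
		uint32_t max_polls, uint32_t *polls_used)
{
	uint32_t polls = 0;
	uint32_t reg = read_reg(ctx);

	while ((polls < max_polls) && (reg == SKETCH_NOSTATUS)) {
		/* A real driver would wait here before reading again. */
		reg = read_reg(ctx);
		polls++;
	}
	if (polls_used != NULL)
		*polls_used = polls;
	return reg;
}

/* Fake register for the usage example: reports "no status" for a given
 * number of reads, then returns a fixed value.
 */
struct sketch_fake_reg {
	uint32_t reads_until_ready;
	uint32_t value;
};

static inline uint32_t
sketch_fake_read(void *ctx)
{
	struct sketch_fake_reg *r = ctx;

	if (r->reads_until_ready > 0) {
		r->reads_until_ready--;
		return SKETCH_NOSTATUS;
	}
	return r->value;
}

/* Usage example: poll a fake register that becomes ready after
 * reads_until_ready reads and then answers with the value 5.
 */
static inline uint32_t
sketch_poll_demo(uint32_t reads_until_ready, uint32_t max_polls)
{
	struct sketch_fake_reg r = { reads_until_ready, 5u };

	return sketch_poll_status(sketch_fake_read, &r, max_polls, NULL);
}
```

One consequence of this shape is that a caller cannot tell a timeout apart from a peer that genuinely reports the "no status" value; the optional polls_used output is one way to make that distinction if it matters.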
* [PATCH v2 11/11] baseband/acc200: add PF configure companion function 2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru ` (9 preceding siblings ...) 2022-09-12 1:08 ` [PATCH v2 10/11] baseband/acc200: add device status and vf2pf comms Nic Chautru @ 2022-09-12 1:08 ` Nic Chautru 10 siblings, 0 replies; 50+ messages in thread From: Nic Chautru @ 2022-09-12 1:08 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal Cc: maxime.coquelin, trix, mdr, bruce.richardson, david.marchand, stephen, hernan.vargas, Nicolas Chautru From: Nicolas Chautru <nicolas.chautru@intel.com> Add configure function notably to configure the device from the PF within DPDK and bbdev-test (without external dependency). Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- app/test-bbdev/meson.build | 3 + app/test-bbdev/test_bbdev_perf.c | 76 ++++++ drivers/baseband/acc200/meson.build | 2 + drivers/baseband/acc200/rte_acc200_cfg.h | 21 ++ drivers/baseband/acc200/rte_acc200_pmd.c | 453 +++++++++++++++++++++++++++++++ drivers/baseband/acc200/version.map | 7 + 6 files changed, 562 insertions(+) diff --git a/app/test-bbdev/meson.build b/app/test-bbdev/meson.build index 76d4c26..1ffaa54 100644 --- a/app/test-bbdev/meson.build +++ b/app/test-bbdev/meson.build @@ -23,6 +23,9 @@ endif if dpdk_conf.has('RTE_BASEBAND_ACC100') deps += ['baseband_acc100'] endif +if dpdk_conf.has('RTE_BASEBAND_ACC200') + deps += ['baseband_acc200'] +endif if dpdk_conf.has('RTE_LIBRTE_PMD_BBDEV_LA12XX') deps += ['baseband_la12xx'] endif diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c index af9ceca..8886143 100644 --- a/app/test-bbdev/test_bbdev_perf.c +++ b/app/test-bbdev/test_bbdev_perf.c @@ -64,6 +64,18 @@ #define ACC100_QOS_GBR 0 #endif +#ifdef RTE_BASEBAND_ACC200 +#include <rte_acc200_cfg.h> +#define ACC200PF_DRIVER_NAME ("intel_acc200_pf") +#define ACC200VF_DRIVER_NAME ("intel_acc200_vf") +#define ACC200_QMGR_NUM_AQS 16 +#define ACC200_QMGR_NUM_QGS 2 +#define 
ACC200_QMGR_AQ_DEPTH 5 +#define ACC200_QMGR_INVALID_IDX -1 +#define ACC200_QMGR_RR 1 +#define ACC200_QOS_GBR 0 +#endif + #define OPS_CACHE_SIZE 256U #define OPS_POOL_SIZE_MIN 511U /* 0.5K per queue */ @@ -762,6 +774,70 @@ typedef int (test_case_function)(struct active_device *ad, info->dev_name); } #endif +#ifdef RTE_BASEBAND_ACC200 + if ((get_init_device() == true) && + (!strcmp(info->drv.driver_name, ACC200PF_DRIVER_NAME))) { + struct rte_acc_conf conf; + unsigned int i; + + printf("Configure ACC200 FEC Driver %s with default values\n", + info->drv.driver_name); + + /* clear default configuration before initialization */ + memset(&conf, 0, sizeof(struct rte_acc_conf)); + + /* Always set in PF mode for built-in configuration */ + conf.pf_mode_en = true; + for (i = 0; i < RTE_ACC_NUM_VFS; ++i) { + conf.arb_dl_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_4g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_ul_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_4g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_dl_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_5g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_ul_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_5g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_fft[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_fft[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_fft[i].round_robin_weight = ACC200_QMGR_RR; + } + + conf.input_pos_llr_1_bit = true; + conf.output_pos_llr_1_bit = true; + conf.num_vf_bundles = 1; /**< Number of VF bundles to setup */ + + conf.q_ul_4g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_ul_4g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_ul_4g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_ul_4g.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + 
conf.q_dl_4g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_dl_4g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_dl_4g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_dl_4g.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + conf.q_ul_5g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_ul_5g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_ul_5g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_ul_5g.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + conf.q_dl_5g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_dl_5g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_dl_5g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_dl_5g.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + conf.q_fft.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_fft.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_fft.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_fft.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + + /* setup PF with configuration information */ + ret = rte_acc200_configure(info->dev_name, &conf); + TEST_ASSERT_SUCCESS(ret, + "Failed to configure ACC200 PF for bbdev %s", + info->dev_name); + } +#endif /* Let's refresh this now this is configured */ rte_bbdev_info_get(dev_id, info); nb_queues = RTE_MIN(rte_lcore_count(), info->drv.max_num_queues); diff --git a/drivers/baseband/acc200/meson.build b/drivers/baseband/acc200/meson.build index 7ec8679..b964c21 100644 --- a/drivers/baseband/acc200/meson.build +++ b/drivers/baseband/acc200/meson.build @@ -4,3 +4,5 @@ deps += ['bbdev', 'bus_vdev', 'ring', 'pci', 'bus_pci'] sources = files('rte_acc200_pmd.c') + +headers = files('rte_acc200_cfg.h') diff --git a/drivers/baseband/acc200/rte_acc200_cfg.h b/drivers/baseband/acc200/rte_acc200_cfg.h index 9ae96c6..9fefd5c 100644 --- a/drivers/baseband/acc200/rte_acc200_cfg.h +++ b/drivers/baseband/acc200/rte_acc200_cfg.h @@ -24,4 +24,25 @@ extern "C" { #endif +/** + * Configure a ACC200 device + * + * @param dev_name + * The name of the device. This is the short form of PCI BDF, e.g. 00:01.0. 
+ * It can also be retrieved for a bbdev device from the dev_name field in the + * rte_bbdev_info structure returned by rte_bbdev_info_get(). + * @param conf + * Configuration to apply to ACC200 HW. + * + * @return + * Zero on success, negative value on failure. + */ +__rte_experimental +int +rte_acc200_configure(const char *dev_name, struct rte_acc_conf *conf); + +#ifdef __cplusplus +} +#endif + #endif /* _RTE_ACC200_CFG_H_ */ diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index a969679..8f63873 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -43,6 +43,27 @@ enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, NUM_ACC}; +/* Return the accelerator enum for a Queue Group Index */ +static inline int +accFromQgid(int qg_idx, const struct rte_acc_conf *acc_conf) +{ + int accQg[ACC200_NUM_QGRPS]; + int NumQGroupsPerFn[NUM_ACC]; + int acc, qgIdx, qgIndex = 0; + for (qgIdx = 0; qgIdx < ACC200_NUM_QGRPS; qgIdx++) + accQg[qgIdx] = 0; + NumQGroupsPerFn[UL_4G] = acc_conf->q_ul_4g.num_qgroups; + NumQGroupsPerFn[UL_5G] = acc_conf->q_ul_5g.num_qgroups; + NumQGroupsPerFn[DL_4G] = acc_conf->q_dl_4g.num_qgroups; + NumQGroupsPerFn[DL_5G] = acc_conf->q_dl_5g.num_qgroups; + NumQGroupsPerFn[FFT] = acc_conf->q_fft.num_qgroups; + for (acc = UL_4G; acc < NUM_ACC; acc++) + for (qgIdx = 0; qgIdx < NumQGroupsPerFn[acc]; qgIdx++) + accQg[qgIndex++] = acc; + acc = accQg[qg_idx]; + return acc; +} + /* Return the queue topology for a Queue Group Index */ static inline void qtopFromAcc(struct rte_acc_queue_topology **qtop, int acc_enum, @@ -75,6 +96,30 @@ *qtop = p_qtop; } +/* Return the AQ depth for a Queue Group Index */ +static inline int +aqDepth(int qg_idx, struct rte_acc_conf *acc_conf) +{ + struct rte_acc_queue_topology *q_top = NULL; + int acc_enum = accFromQgid(qg_idx, acc_conf); + qtopFromAcc(&q_top, acc_enum, acc_conf); + if (unlikely(q_top == NULL)) + return 0; + return 
q_top->aq_depth_log2; +} + +/* Return the number of AQs per group for a Queue Group Index */ +static inline int +aqNum(int qg_idx, struct rte_acc_conf *acc_conf) +{ + struct rte_acc_queue_topology *q_top = NULL; + int acc_enum = accFromQgid(qg_idx, acc_conf); + qtopFromAcc(&q_top, acc_enum, acc_conf); + if (unlikely(q_top == NULL)) + return 0; + return q_top->num_aqs_per_groups; +} + static void initQTop(struct rte_acc_conf *acc_conf) { @@ -3740,3 +3785,411 @@ static int acc200_pci_probe(struct rte_pci_driver *pci_drv, RTE_PMD_REGISTER_PCI_TABLE(ACC200PF_DRIVER_NAME, pci_id_acc200_pf_map); RTE_PMD_REGISTER_PCI(ACC200VF_DRIVER_NAME, acc200_pci_vf_driver); RTE_PMD_REGISTER_PCI_TABLE(ACC200VF_DRIVER_NAME, pci_id_acc200_vf_map); + +/* Initial configuration of an ACC200 device prior to running configure() */ +int +rte_acc200_configure(const char *dev_name, struct rte_acc_conf *conf) +{ + rte_bbdev_log(INFO, "rte_acc200_configure"); + uint32_t value, address, status; + int qg_idx, template_idx, vf_idx, acc, i; + struct rte_bbdev *bbdev = rte_bbdev_get_named_dev(dev_name); + + /* Compile time checks */ + RTE_BUILD_BUG_ON(sizeof(struct acc_dma_req_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(union acc_dma_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_td) != 24); + RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_te) != 32); + + if (bbdev == NULL) { + rte_bbdev_log(ERR, + "Invalid dev_name (%s), or device is not yet initialised", + dev_name); + return -ENODEV; + } + struct acc_device *d = bbdev->data->dev_private; + + /* Store configuration */ + rte_memcpy(&d->acc_conf, conf, sizeof(d->acc_conf)); + + + /* Check we are already out of PG */ + status = acc_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status > 0) { + if (status != ACC200_PG_MASK_0) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_0); + return -ENODEV; + } + /* Clock gate sections that will be un-PG */ + acc_reg_write(d, HWPfHiClkGateHystReg, ACC200_CLK_DIS); + /* Un-PG required sections */ + 
acc_reg_write(d, HWPfHiSectionPowerGatingReq, + ACC200_PG_MASK_1); + status = acc_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status != ACC200_PG_MASK_1) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_1); + return -ENODEV; + } + acc_reg_write(d, HWPfHiSectionPowerGatingReq, + ACC200_PG_MASK_2); + status = acc_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status != ACC200_PG_MASK_2) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_2); + return -ENODEV; + } + acc_reg_write(d, HWPfHiSectionPowerGatingReq, + ACC200_PG_MASK_3); + status = acc_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status != ACC200_PG_MASK_3) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_3); + return -ENODEV; + } + /* Enable clocks for all sections */ + acc_reg_write(d, HWPfHiClkGateHystReg, ACC200_CLK_EN); + } + + /* Explicitly releasing AXI as this may be stopped after PF FLR/BME */ + address = HWPfDmaAxiControl; + value = 1; + acc_reg_write(d, address, value); + + /* Set the fabric mode */ + address = HWPfFabricM2iBufferReg; + value = ACC200_FABRIC_MODE; + acc_reg_write(d, address, value); + + /* Set default descriptor signature */ + address = HWPfDmaDescriptorSignatuture; + value = 0; + acc_reg_write(d, address, value); + + /* Enable the Error Detection in DMA */ + value = ACC200_CFG_DMA_ERROR; + address = HWPfDmaErrorDetectionEn; + acc_reg_write(d, address, value); + + /* AXI Cache configuration */ + value = ACC200_CFG_AXI_CACHE; + address = HWPfDmaAxcacheReg; + acc_reg_write(d, address, value); + + /* Default DMA Configuration (Qmgr Enabled) */ + address = HWPfDmaConfig0Reg; + value = 0; + acc_reg_write(d, address, value); + address = HWPfDmaQmanen; + value = 0; + acc_reg_write(d, address, value); + + /* Default RLIM/ALEN configuration */ + int rlim = 0; + int alen = 1; + int timestamp = 0; + address = HWPfDmaConfig1Reg; + value = (1 << 31) + (rlim << 8) + (timestamp << 6) + alen; + acc_reg_write(d, 
address, value); + + /* Default FFT configuration */ + address = HWPfFftConfig0; + value = ACC200_FFT_CFG_0; + acc_reg_write(d, address, value); + + /* Configure DMA Qmanager addresses */ + address = HWPfDmaQmgrAddrReg; + value = HWPfQmgrEgressQueuesTemplate; + acc_reg_write(d, address, value); + + /* ===== Qmgr Configuration ===== */ + /* Configuration of the AQueue Depth QMGR_GRP_0_DEPTH_LOG2 for UL */ + int totalQgs = conf->q_ul_4g.num_qgroups + + conf->q_ul_5g.num_qgroups + + conf->q_dl_4g.num_qgroups + + conf->q_dl_5g.num_qgroups + + conf->q_fft.num_qgroups; + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS; qg_idx++) { + address = HWPfQmgrDepthLog2Grp + + ACC_BYTES_IN_WORD * qg_idx; + value = aqDepth(qg_idx, conf); + acc_reg_write(d, address, value); + address = HWPfQmgrTholdGrp + + ACC_BYTES_IN_WORD * qg_idx; + value = (1 << 16) + (1 << (aqDepth(qg_idx, conf) - 1)); + acc_reg_write(d, address, value); + } + + /* Template Priority in incremental order */ + for (template_idx = 0; template_idx < ACC_NUM_TMPL; + template_idx++) { + address = HWPfQmgrGrpTmplateReg0Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_0; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg1Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_1; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg2indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_2; + acc_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg3Indx + ACC_BYTES_IN_WORD * template_idx; + value = ACC_TMPL_PRI_3; + acc_reg_write(d, address, value); + } + + address = HWPfQmgrGrpPriority; + value = ACC200_CFG_QMGR_HI_P; + acc_reg_write(d, address, value); + + /* Template Configuration */ + for (template_idx = 0; template_idx < ACC_NUM_TMPL; + template_idx++) { + value = 0; + address = HWPfQmgrGrpTmplateReg4Indx + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); + } + /* 4GUL */ + int numQgs = conf->q_ul_4g.num_qgroups; + int 
numQqsAcc = 0; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_UL_4G; + template_idx <= ACC200_SIG_UL_4G_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); + } + /* 5GUL */ + numQqsAcc += numQgs; + numQgs = conf->q_ul_5g.num_qgroups; + value = 0; + int numEngines = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_UL_5G; + template_idx <= ACC200_SIG_UL_5G_LAST; + template_idx++) { + /* Check engine power-on status */ + address = HwPfFecUl5gIbDebugReg + ACC_ENGINE_OFFSET * template_idx; + status = (acc_reg_read(d, address) >> 4) & 0x7; + address = HWPfQmgrGrpTmplateReg4Indx + + ACC_BYTES_IN_WORD * template_idx; + if (status == 1) { + acc_reg_write(d, address, value); + numEngines++; + } else + acc_reg_write(d, address, 0); + } + printf("Number of 5GUL engines %d\n", numEngines); + /* 4GDL */ + numQqsAcc += numQgs; + numQgs = conf->q_dl_4g.num_qgroups; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_DL_4G; + template_idx <= ACC200_SIG_DL_4G_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); + } + /* 5GDL */ + numQqsAcc += numQgs; + numQgs = conf->q_dl_5g.num_qgroups; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_DL_5G; + template_idx <= ACC200_SIG_DL_5G_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); + } + /* FFT */ + numQqsAcc += numQgs; + numQgs = conf->q_fft.num_qgroups; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) 
+ value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_FFT; + template_idx <= ACC200_SIG_FFT_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC_BYTES_IN_WORD * template_idx; + acc_reg_write(d, address, value); + } + + /* Queue Group Function mapping */ + int qman_func_id[8] = {0, 2, 1, 3, 4, 0, 0, 0}; + value = 0; + for (qg_idx = 0; qg_idx < ACC_NUM_QGRPS_PER_WORD; qg_idx++) { + acc = accFromQgid(qg_idx, conf); + value |= qman_func_id[acc] << (qg_idx * 4); + } + acc_reg_write(d, HWPfQmgrGrpFunction0, value); + value = 0; + for (qg_idx = 0; qg_idx < ACC_NUM_QGRPS_PER_WORD; qg_idx++) { + acc = accFromQgid(qg_idx + ACC_NUM_QGRPS_PER_WORD, conf); + value |= qman_func_id[acc] << (qg_idx * 4); + } + acc_reg_write(d, HWPfQmgrGrpFunction1, value); + + /* Configuration of the Arbitration QGroup depth to 1 */ + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS; qg_idx++) { + address = HWPfQmgrArbQDepthGrp + + ACC_BYTES_IN_WORD * qg_idx; + value = 0; + acc_reg_write(d, address, value); + } + + /* This pointer to ARAM (256kB) is shifted by 2 (4B per register) */ + uint32_t aram_address = 0; + for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { + for (vf_idx = 0; vf_idx < conf->num_vf_bundles; vf_idx++) { + address = HWPfQmgrVfBaseAddr + vf_idx + * ACC_BYTES_IN_WORD + qg_idx + * ACC_BYTES_IN_WORD * 64; + value = aram_address; + acc_reg_write(d, address, value); + /* Offset ARAM Address for next memory bank + * - increment of 4B + */ + aram_address += aqNum(qg_idx, conf) * + (1 << aqDepth(qg_idx, conf)); + } + } + + if (aram_address > ACC200_WORDS_IN_ARAM_SIZE) { + rte_bbdev_log(ERR, "ARAM Configuration not fitting %d %d\n", + aram_address, ACC200_WORDS_IN_ARAM_SIZE); + return -EINVAL; + } + + /* Performance tuning */ + acc_reg_write(d, HWPfFabricI2Mdma_weight, 0x0FFF); + acc_reg_write(d, HWPfDma4gdlIbThld, 0x1f10); + + /* ==== HI Configuration ==== */ + + /* No Info Ring/MSI by default */ + address = HWPfHiInfoRingIntWrEnRegPf; + value = 0; + acc_reg_write(d, 
address, value); + address = HWPfHiCfgMsiIntWrEnRegPf; + value = 0xFFFFFFFF; + acc_reg_write(d, address, value); + /* Prevent Block on Transmit Error */ + address = HWPfHiBlockTransmitOnErrorEn; + value = 0; + acc_reg_write(d, address, value); + /* Prevents to drop MSI */ + address = HWPfHiMsiDropEnableReg; + value = 0; + acc_reg_write(d, address, value); + /* Set the PF Mode register */ + address = HWPfHiPfMode; + value = (conf->pf_mode_en) ? ACC_PF_VAL : 0; + acc_reg_write(d, address, value); + + /* QoS overflow init */ + value = 1; + address = HWPfQosmonAEvalOverflow0; + acc_reg_write(d, address, value); + address = HWPfQosmonBEvalOverflow0; + acc_reg_write(d, address, value); + + /* Configure the FFT RAM LUT */ + uint32_t fft_lut[ACC200_FFT_RAM_SIZE] = { + 0x1FFFF, 0x1FFFF, 0x1FFFE, 0x1FFFA, 0x1FFF6, 0x1FFF1, 0x1FFEA, 0x1FFE2, + 0x1FFD9, 0x1FFCE, 0x1FFC2, 0x1FFB5, 0x1FFA7, 0x1FF98, 0x1FF87, 0x1FF75, + 0x1FF62, 0x1FF4E, 0x1FF38, 0x1FF21, 0x1FF09, 0x1FEF0, 0x1FED6, 0x1FEBA, + 0x1FE9D, 0x1FE7F, 0x1FE5F, 0x1FE3F, 0x1FE1D, 0x1FDFA, 0x1FDD5, 0x1FDB0, + 0x1FD89, 0x1FD61, 0x1FD38, 0x1FD0D, 0x1FCE1, 0x1FCB4, 0x1FC86, 0x1FC57, + 0x1FC26, 0x1FBF4, 0x1FBC1, 0x1FB8D, 0x1FB58, 0x1FB21, 0x1FAE9, 0x1FAB0, + 0x1FA75, 0x1FA3A, 0x1F9FD, 0x1F9BF, 0x1F980, 0x1F93F, 0x1F8FD, 0x1F8BA, + 0x1F876, 0x1F831, 0x1F7EA, 0x1F7A3, 0x1F75A, 0x1F70F, 0x1F6C4, 0x1F677, + 0x1F629, 0x1F5DA, 0x1F58A, 0x1F539, 0x1F4E6, 0x1F492, 0x1F43D, 0x1F3E7, + 0x1F38F, 0x1F337, 0x1F2DD, 0x1F281, 0x1F225, 0x1F1C8, 0x1F169, 0x1F109, + 0x1F0A8, 0x1F046, 0x1EFE2, 0x1EF7D, 0x1EF18, 0x1EEB0, 0x1EE48, 0x1EDDF, + 0x1ED74, 0x1ED08, 0x1EC9B, 0x1EC2D, 0x1EBBE, 0x1EB4D, 0x1EADB, 0x1EA68, + 0x1E9F4, 0x1E97F, 0x1E908, 0x1E891, 0x1E818, 0x1E79E, 0x1E722, 0x1E6A6, + 0x1E629, 0x1E5AA, 0x1E52A, 0x1E4A9, 0x1E427, 0x1E3A3, 0x1E31F, 0x1E299, + 0x1E212, 0x1E18A, 0x1E101, 0x1E076, 0x1DFEB, 0x1DF5E, 0x1dED0, 0x1DE41, + 0x1DDB1, 0x1DD20, 0x1DC8D, 0x1DBFA, 0x1DB65, 0x1DACF, 0x1DA38, 0x1D9A0, + 0x1D907, 0x1D86C, 0x1D7D1, 0x1D734, 0x1D696, 
0x1D5F7, 0x1D557, 0x1D4B6, + 0x1D413, 0x1D370, 0x1D2CB, 0x1D225, 0x1D17E, 0x1D0D6, 0x1D02D, 0x1CF83, + 0x1CED8, 0x1CE2B, 0x1CD7E, 0x1CCCF, 0x1CC1F, 0x1CB6E, 0x1CABC, 0x1CA09, + 0x1C955, 0x1C89F, 0x1C7E9, 0x1C731, 0x1C679, 0x1C5BF, 0x1C504, 0x1C448, + 0x1C38B, 0x1C2CD, 0x1C20E, 0x1C14E, 0x1C08C, 0x1BFCA, 0x1BF06, 0x1BE42, + 0x1BD7C, 0x1BCB5, 0x1BBED, 0x1BB25, 0x1BA5B, 0x1B990, 0x1B8C4, 0x1B7F6, + 0x1B728, 0x1B659, 0x1B589, 0x1B4B7, 0x1B3E5, 0x1B311, 0x1B23D, 0x1B167, + 0x1B091, 0x1AFB9, 0x1AEE0, 0x1AE07, 0x1AD2C, 0x1AC50, 0x1AB73, 0x1AA95, + 0x1A9B6, 0x1A8D6, 0x1A7F6, 0x1A714, 0x1A631, 0x1A54D, 0x1A468, 0x1A382, + 0x1A29A, 0x1A1B2, 0x1A0C9, 0x19FDF, 0x19EF4, 0x19E08, 0x19D1B, 0x19C2D, + 0x19B3E, 0x19A4E, 0x1995D, 0x1986B, 0x19778, 0x19684, 0x1958F, 0x19499, + 0x193A2, 0x192AA, 0x191B1, 0x190B8, 0x18FBD, 0x18EC1, 0x18DC4, 0x18CC7, + 0x18BC8, 0x18AC8, 0x189C8, 0x188C6, 0x187C4, 0x186C1, 0x185BC, 0x184B7, + 0x183B1, 0x182AA, 0x181A2, 0x18099, 0x17F8F, 0x17E84, 0x17D78, 0x17C6C, + 0x17B5E, 0x17A4F, 0x17940, 0x17830, 0x1771E, 0x1760C, 0x174F9, 0x173E5, + 0x172D1, 0x171BB, 0x170A4, 0x16F8D, 0x16E74, 0x16D5B, 0x16C41, 0x16B26, + 0x16A0A, 0x168ED, 0x167CF, 0x166B1, 0x16592, 0x16471, 0x16350, 0x1622E, + 0x1610B, 0x15FE8, 0x15EC3, 0x15D9E, 0x15C78, 0x15B51, 0x15A29, 0x15900, + 0x157D7, 0x156AC, 0x15581, 0x15455, 0x15328, 0x151FB, 0x150CC, 0x14F9D, + 0x14E6D, 0x14D3C, 0x14C0A, 0x14AD8, 0x149A4, 0x14870, 0x1473B, 0x14606, + 0x144CF, 0x14398, 0x14260, 0x14127, 0x13FEE, 0x13EB3, 0x13D78, 0x13C3C, + 0x13B00, 0x139C2, 0x13884, 0x13745, 0x13606, 0x134C5, 0x13384, 0x13242, + 0x130FF, 0x12FBC, 0x12E78, 0x12D33, 0x12BEE, 0x12AA7, 0x12960, 0x12819, + 0x126D0, 0x12587, 0x1243D, 0x122F3, 0x121A8, 0x1205C, 0x11F0F, 0x11DC2, + 0x11C74, 0x11B25, 0x119D6, 0x11886, 0x11735, 0x115E3, 0x11491, 0x1133F, + 0x111EB, 0x11097, 0x10F42, 0x10dED, 0x10C97, 0x10B40, 0x109E9, 0x10891, + 0x10738, 0x105DF, 0x10485, 0x1032B, 0x101D0, 0x10074, 0x0FF18, 0x0FDBB, + 0x0FC5D, 0x0FAFF, 0x0F9A0, 0x0F841, 0x0F6E1, 
0x0F580, 0x0F41F, 0x0F2BD, + 0x0F15B, 0x0EFF8, 0x0EE94, 0x0ED30, 0x0EBCC, 0x0EA67, 0x0E901, 0x0E79A, + 0x0E633, 0x0E4CC, 0x0E364, 0x0E1FB, 0x0E092, 0x0DF29, 0x0DDBE, 0x0DC54, + 0x0DAE9, 0x0D97D, 0x0D810, 0x0D6A4, 0x0D536, 0x0D3C8, 0x0D25A, 0x0D0EB, + 0x0CF7C, 0x0CE0C, 0x0CC9C, 0x0CB2B, 0x0C9B9, 0x0C847, 0x0C6D5, 0x0C562, + 0x0C3EF, 0x0C27B, 0x0C107, 0x0BF92, 0x0BE1D, 0x0BCA8, 0x0BB32, 0x0B9BB, + 0x0B844, 0x0B6CD, 0x0B555, 0x0B3DD, 0x0B264, 0x0B0EB, 0x0AF71, 0x0ADF7, + 0x0AC7D, 0x0AB02, 0x0A987, 0x0A80B, 0x0A68F, 0x0A513, 0x0A396, 0x0A219, + 0x0A09B, 0x09F1D, 0x09D9E, 0x09C20, 0x09AA1, 0x09921, 0x097A1, 0x09621, + 0x094A0, 0x0931F, 0x0919E, 0x0901C, 0x08E9A, 0x08D18, 0x08B95, 0x08A12, + 0x0888F, 0x0870B, 0x08587, 0x08402, 0x0827E, 0x080F9, 0x07F73, 0x07DEE, + 0x07C68, 0x07AE2, 0x0795B, 0x077D4, 0x0764D, 0x074C6, 0x0733E, 0x071B6, + 0x0702E, 0x06EA6, 0x06D1D, 0x06B94, 0x06A0B, 0x06881, 0x066F7, 0x0656D, + 0x063E3, 0x06258, 0x060CE, 0x05F43, 0x05DB7, 0x05C2C, 0x05AA0, 0x05914, + 0x05788, 0x055FC, 0x0546F, 0x052E3, 0x05156, 0x04FC9, 0x04E3B, 0x04CAE, + 0x04B20, 0x04992, 0x04804, 0x04676, 0x044E8, 0x04359, 0x041CB, 0x0403C, + 0x03EAD, 0x03D1D, 0x03B8E, 0x039FF, 0x0386F, 0x036DF, 0x0354F, 0x033BF, + 0x0322F, 0x0309F, 0x02F0F, 0x02D7E, 0x02BEE, 0x02A5D, 0x028CC, 0x0273B, + 0x025AA, 0x02419, 0x02288, 0x020F7, 0x01F65, 0x01DD4, 0x01C43, 0x01AB1, + 0x0191F, 0x0178E, 0x015FC, 0x0146A, 0x012D8, 0x01147, 0x00FB5, 0x00E23, + 0x00C91, 0x00AFF, 0x0096D, 0x007DB, 0x00648, 0x004B6, 0x00324, 0x00192}; + + acc_reg_write(d, HWPfFftRamPageAccess, ACC200_FFT_RAM_EN + 64); + for (i = 0; i < ACC200_FFT_RAM_SIZE; i++) + acc_reg_write(d, HWPfFftRamOff + i * 4, fft_lut[i]); + acc_reg_write(d, HWPfFftRamPageAccess, ACC200_FFT_RAM_DIS); + + /* Enabling AQueues through the Queue hierarchy*/ + for (vf_idx = 0; vf_idx < ACC200_NUM_VFS; vf_idx++) { + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS; qg_idx++) { + value = 0; + if (vf_idx < conf->num_vf_bundles && + qg_idx < totalQgs) + value = (1 << 
aqNum(qg_idx, conf)) - 1; + address = HWPfQmgrAqEnableVf + + vf_idx * ACC_BYTES_IN_WORD; + value += (qg_idx << 16); + acc_reg_write(d, address, value); + } + } + + rte_bbdev_log_debug("PF Tip configuration complete for %s", dev_name); + return 0; +} diff --git a/drivers/baseband/acc200/version.map b/drivers/baseband/acc200/version.map index c2e0723..9542f2b 100644 --- a/drivers/baseband/acc200/version.map +++ b/drivers/baseband/acc200/version.map @@ -1,3 +1,10 @@ DPDK_22 { local: *; }; + +EXPERIMENTAL { + global: + + rte_acc200_configure; + +}; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v1 02/10] baseband/acc200: add HW register definitions 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 01/10] baseband/acc200: introduce PMD for ACC200 Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 03/10] baseband/acc200: add info get function Nicolas Chautru ` (8 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Add the register list and the structure used to access the device. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/acc200_pf_enum.h | 468 ++++++++++++++++++++++++ drivers/baseband/acc200/acc200_pmd.h | 588 +++++++++++++++++++++++++++++++ drivers/baseband/acc200/acc200_vf_enum.h | 89 +++++ drivers/baseband/acc200/rte_acc200_pmd.c | 2 + 4 files changed, 1147 insertions(+) create mode 100644 drivers/baseband/acc200/acc200_pf_enum.h create mode 100644 drivers/baseband/acc200/acc200_vf_enum.h diff --git a/drivers/baseband/acc200/acc200_pf_enum.h b/drivers/baseband/acc200/acc200_pf_enum.h new file mode 100644 index 0000000..e8d7001 --- /dev/null +++ b/drivers/baseband/acc200/acc200_pf_enum.h @@ -0,0 +1,468 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#ifndef ACC200_PF_ENUM_H +#define ACC200_PF_ENUM_H + +/* + * ACC200 Register mapping on PF BAR0 + * This is automatically generated from RDL, format may change with new RDL + * Release. 
+ * Variable names are as is + */ +enum { + HWPfQmgrEgressQueuesTemplate = 0x0007FC00, + HWPfQmgrIngressAq = 0x00080000, + HWPfQmgrArbQAvail = 0x00A00010, + HWPfQmgrArbQBlock = 0x00A00020, + HWPfQmgrAqueueDropNotifEn = 0x00A00024, + HWPfQmgrAqueueDisableNotifEn = 0x00A00028, + HWPfQmgrSoftReset = 0x00A00038, + HWPfQmgrInitStatus = 0x00A0003C, + HWPfQmgrAramWatchdogCount = 0x00A00040, + HWPfQmgrAramWatchdogCounterEn = 0x00A00044, + HWPfQmgrAxiWatchdogCount = 0x00A00048, + HWPfQmgrAxiWatchdogCounterEn = 0x00A0004C, + HWPfQmgrProcessWatchdogCount = 0x00A00060, + HWPfQmgrProcessWatchdogCounterEn = 0x00A00054, + HWPfQmgrProcessWatchdogCounter = 0x00A00060, + HWPfQmgrMsiOverflowUpperVf = 0x00A00080, + HWPfQmgrMsiOverflowLowerVf = 0x00A00084, + HWPfQmgrMsiWatchdogOverflow = 0x00A00088, + HWPfQmgrMsiOverflowEnable = 0x00A0008C, + HWPfQmgrDebugAqPointerMemGrp = 0x00A00100, + HWPfQmgrDebugOutputArbQFifoGrp = 0x00A00140, + HWPfQmgrDebugMsiFifoGrp = 0x00A00180, + HWPfQmgrDebugAxiWdTimeoutMsiFifo = 0x00A001C0, + HWPfQmgrDebugProcessWdTimeoutMsiFifo = 0x00A001C4, + HWPfQmgrDepthLog2Grp = 0x00A00200, + HWPfQmgrTholdGrp = 0x00A00300, + HWPfQmgrGrpTmplateReg0Indx = 0x00A00600, + HWPfQmgrGrpTmplateReg1Indx = 0x00A00700, + HWPfQmgrGrpTmplateReg2indx = 0x00A00800, + HWPfQmgrGrpTmplateReg3Indx = 0x00A00900, + HWPfQmgrGrpTmplateReg4Indx = 0x00A00A00, + HWPfQmgrVfBaseAddr = 0x00A01000, + HWPfQmgrUl4GWeightRrVf = 0x00A02000, + HWPfQmgrDl4GWeightRrVf = 0x00A02100, + HWPfQmgrUl5GWeightRrVf = 0x00A02200, + HWPfQmgrDl5GWeightRrVf = 0x00A02300, + HWPfQmgrMldWeightRrVf = 0x00A02400, + HWPfQmgrArbQDepthGrp = 0x00A02F00, + HWPfQmgrGrpFunction0 = 0x00A02F40, + HWPfQmgrGrpFunction1 = 0x00A02F44, + HWPfQmgrGrpPriority = 0x00A02F48, + HWPfQmgrWeightSync = 0x00A03000, + HWPfQmgrAqEnableVf = 0x00A10000, + HWPfQmgrAqResetVf = 0x00A20000, + HWPfQmgrRingSizeVf = 0x00A20004, + HWPfQmgrGrpDepthLog20Vf = 0x00A20008, + HWPfQmgrGrpDepthLog21Vf = 0x00A2000C, + HWPfQmgrGrpFunction0Vf = 0x00A20010, + 
HWPfQmgrGrpFunction1Vf = 0x00A20014, + HWPfFabricM2iBufferReg = 0x00B30000, + HWPfFabricI2Mcore_reg_g0 = 0x00B31000, + HWPfFabricI2Mcore_weight_g0 = 0x00B31004, + HWPfFabricI2Mbuffer_g0 = 0x00B31008, + HWPfFabricI2Mcore_reg_g1 = 0x00B31010, + HWPfFabricI2Mcore_weight_g1 = 0x00B31014, + HWPfFabricI2Mbuffer_g1 = 0x00B31018, + HWPfFabricI2Mcore_reg_g2 = 0x00B31020, + HWPfFabricI2Mcore_weight_g2 = 0x00B31024, + HWPfFabricI2Mbuffer_g2 = 0x00B31028, + HWPfFabricI2Mcore_reg_g3 = 0x00B31030, + HWPfFabricI2Mcore_weight_g3 = 0x00B31034, + HWPfFabricI2Mbuffer_g3 = 0x00B31038, + HWPfFabricI2Mdma_weight = 0x00B31044, + HWPfFecUl5gCntrlReg = 0x00B40000, + HWPfFecUl5gI2MThreshReg = 0x00B40004, + HWPfFecUl5gVersionReg = 0x00B40100, + HWPfFecUl5gFcwStatusReg = 0x00B40104, + HWPfFecUl5gWarnReg = 0x00B40108, + HwPfFecUl5gIbDebugReg = 0x00B40200, + HwPfFecUl5gObLlrDebugReg = 0x00B40204, + HwPfFecUl5gObHarqDebugReg = 0x00B40208, + HwPfFecUl5g1CntrlReg = 0x00B41000, + HwPfFecUl5g1I2MThreshReg = 0x00B41004, + HwPfFecUl5g1VersionReg = 0x00B41100, + HwPfFecUl5g1FcwStatusReg = 0x00B41104, + HwPfFecUl5g1WarnReg = 0x00B41108, + HwPfFecUl5g1IbDebugReg = 0x00B41200, + HwPfFecUl5g1ObLlrDebugReg = 0x00B41204, + HwPfFecUl5g1ObHarqDebugReg = 0x00B41208, + HwPfFecUl5g2CntrlReg = 0x00B42000, + HwPfFecUl5g2I2MThreshReg = 0x00B42004, + HwPfFecUl5g2VersionReg = 0x00B42100, + HwPfFecUl5g2FcwStatusReg = 0x00B42104, + HwPfFecUl5g2WarnReg = 0x00B42108, + HwPfFecUl5g2IbDebugReg = 0x00B42200, + HwPfFecUl5g2ObLlrDebugReg = 0x00B42204, + HwPfFecUl5g2ObHarqDebugReg = 0x00B42208, + HwPfFecUl5g3CntrlReg = 0x00B43000, + HwPfFecUl5g3I2MThreshReg = 0x00B43004, + HwPfFecUl5g3VersionReg = 0x00B43100, + HwPfFecUl5g3FcwStatusReg = 0x00B43104, + HwPfFecUl5g3WarnReg = 0x00B43108, + HwPfFecUl5g3IbDebugReg = 0x00B43200, + HwPfFecUl5g3ObLlrDebugReg = 0x00B43204, + HwPfFecUl5g3ObHarqDebugReg = 0x00B43208, + HwPfFecUl5g4CntrlReg = 0x00B44000, + HwPfFecUl5g4I2MThreshReg = 0x00B44004, + HwPfFecUl5g4VersionReg = 0x00B44100, + 
HwPfFecUl5g4FcwStatusReg = 0x00B44104, + HwPfFecUl5g4WarnReg = 0x00B44108, + HwPfFecUl5g4IbDebugReg = 0x00B44200, + HwPfFecUl5g4ObLlrDebugReg = 0x00B44204, + HwPfFecUl5g4ObHarqDebugReg = 0x00B44208, + HwPfFecUl5g5CntrlReg = 0x00B45000, + HwPfFecUl5g5I2MThreshReg = 0x00B45004, + HwPfFecUl5g5VersionReg = 0x00B45100, + HwPfFecUl5g5FcwStatusReg = 0x00B45104, + HwPfFecUl5g5WarnReg = 0x00B45108, + HwPfFecUl5g5IbDebugReg = 0x00B45200, + HwPfFecUl5g5ObLlrDebugReg = 0x00B45204, + HwPfFecUl5g5ObHarqDebugReg = 0x00B45208, + HwPfFecUl5g6CntrlReg = 0x00B46000, + HwPfFecUl5g6I2MThreshReg = 0x00B46004, + HwPfFecUl5g6VersionReg = 0x00B46100, + HwPfFecUl5g6FcwStatusReg = 0x00B46104, + HwPfFecUl5g6WarnReg = 0x00B46108, + HwPfFecUl5g6IbDebugReg = 0x00B46200, + HwPfFecUl5g6ObLlrDebugReg = 0x00B46204, + HwPfFecUl5g6ObHarqDebugReg = 0x00B46208, + HwPfFecUl5g7CntrlReg = 0x00B47000, + HwPfFecUl5g7I2MThreshReg = 0x00B47004, + HwPfFecUl5g7VersionReg = 0x00B47100, + HwPfFecUl5g7FcwStatusReg = 0x00B47104, + HwPfFecUl5g7WarnReg = 0x00B47108, + HwPfFecUl5g7IbDebugReg = 0x00B47200, + HwPfFecUl5g7ObLlrDebugReg = 0x00B47204, + HwPfFecUl5g7ObHarqDebugReg = 0x00B47208, + HwPfFecUl5g8CntrlReg = 0x00B48000, + HwPfFecUl5g8I2MThreshReg = 0x00B48004, + HwPfFecUl5g8VersionReg = 0x00B48100, + HwPfFecUl5g8FcwStatusReg = 0x00B48104, + HwPfFecUl5g8WarnReg = 0x00B48108, + HwPfFecUl5g8IbDebugReg = 0x00B48200, + HwPfFecUl5g8ObLlrDebugReg = 0x00B48204, + HwPfFecUl5g8ObHarqDebugReg = 0x00B48208, + HWPfFecDl5gCntrlReg = 0x00B4F000, + HWPfFecDl5gI2MThreshReg = 0x00B4F004, + HWPfFecDl5gVersionReg = 0x00B4F100, + HWPfFecDl5gFcwStatusReg = 0x00B4F104, + HWPfFecDl5gWarnReg = 0x00B4F108, + HWPfFecUlVersionReg = 0x00B50000, + HWPfFecUlControlReg = 0x00B50004, + HWPfFecUlStatusReg = 0x00B50008, + HWPfFftConfig0 = 0x00B58004, + HWPfFftConfig1 = 0x00B58008, + HWPfFftRamPageAccess = 0x00B5800C, + HWPfFftRamOff = 0x00B58800, + HWPfFecDlVersionReg = 0x00B5F000, + HWPfFecDlClusterConfigReg = 0x00B5F004, + HWPfFecDlBurstThres = 
0x00B5F00C, + HWPfFecDlClusterStatusReg0 = 0x00B5F040, + HWPfFecDlClusterStatusReg1 = 0x00B5F044, + HWPfFecDlClusterStatusReg2 = 0x00B5F048, + HWPfFecDlClusterStatusReg3 = 0x00B5F04C, + HWPfFecDlClusterStatusReg4 = 0x00B5F050, + HWPfFecDlClusterStatusReg5 = 0x00B5F054, + HWPfDmaConfig0Reg = 0x00B80000, + HWPfDmaConfig1Reg = 0x00B80004, + HWPfDmaQmgrAddrReg = 0x00B80008, + HWPfDmaSoftResetReg = 0x00B8000C, + HWPfDmaAxcacheReg = 0x00B80010, + HWPfDmaVersionReg = 0x00B80014, + HWPfDmaFrameThreshold = 0x00B80018, + HWPfDmaTimestampLo = 0x00B8001C, + HWPfDmaTimestampHi = 0x00B80020, + HWPfDmaAxiStatus = 0x00B80028, + HWPfDmaAxiControl = 0x00B8002C, + HWPfDmaNoQmgr = 0x00B80030, + HWPfDmaQosScale = 0x00B80034, + HWPfDmaQmanen = 0x00B80040, + HWPfDmaFftModeThld = 0x00B80054, + HWPfDmaQmgrQosBase = 0x00B80060, + HWPfDmaFecClkGatingEnable = 0x00B80080, + HWPfDmaPmEnable = 0x00B80084, + HWPfDmaQosEnable = 0x00B80088, + HWPfDmaHarqWeightedRrFrameThreshold = 0x00B800B0, + HWPfDmaDataSmallWeightedRrFrameThresh = 0x00B800B4, + HWPfDmaDataLargeWeightedRrFrameThresh = 0x00B800B8, + HWPfDmaInboundCbMaxSize = 0x00B800BC, + HWPfDmaInboundDrainDataSize = 0x00B800C0, + HWPfDmaEngineTypeSmall = 0x00B800C4, + HWPfDma5gdlIbThld = 0x00B800C8, + HWPfDma4gdlIbThld = 0x00B800CC, + HWPfDmafftIbThld = 0x00B800D0, + HWPfDmaVfDdrBaseRw = 0x00B80400, + HWPfDmaCmplTmOutCnt = 0x00B80800, + HWPfDmaProcTmOutCnt = 0x00B80804, + HWPfDmaStatusRrespBresp = 0x00B80810, + HWPfDmaCfgRrespBresp = 0x00B80814, + HWPfDmaStatusMemParErr = 0x00B80818, + HWPfDmaCfgMemParErrEn = 0x00B8081C, + HWPfDmaStatusDmaHwErr = 0x00B80820, + HWPfDmaCfgDmaHwErrEn = 0x00B80824, + HWPfDmaStatusFecCoreErr = 0x00B80828, + HWPfDmaCfgFecCoreErrEn = 0x00B8082C, + HWPfDmaStatusFcwDescrErr = 0x00B80830, + HWPfDmaCfgFcwDescrErrEn = 0x00B80834, + HWPfDmaStatusBlockTransmit = 0x00B80838, + HWPfDmaBlockOnErrEn = 0x00B8083C, + HWPfDmaStatusFlushDma = 0x00B80840, + HWPfDmaFlushDmaOnErrEn = 0x00B80844, + HWPfDmaStatusSdoneFifoFull = 0x00B80848, 
+ HWPfDmaStatusDescriptorErrLoVf = 0x00B8084C, + HWPfDmaStatusDescriptorErrHiVf = 0x00B80850, + HWPfDmaStatusFcwErrLoVf = 0x00B80854, + HWPfDmaStatusFcwErrHiVf = 0x00B80858, + HWPfDmaStatusDataErrLoVf = 0x00B8085C, + HWPfDmaStatusDataErrHiVf = 0x00B80860, + HWPfDmaCfgMsiEnSoftwareErr = 0x00B80864, + HWPfDmaDescriptorSignatuture = 0x00B80868, + HWPfDmaFcwSignature = 0x00B8086C, + HWPfDmaErrorDetectionEn = 0x00B80870, + HWPfDmaErrCntrlFifoDebug = 0x00B8087C, + HWPfDmaStatusToutData = 0x00B80880, + HWPfDmaStatusToutDesc = 0x00B80884, + HWPfDmaStatusToutUnexpData = 0x00B80888, + HWPfDmaStatusToutUnexpDesc = 0x00B8088C, + HWPfDmaStatusToutProcess = 0x00B80890, + HWPfDmaConfigCtoutOutDataEn = 0x00B808A0, + HWPfDmaConfigCtoutOutDescrEn = 0x00B808A4, + HWPfDmaConfigUnexpComplDataEn = 0x00B808A8, + HWPfDmaConfigUnexpComplDescrEn = 0x00B808AC, + HWPfDmaConfigPtoutOutEn = 0x00B808B0, + HWPfDmaFec5GulDescBaseLoRegVf = 0x00B88020, + HWPfDmaFec5GulDescBaseHiRegVf = 0x00B88024, + HWPfDmaFec5GulRespPtrLoRegVf = 0x00B88028, + HWPfDmaFec5GulRespPtrHiRegVf = 0x00B8802C, + HWPfDmaFec5GdlDescBaseLoRegVf = 0x00B88040, + HWPfDmaFec5GdlDescBaseHiRegVf = 0x00B88044, + HWPfDmaFec5GdlRespPtrLoRegVf = 0x00B88048, + HWPfDmaFec5GdlRespPtrHiRegVf = 0x00B8804C, + HWPfDmaFec4GulDescBaseLoRegVf = 0x00B88060, + HWPfDmaFec4GulDescBaseHiRegVf = 0x00B88064, + HWPfDmaFec4GulRespPtrLoRegVf = 0x00B88068, + HWPfDmaFec4GulRespPtrHiRegVf = 0x00B8806C, + HWPfDmaFec4GdlDescBaseLoRegVf = 0x00B88080, + HWPfDmaFec4GdlDescBaseHiRegVf = 0x00B88084, + HWPfDmaFec4GdlRespPtrLoRegVf = 0x00B88088, + HWPfDmaFec4GdlRespPtrHiRegVf = 0x00B8808C, + HWPDmaFftDescBaseLoRegVf = 0x00B880A0, + HWPDmaFftDescBaseHiRegVf = 0x00B880A4, + HWPDmaFftRespPtrLoRegVf = 0x00B880A8, + HWPDmaFftRespPtrHiRegVf = 0x00B880AC, + HWPfQosmonACntrlReg = 0x00B90000, + HWPfQosmonAEvalOverflow0 = 0x00B90008, + HWPfQosmonAEvalOverflow1 = 0x00B9000C, + HWPfQosmonADivTerm = 0x00B90010, + HWPfQosmonATickTerm = 0x00B90014, + HWPfQosmonAEvalTerm = 
0x00B90018, + HWPfQosmonAAveTerm = 0x00B9001C, + HWPfQosmonAForceEccErr = 0x00B90020, + HWPfQosmonAEccErrDetect = 0x00B90024, + HWPfQosmonAIterationConfig0Low = 0x00B90060, + HWPfQosmonAIterationConfig0High = 0x00B90064, + HWPfQosmonAIterationConfig1Low = 0x00B90068, + HWPfQosmonAIterationConfig1High = 0x00B9006C, + HWPfQosmonAIterationConfig2Low = 0x00B90070, + HWPfQosmonAIterationConfig2High = 0x00B90074, + HWPfQosmonAIterationConfig3Low = 0x00B90078, + HWPfQosmonAIterationConfig3High = 0x00B9007C, + HWPfQosmonAEvalMemAddr = 0x00B90080, + HWPfQosmonAEvalMemData = 0x00B90084, + HWPfQosmonAXaction = 0x00B900C0, + HWPfQosmonARemThres1Vf = 0x00B90400, + HWPfQosmonAThres2Vf = 0x00B90404, + HWPfQosmonAWeiFracVf = 0x00B90408, + HWPfQosmonARrWeiVf = 0x00B9040C, + HWPfPermonACntrlRegVf = 0x00B98000, + HWPfPermonACountVf = 0x00B98008, + HWPfPermonAKCntLoVf = 0x00B98010, + HWPfPermonAKCntHiVf = 0x00B98014, + HWPfPermonADeltaCntLoVf = 0x00B98020, + HWPfPermonADeltaCntHiVf = 0x00B98024, + HWPfPermonAVersionReg = 0x00B9C000, + HWPfPermonACbControlFec = 0x00B9C0F0, + HWPfPermonADltTimerLoFec = 0x00B9C0F4, + HWPfPermonADltTimerHiFec = 0x00B9C0F8, + HWPfPermonACbCountFec = 0x00B9C100, + HWPfPermonAAccExecTimerLoFec = 0x00B9C104, + HWPfPermonAAccExecTimerHiFec = 0x00B9C108, + HWPfPermonAExecTimerMinFec = 0x00B9C200, + HWPfPermonAExecTimerMaxFec = 0x00B9C204, + HWPfPermonAControlBusMon = 0x00B9C400, + HWPfPermonAConfigBusMon = 0x00B9C404, + HWPfPermonASkipCountBusMon = 0x00B9C408, + HWPfPermonAMinLatBusMon = 0x00B9C40C, + HWPfPermonAMaxLatBusMon = 0x00B9C500, + HWPfPermonATotalLatLowBusMon = 0x00B9C504, + HWPfPermonATotalLatUpperBusMon = 0x00B9C508, + HWPfPermonATotalReqCntBusMon = 0x00B9C50C, + HWPfQosmonBCntrlReg = 0x00BA0000, + HWPfQosmonBEvalOverflow0 = 0x00BA0008, + HWPfQosmonBEvalOverflow1 = 0x00BA000C, + HWPfQosmonBDivTerm = 0x00BA0010, + HWPfQosmonBTickTerm = 0x00BA0014, + HWPfQosmonBEvalTerm = 0x00BA0018, + HWPfQosmonBAveTerm = 0x00BA001C, + HWPfQosmonBForceEccErr = 
0x00BA0020, + HWPfQosmonBEccErrDetect = 0x00BA0024, + HWPfQosmonBIterationConfig0Low = 0x00BA0060, + HWPfQosmonBIterationConfig0High = 0x00BA0064, + HWPfQosmonBIterationConfig1Low = 0x00BA0068, + HWPfQosmonBIterationConfig1High = 0x00BA006C, + HWPfQosmonBIterationConfig2Low = 0x00BA0070, + HWPfQosmonBIterationConfig2High = 0x00BA0074, + HWPfQosmonBIterationConfig3Low = 0x00BA0078, + HWPfQosmonBIterationConfig3High = 0x00BA007C, + HWPfQosmonBEvalMemAddr = 0x00BA0080, + HWPfQosmonBEvalMemData = 0x00BA0084, + HWPfQosmonBXaction = 0x00BA00C0, + HWPfQosmonBRemThres1Vf = 0x00BA0400, + HWPfQosmonBThres2Vf = 0x00BA0404, + HWPfQosmonBWeiFracVf = 0x00BA0408, + HWPfQosmonBRrWeiVf = 0x00BA040C, + HWPfPermonBCntrlRegVf = 0x00BA8000, + HWPfPermonBCountVf = 0x00BA8008, + HWPfPermonBKCntLoVf = 0x00BA8010, + HWPfPermonBKCntHiVf = 0x00BA8014, + HWPfPermonBDeltaCntLoVf = 0x00BA8020, + HWPfPermonBDeltaCntHiVf = 0x00BA8024, + HWPfPermonBVersionReg = 0x00BAC000, + HWPfPermonBCbControlFec = 0x00BAC0F0, + HWPfPermonBDltTimerLoFec = 0x00BAC0F4, + HWPfPermonBDltTimerHiFec = 0x00BAC0F8, + HWPfPermonBCbCountFec = 0x00BAC100, + HWPfPermonBAccExecTimerLoFec = 0x00BAC104, + HWPfPermonBAccExecTimerHiFec = 0x00BAC108, + HWPfPermonBExecTimerMinFec = 0x00BAC200, + HWPfPermonBExecTimerMaxFec = 0x00BAC204, + HWPfPermonBControlBusMon = 0x00BAC400, + HWPfPermonBConfigBusMon = 0x00BAC404, + HWPfPermonBSkipCountBusMon = 0x00BAC408, + HWPfPermonBMinLatBusMon = 0x00BAC40C, + HWPfPermonBMaxLatBusMon = 0x00BAC500, + HWPfPermonBTotalLatLowBusMon = 0x00BAC504, + HWPfPermonBTotalLatUpperBusMon = 0x00BAC508, + HWPfPermonBTotalReqCntBusMon = 0x00BAC50C, + HWPfQosmonCCntrlReg = 0x00BB0000, + HWPfQosmonCEvalOverflow0 = 0x00BB0008, + HWPfQosmonCEvalOverflow1 = 0x00BB000C, + HWPfQosmonCDivTerm = 0x00BB0010, + HWPfQosmonCTickTerm = 0x00BB0014, + HWPfQosmonCEvalTerm = 0x00BB0018, + HWPfQosmonCAveTerm = 0x00BB001C, + HWPfQosmonCForceEccErr = 0x00BB0020, + HWPfQosmonCEccErrDetect = 0x00BB0024, + 
HWPfQosmonCIterationConfig0Low = 0x00BB0060, + HWPfQosmonCIterationConfig0High = 0x00BB0064, + HWPfQosmonCIterationConfig1Low = 0x00BB0068, + HWPfQosmonCIterationConfig1High = 0x00BB006C, + HWPfQosmonCIterationConfig2Low = 0x00BB0070, + HWPfQosmonCIterationConfig2High = 0x00BB0074, + HWPfQosmonCIterationConfig3Low = 0x00BB0078, + HWPfQosmonCIterationConfig3High = 0x00BB007C, + HWPfQosmonCEvalMemAddr = 0x00BB0080, + HWPfQosmonCEvalMemData = 0x00BB0084, + HWPfQosmonCXaction = 0x00BB00C0, + HWPfQosmonCRemThres1Vf = 0x00BB0400, + HWPfQosmonCThres2Vf = 0x00BB0404, + HWPfQosmonCWeiFracVf = 0x00BB0408, + HWPfQosmonCRrWeiVf = 0x00BB040C, + HWPfPermonCCntrlRegVf = 0x00BB8000, + HWPfPermonCCountVf = 0x00BB8008, + HWPfPermonCKCntLoVf = 0x00BB8010, + HWPfPermonCKCntHiVf = 0x00BB8014, + HWPfPermonCDeltaCntLoVf = 0x00BB8020, + HWPfPermonCDeltaCntHiVf = 0x00BB8024, + HWPfPermonCVersionReg = 0x00BBC000, + HWPfPermonCCbControlFec = 0x00BBC0F0, + HWPfPermonCDltTimerLoFec = 0x00BBC0F4, + HWPfPermonCDltTimerHiFec = 0x00BBC0F8, + HWPfPermonCCbCountFec = 0x00BBC100, + HWPfPermonCAccExecTimerLoFec = 0x00BBC104, + HWPfPermonCAccExecTimerHiFec = 0x00BBC108, + HWPfPermonCExecTimerMinFec = 0x00BBC200, + HWPfPermonCExecTimerMaxFec = 0x00BBC204, + HWPfPermonCControlBusMon = 0x00BBC400, + HWPfPermonCConfigBusMon = 0x00BBC404, + HWPfPermonCSkipCountBusMon = 0x00BBC408, + HWPfPermonCMinLatBusMon = 0x00BBC40C, + HWPfPermonCMaxLatBusMon = 0x00BBC500, + HWPfPermonCTotalLatLowBusMon = 0x00BBC504, + HWPfPermonCTotalLatUpperBusMon = 0x00BBC508, + HWPfPermonCTotalReqCntBusMon = 0x00BBC50C, + HWPfHiVfToPfDbellVf = 0x00C80000, + HWPfHiPfToVfDbellVf = 0x00C80008, + HWPfHiInfoRingBaseLoVf = 0x00C80010, + HWPfHiInfoRingBaseHiVf = 0x00C80014, + HWPfHiInfoRingPointerVf = 0x00C80018, + HWPfHiInfoRingIntWrEnVf = 0x00C80020, + HWPfHiInfoRingPf2VfWrEnVf = 0x00C80024, + HWPfHiMsixVectorMapperVf = 0x00C80060, + HWPfHiModuleVersionReg = 0x00C84000, + HWPfHiIosf2axiErrLogReg = 0x00C84004, + HWPfHiHardResetReg = 
0x00C84008, + HWPfHi5GHardResetReg = 0x00C8400C, + HWPfHiInfoRingBaseLoRegPf = 0x00C84014, + HWPfHiInfoRingBaseHiRegPf = 0x00C84018, + HWPfHiInfoRingPointerRegPf = 0x00C8401C, + HWPfHiInfoRingIntWrEnRegPf = 0x00C84020, + HWPfHiInfoRingVf2pfLoWrEnReg = 0x00C84024, + HWPfHiInfoRingVf2pfHiWrEnReg = 0x00C84028, + HWPfHiLogParityErrStatusReg = 0x00C8402C, + HWPfHiLogDataParityErrorVfStatusLo = 0x00C84030, + HWPfHiLogDataParityErrorVfStatusHi = 0x00C84034, + HWPfHiBlockTransmitOnErrorEn = 0x00C84038, + HWPfHiCfgMsiIntWrEnRegPf = 0x00C84040, + HWPfHiCfgMsiVf2pfLoWrEnReg = 0x00C84044, + HWPfHiCfgMsiVf2pfHighWrEnReg = 0x00C84048, + HWPfHiMsixVectorMapperPf = 0x00C84060, + HWPfHiApbWrWaitTime = 0x00C84100, + HWPfHiXCounterMaxValue = 0x00C84104, + HWPfHiPfMode = 0x00C84108, + HWPfHiClkGateHystReg = 0x00C8410C, + HWPfHiSnoopBitsReg = 0x00C84110, + HWPfHiMsiDropEnableReg = 0x00C84114, + HWPfHiMsiStatReg = 0x00C84120, + HWPfHiFifoOflStatReg = 0x00C84124, + HWPfHiSectionPowerGatingReq = 0x00C84128, + HWPfHiSectionPowerGatingAck = 0x00C8412C, + HWPfHiSectionPowerGatingWaitCounter = 0x00C84130, + HWPfHiHiDebugReg = 0x00C841F4, + HWPfHiDebugMemSnoopMsiFifo = 0x00C841F8, + HWPfHiDebugMemSnoopInputFifo = 0x00C841FC, + HWPfHiMsixMappingConfig = 0x00C84200, + HWPfHiJunkReg = 0x00C8FF00, + HWPfHiMSIXBaseLoRegPf = 0x00D20000, + HWPfHiMSIXBaseHiRegPf = 0x00D20004, + HWPfHiMSIXBaseDataRegPf = 0x00D20008, + HWPfHiMSIXBaseMaskRegPf = 0x00D2000c, + HWPfHiMSIXPBABaseLoRegPf = 0x00E01000, +}; + +/* TIP PF Interrupt numbers */ +enum { + ACC200_PF_INT_QMGR_AQ_OVERFLOW = 0, + ACC200_PF_INT_DOORBELL_VF_2_PF = 1, + ACC200_PF_INT_ILLEGAL_FORMAT = 2, + ACC200_PF_INT_QMGR_DISABLED_ACCESS = 3, + ACC200_PF_INT_QMGR_AQ_OVERTHRESHOLD = 4, + ACC200_PF_INT_DMA_DL_DESC_IRQ = 5, + ACC200_PF_INT_DMA_UL_DESC_IRQ = 6, + ACC200_PF_INT_DMA_FFT_DESC_IRQ = 7, + ACC200_PF_INT_DMA_UL5G_DESC_IRQ = 8, + ACC200_PF_INT_DMA_DL5G_DESC_IRQ = 9, + ACC200_PF_INT_DMA_MLD_DESC_IRQ = 10, + ACC200_PF_INT_ARAM_ECC_1BIT_ERR = 11, + 
ACC200_PF_INT_PARITY_ERR = 12, + ACC200_PF_INT_QMGR_ERR = 13, + ACC200_PF_INT_INT_REQ_OVERFLOW = 14, + ACC200_PF_INT_APB_TIMEOUT = 15, +}; + +#endif /* ACC200_PF_ENUM_H */ diff --git a/drivers/baseband/acc200/acc200_pmd.h b/drivers/baseband/acc200/acc200_pmd.h index a22ca67..b420524 100644 --- a/drivers/baseband/acc200/acc200_pmd.h +++ b/drivers/baseband/acc200/acc200_pmd.h @@ -5,6 +5,9 @@ #ifndef _RTE_ACC200_PMD_H_ #define _RTE_ACC200_PMD_H_ +#include "acc200_pf_enum.h" +#include "acc200_vf_enum.h" + /* Helper macro for logging */ #define rte_bbdev_log(level, fmt, ...) \ rte_log(RTE_LOG_ ## level, acc200_logtype, fmt "\n", \ @@ -27,6 +30,591 @@ #define RTE_ACC200_PF_DEVICE_ID (0x57C0) #define RTE_ACC200_VF_DEVICE_ID (0x57C1) +/* Define as 1 to use only a single FEC engine */ +#ifndef RTE_ACC200_SINGLE_FEC +#define RTE_ACC200_SINGLE_FEC 0 +#endif + +/* Values used in filling in descriptors */ +#define ACC200_DMA_DESC_TYPE 2 +#define ACC200_DMA_CODE_BLK_MODE 0 +#define ACC200_DMA_BLKID_FCW 1 +#define ACC200_DMA_BLKID_IN 2 +#define ACC200_DMA_BLKID_OUT_ENC 1 +#define ACC200_DMA_BLKID_OUT_HARD 1 +#define ACC200_DMA_BLKID_OUT_SOFT 2 +#define ACC200_DMA_BLKID_OUT_HARQ 3 +#define ACC200_DMA_BLKID_IN_HARQ 3 + +/* Values used in filling in decode FCWs */ +#define ACC200_FCW_TD_VER 1 +#define ACC200_FCW_TD_EXT_COLD_REG_EN 1 +#define ACC200_FCW_TD_AUTOMAP 0x0f +#define ACC200_FCW_TD_RVIDX_0 2 +#define ACC200_FCW_TD_RVIDX_1 26 +#define ACC200_FCW_TD_RVIDX_2 50 +#define ACC200_FCW_TD_RVIDX_3 74 +#define ACC200_MAX_PF_MSIX (256+32) +#define ACC200_MAX_VF_MSIX (256+7) + +/* Values used in writing to the registers */ +#define ACC200_REG_IRQ_EN_ALL 0x1FF83FF /* Enable all interrupts */ + +/* ACC200 Specific Dimensioning */ +#define ACC200_SIZE_64MBYTE (64*1024*1024) +/* Number of elements in an Info Ring */ +#define ACC200_INFO_RING_NUM_ENTRIES 1024 +/* Number of elements in HARQ layout memory + * 128M x 32kB = 4GB addressable memory + */ +#define ACC200_HARQ_LAYOUT (128 * 1024 * 
1024) +/* Assume offset for HARQ in memory */ +#define ACC200_HARQ_OFFSET (32 * 1024) +#define ACC200_HARQ_OFFSET_SHIFT 15 +#define ACC200_HARQ_OFFSET_MASK 0x7ffffff +/* Mask used to calculate an index in an Info Ring array (not a byte offset) */ +#define ACC200_INFO_RING_MASK (ACC200_INFO_RING_NUM_ENTRIES-1) +/* Number of Virtual Functions ACC200 supports */ +#define ACC200_NUM_VFS 16 +#define ACC200_NUM_QGRPS 16 +#define ACC200_NUM_QGRPS_PER_WORD 8 +#define ACC200_NUM_AQS 16 +#define MAX_ENQ_BATCH_SIZE 255 +/* All ACC200 Registers alignment are 32bits = 4B */ +#define ACC200_BYTES_IN_WORD 4 +#define ACC200_MAX_E_MBUF 64000 +#define ACC200_ALGO_SPA 0 +#define ACC200_ALGO_MSA 1 + +#define ACC200_GRP_ID_SHIFT 10 /* Queue Index Hierarchy */ +#define ACC200_VF_ID_SHIFT 4 /* Queue Index Hierarchy */ +#define ACC200_VF_OFFSET_QOS 16 /* offset in Memory specific to QoS Mon */ +#define ACC200_TMPL_PRI_0 0x03020100 +#define ACC200_TMPL_PRI_1 0x07060504 +#define ACC200_TMPL_PRI_2 0x0b0a0908 +#define ACC200_TMPL_PRI_3 0x0f0e0d0c +#define ACC200_QUEUE_ENABLE 0x80000000 /* Bit to mark Queue as Enabled */ +#define ACC200_WORDS_IN_ARAM_SIZE (256 * 1024 / 4) +#define ACC200_FDONE 0x80000000 +#define ACC200_SDONE 0x40000000 + +#define ACC200_NUM_TMPL 32 +/* Mapping of signals for the available engines */ +#define ACC200_SIG_UL_5G 0 +#define ACC200_SIG_UL_5G_LAST 4 +#define ACC200_SIG_DL_5G 10 +#define ACC200_SIG_DL_5G_LAST 11 +#define ACC200_SIG_UL_4G 12 +#define ACC200_SIG_UL_4G_LAST 16 +#define ACC200_SIG_DL_4G 21 +#define ACC200_SIG_DL_4G_LAST 23 +#define ACC200_SIG_FFT 24 +#define ACC200_SIG_FFT_LAST 24 + +#define ACC200_NUM_ACCS 5 /* FIXMEFFT */ +#define ACC200_ACCMAP_0 0 +#define ACC200_ACCMAP_1 2 +#define ACC200_ACCMAP_2 1 +#define ACC200_ACCMAP_3 3 +#define ACC200_ACCMAP_4 4 +#define ACC200_PF_VAL 2 + +/* max number of iterations to allocate memory block for all rings */ +#define ACC200_SW_RING_MEM_ALLOC_ATTEMPTS 5 +#define ACC200_MAX_QUEUE_DEPTH 1024 +#define 
ACC200_DMA_MAX_NUM_POINTERS 14 +#define ACC200_DMA_MAX_NUM_POINTERS_IN 7 +#define ACC200_DMA_DESC_PADDING 8 +#define ACC200_FCW_PADDING 12 +#define ACC200_DESC_FCW_OFFSET 192 +#define ACC200_DESC_SIZE 256 +#define ACC200_DESC_OFFSET (ACC200_DESC_SIZE / 64) +#define ACC200_FCW_TE_BLEN 32 +#define ACC200_FCW_TD_BLEN 24 +#define ACC200_FCW_LE_BLEN 32 +#define ACC200_FCW_LD_BLEN 36 +#define ACC200_FCW_FFT_BLEN 28 +#define ACC200_5GUL_SIZE_0 16 +#define ACC200_5GUL_SIZE_1 40 +#define ACC200_5GUL_OFFSET_0 36 +#define ACC200_COMPANION_PTRS 8 + +#define ACC200_FCW_VER 2 +#define ACC200_MUX_5GDL_DESC 6 +#define ACC200_CMP_ENC_SIZE 20 +#define ACC200_CMP_DEC_SIZE 24 +#define ACC200_ENC_OFFSET (32) +#define ACC200_DEC_OFFSET (80) +#define ACC200_HARQ_OFFSET_THRESHOLD 1024 +#define ACC200_LIMIT_DL_MUX_BITS 534 + +/* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */ +#define ACC200_N_ZC_1 66 /* N = 66 Zc for BG 1 */ +#define ACC200_N_ZC_2 50 /* N = 50 Zc for BG 2 */ +#define ACC200_K0_1_1 17 /* K0 fraction numerator for rv 1 and BG 1 */ +#define ACC200_K0_1_2 13 /* K0 fraction numerator for rv 1 and BG 2 */ +#define ACC200_K0_2_1 33 /* K0 fraction numerator for rv 2 and BG 1 */ +#define ACC200_K0_2_2 25 /* K0 fraction numerator for rv 2 and BG 2 */ +#define ACC200_K0_3_1 56 /* K0 fraction numerator for rv 3 and BG 1 */ +#define ACC200_K0_3_2 43 /* K0 fraction numerator for rv 3 and BG 2 */ + +/* ACC200 Configuration */ +#define ACC200_FABRIC_MODE 0x8000103 +#define ACC200_CFG_DMA_ERROR 0x3DF +#define ACC200_CFG_AXI_CACHE 0x11 +#define ACC200_CFG_QMGR_HI_P 0x0F0F +#define ACC200_ENGINE_OFFSET 0x1000 +#define ACC200_RESET_HI 0x20100 +#define ACC200_RESET_LO 0x20000 +#define ACC200_RESET_HARD 0x1FF +#define ACC200_ENGINES_MAX 9 +#define ACC200_LONG_WAIT 1000 +#define ACC200_GPEX_AXIMAP_NUM 17 +#define ACC200_CLOCK_GATING_EN 0x30000 +#define ACC200_MS_IN_US (1000) +#define ACC200_FFT_CFG_0 0x2001 +#define ACC200_FFT_RAM_EN 0x80008000 +#define ACC200_FFT_RAM_DIS 0x0 
+#define ACC200_FFT_RAM_SIZE 512 +#define ACC200_CLK_EN 0x00010A01 +#define ACC200_CLK_DIS 0x01F10A01 +#define ACC200_PG_MASK_0 0x1F +#define ACC200_PG_MASK_1 0xF +#define ACC200_PG_MASK_2 0x1 +#define ACC200_PG_MASK_3 0x0 +#define ACC200_PG_MASK_FFT 1 +#define ACC200_PG_MASK_4GUL 4 +#define ACC200_PG_MASK_5GUL 8 +#define ACC200_STATUS_WAIT 10 +#define ACC200_STATUS_TO 100 + +/* ACC200 DMA Descriptor triplet */ +struct acc200_dma_triplet { + uint64_t address; + uint32_t blen:20, + res0:4, + last:1, + dma_ext:1, + res1:2, + blkid:4; +} __rte_packed; + +/* ACC200 DMA Response Descriptor */ +union acc200_dma_rsp_desc { + uint32_t val; + struct { + uint32_t crc_status:1, + synd_ok:1, + dma_err:1, + neg_stop:1, + fcw_err:1, + output_truncat:1, + input_err:1, + timestampEn:1, + iterCountFrac:8, + iter_cnt:8, + rsrvd3:6, + sdone:1, + fdone:1; + uint32_t add_info_0; + uint32_t add_info_1; + }; +}; + + +/* ACC200 Queue Manager Enqueue PCI Register */ +union acc200_enqueue_reg_fmt { + uint32_t val; + struct { + uint32_t num_elem:8, + addr_offset:3, + rsrvd:1, + req_elem_addr:20; + }; +}; + +/* FEC 4G Uplink Frame Control Word */ +struct __rte_packed acc200_fcw_td { + uint8_t fcw_ver:4, + num_maps:4; + uint8_t filler:6, + rsrvd0:1, + bypass_sb_deint:1; + uint16_t k_pos; + uint16_t k_neg; + uint8_t c_neg; + uint8_t c; + uint32_t ea; + uint32_t eb; + uint8_t cab; + uint8_t k0_start_col; + uint8_t rsrvd1; + uint8_t code_block_mode:1, + turbo_crc_type:1, + rsrvd2:3, + bypass_teq:1, + soft_output_en:1, + ext_td_cold_reg_en:1; + union { /* External Cold register */ + uint32_t ext_td_cold_reg; + struct { + uint32_t min_iter:4, + max_iter:4, + ext_scale:5, + rsrvd3:3, + early_stop_en:1, + sw_soft_out_dis:1, + sw_et_cont:1, + sw_soft_out_saturation:1, + half_iter_on:1, + raw_decoder_input_on:1, /* Unused */ + rsrvd4:10; + }; + }; +}; + +/* FEC 5GNR Uplink Frame Control Word */ +struct __rte_packed acc200_fcw_ld { + uint32_t FCWversion:4, + qm:4, + nfiller:11, + BG:1, + Zc:9, + 
cnu_algo:1, + synd_precoder:1, + synd_post:1; + uint32_t ncb:16, + k0:16; + uint32_t rm_e:24, + hcin_en:1, + hcout_en:1, + crc_select:1, + bypass_dec:1, + bypass_intlv:1, + so_en:1, + so_bypass_rm:1, + so_bypass_intlv:1; + uint32_t hcin_offset:16, + hcin_size0:16; + uint32_t hcin_size1:16, + hcin_decomp_mode:3, + llr_pack_mode:1, + hcout_comp_mode:3, + res2:1, + dec_convllr:4, + hcout_convllr:4; + uint32_t itmax:7, + itstop:1, + so_it:7, + res3:1, + hcout_offset:16; + uint32_t hcout_size0:16, + hcout_size1:16; + uint32_t gain_i:8, + gain_h:8, + negstop_th:16; + uint32_t negstop_it:7, + negstop_en:1, + tb_crc_select:2, + res4:2, + tb_trailer_size:20; +}; + +/* FEC 4G Downlink Frame Control Word */ +struct __rte_packed acc200_fcw_te { + uint16_t k_neg; + uint16_t k_pos; + uint8_t c_neg; + uint8_t c; + uint8_t filler; + uint8_t cab; + uint32_t ea:17, + rsrvd0:15; + uint32_t eb:17, + rsrvd1:15; + uint16_t ncb_neg; + uint16_t ncb_pos; + uint8_t rv_idx0:2, + rsrvd2:2, + rv_idx1:2, + rsrvd3:2; + uint8_t bypass_rv_idx0:1, + bypass_rv_idx1:1, + bypass_rm:1, + rsrvd4:5; + uint8_t rsrvd5:1, + rsrvd6:3, + code_block_crc:1, + rsrvd7:3; + uint8_t code_block_mode:1, + rsrvd8:7; + uint64_t rsrvd9; +}; + +/* FEC 5GNR Downlink Frame Control Word */ +struct __rte_packed acc200_fcw_le { + uint32_t FCWversion:4, + qm:4, + nfiller:11, + BG:1, + Zc:9, + res0:3; + uint32_t ncb:16, + k0:16; + uint32_t rm_e:24, + res1:2, + crc_select:1, + res2:1, + bypass_intlv:1, + res3:3; + uint32_t res4_a:12, + mcb_count:3, + res4_b:17; + uint32_t res5; + uint32_t res6; + uint32_t res7; + uint32_t res8; +}; + +/* FFT Frame Control Word */ +struct __rte_packed acc200_fcw_fft { + uint32_t in_frame_size:16, + leading_pad_size:16; + uint32_t out_frame_size:16, + leading_depad_size:16; + uint32_t cs_window_sel; + uint32_t cs_window_sel2:16, + cs_enable_bmap:16; + uint32_t num_antennas:8, + idft_size:8, + dft_size:8, + cs_offset:8; + uint32_t idft_shift:8, + dft_shift:8, + cs_multiplier:16; + uint32_t 
bypass:2, + res:30; +}; + +struct __rte_packed acc200_pad_ptr { + void *op_addr; + uint64_t pad1; /* pad to 64 bits */ +}; + +struct __rte_packed acc200_ptrs { + struct acc200_pad_ptr ptr[ACC200_COMPANION_PTRS]; +}; + +/* ACC200 DMA Request Descriptor */ +struct __rte_packed acc200_dma_req_desc { + union { + struct{ + uint32_t type:4, + rsrvd0:26, + sdone:1, + fdone:1; + uint32_t ib_ant_offset:16, + res2:12, + num_ant:4; + uint32_t ob_ant_offset:16, + ob_cyc_offset:12, + num_cs:4; + uint32_t pass_param:8, + sdone_enable:1, + irq_enable:1, + timeStampEn:1, + res0:5, + numCBs:4, + res1:4, + m2dlen:4, + d2mlen:4; + }; + struct{ + uint32_t word0; + uint32_t word1; + uint32_t word2; + uint32_t word3; + }; + }; + struct acc200_dma_triplet data_ptrs[ACC200_DMA_MAX_NUM_POINTERS]; + + /* Virtual addresses used to retrieve SW context info */ + union { + void *op_addr; + uint64_t pad1; /* pad to 64 bits */ + }; + /* + * Stores additional information needed for driver processing: + * - last_desc_in_batch - flag used to mark last descriptor (CB) + * in batch + * - cbs_in_tb - stores information about total number of Code Blocks + * in currently processed Transport Block + */ + union { + struct { + union { + struct acc200_fcw_ld fcw_ld; + struct acc200_fcw_td fcw_td; + struct acc200_fcw_le fcw_le; + struct acc200_fcw_te fcw_te; + struct acc200_fcw_fft fcw_fft; + uint32_t pad2[ACC200_FCW_PADDING]; + }; + uint32_t last_desc_in_batch :8, + cbs_in_tb:8, + pad4 : 16; + }; + uint64_t pad3[ACC200_DMA_DESC_PADDING]; /* pad to 64 bytes */ + }; +}; + +/* ACC200 DMA Descriptor */ +union acc200_dma_desc { + struct acc200_dma_req_desc req; + union acc200_dma_rsp_desc rsp; + uint64_t atom_hdr; +}; + + +/* Union describing HARQ layout entry */ +union acc200_harq_layout_data { + uint32_t val; + struct { + uint16_t offset; + uint16_t size0; + }; +} __rte_packed; + + +/* Union describing Info Ring entry */ +union acc200_info_ring_data { + uint32_t val; + struct { + union { + uint16_t detailed_info; 
+ struct { + uint16_t aq_id: 4; + uint16_t qg_id: 4; + uint16_t vf_id: 6; + uint16_t reserved: 2; + }; + }; + uint16_t int_nb: 7; + uint16_t msi_0: 1; + uint16_t vf2pf: 6; + uint16_t loop: 1; + uint16_t valid: 1; + }; +} __rte_packed; + +struct acc200_registry_addr { + unsigned int dma_ring_dl5g_hi; + unsigned int dma_ring_dl5g_lo; + unsigned int dma_ring_ul5g_hi; + unsigned int dma_ring_ul5g_lo; + unsigned int dma_ring_dl4g_hi; + unsigned int dma_ring_dl4g_lo; + unsigned int dma_ring_ul4g_hi; + unsigned int dma_ring_ul4g_lo; + unsigned int dma_ring_fft_hi; + unsigned int dma_ring_fft_lo; + unsigned int ring_size; + unsigned int info_ring_hi; + unsigned int info_ring_lo; + unsigned int info_ring_en; + unsigned int info_ring_ptr; + unsigned int tail_ptrs_dl5g_hi; + unsigned int tail_ptrs_dl5g_lo; + unsigned int tail_ptrs_ul5g_hi; + unsigned int tail_ptrs_ul5g_lo; + unsigned int tail_ptrs_dl4g_hi; + unsigned int tail_ptrs_dl4g_lo; + unsigned int tail_ptrs_ul4g_hi; + unsigned int tail_ptrs_ul4g_lo; + unsigned int tail_ptrs_fft_hi; + unsigned int tail_ptrs_fft_lo; + unsigned int depth_log0_offset; + unsigned int depth_log1_offset; + unsigned int qman_group_func; + unsigned int hi_mode; + unsigned int pmon_ctrl_a; + unsigned int pmon_ctrl_b; + unsigned int pmon_ctrl_c; +}; + +/* Structure holding registry addresses for PF */ +static const struct acc200_registry_addr pf_reg_addr = { + .dma_ring_dl5g_hi = HWPfDmaFec5GdlDescBaseHiRegVf, + .dma_ring_dl5g_lo = HWPfDmaFec5GdlDescBaseLoRegVf, + .dma_ring_ul5g_hi = HWPfDmaFec5GulDescBaseHiRegVf, + .dma_ring_ul5g_lo = HWPfDmaFec5GulDescBaseLoRegVf, + .dma_ring_dl4g_hi = HWPfDmaFec4GdlDescBaseHiRegVf, + .dma_ring_dl4g_lo = HWPfDmaFec4GdlDescBaseLoRegVf, + .dma_ring_ul4g_hi = HWPfDmaFec4GulDescBaseHiRegVf, + .dma_ring_ul4g_lo = HWPfDmaFec4GulDescBaseLoRegVf, + .dma_ring_fft_hi = HWPDmaFftDescBaseHiRegVf, + .dma_ring_fft_lo = HWPDmaFftDescBaseLoRegVf, + .ring_size = HWPfQmgrRingSizeVf, + .info_ring_hi = HWPfHiInfoRingBaseHiRegPf, + 
.info_ring_lo = HWPfHiInfoRingBaseLoRegPf, + .info_ring_en = HWPfHiInfoRingIntWrEnRegPf, + .info_ring_ptr = HWPfHiInfoRingPointerRegPf, + .tail_ptrs_dl5g_hi = HWPfDmaFec5GdlRespPtrHiRegVf, + .tail_ptrs_dl5g_lo = HWPfDmaFec5GdlRespPtrLoRegVf, + .tail_ptrs_ul5g_hi = HWPfDmaFec5GulRespPtrHiRegVf, + .tail_ptrs_ul5g_lo = HWPfDmaFec5GulRespPtrLoRegVf, + .tail_ptrs_dl4g_hi = HWPfDmaFec4GdlRespPtrHiRegVf, + .tail_ptrs_dl4g_lo = HWPfDmaFec4GdlRespPtrLoRegVf, + .tail_ptrs_ul4g_hi = HWPfDmaFec4GulRespPtrHiRegVf, + .tail_ptrs_ul4g_lo = HWPfDmaFec4GulRespPtrLoRegVf, + .tail_ptrs_fft_hi = HWPDmaFftRespPtrHiRegVf, + .tail_ptrs_fft_lo = HWPDmaFftRespPtrLoRegVf, + .depth_log0_offset = HWPfQmgrGrpDepthLog20Vf, + .depth_log1_offset = HWPfQmgrGrpDepthLog21Vf, + .qman_group_func = HWPfQmgrGrpFunction0, + .hi_mode = HWPfHiMsixVectorMapperPf, + .pmon_ctrl_a = HWPfPermonACntrlRegVf, + .pmon_ctrl_b = HWPfPermonBCntrlRegVf, + .pmon_ctrl_c = HWPfPermonCCntrlRegVf, +}; + +/* Structure holding registry addresses for VF */ +static const struct acc200_registry_addr vf_reg_addr = { + .dma_ring_dl5g_hi = HWVfDmaFec5GdlDescBaseHiRegVf, + .dma_ring_dl5g_lo = HWVfDmaFec5GdlDescBaseLoRegVf, + .dma_ring_ul5g_hi = HWVfDmaFec5GulDescBaseHiRegVf, + .dma_ring_ul5g_lo = HWVfDmaFec5GulDescBaseLoRegVf, + .dma_ring_dl4g_hi = HWVfDmaFec4GdlDescBaseHiRegVf, + .dma_ring_dl4g_lo = HWVfDmaFec4GdlDescBaseLoRegVf, + .dma_ring_ul4g_hi = HWVfDmaFec4GulDescBaseHiRegVf, + .dma_ring_ul4g_lo = HWVfDmaFec4GulDescBaseLoRegVf, + .dma_ring_fft_hi = HWVfDmaFftDescBaseHiRegVf, + .dma_ring_fft_lo = HWVfDmaFftDescBaseLoRegVf, + .ring_size = HWVfQmgrRingSizeVf, + .info_ring_hi = HWVfHiInfoRingBaseHiVf, + .info_ring_lo = HWVfHiInfoRingBaseLoVf, + .info_ring_en = HWVfHiInfoRingIntWrEnVf, + .info_ring_ptr = HWVfHiInfoRingPointerVf, + .tail_ptrs_dl5g_hi = HWVfDmaFec5GdlRespPtrHiRegVf, + .tail_ptrs_dl5g_lo = HWVfDmaFec5GdlRespPtrLoRegVf, + .tail_ptrs_ul5g_hi = HWVfDmaFec5GulRespPtrHiRegVf, + .tail_ptrs_ul5g_lo = 
HWVfDmaFec5GulRespPtrLoRegVf, + .tail_ptrs_dl4g_hi = HWVfDmaFec4GdlRespPtrHiRegVf, + .tail_ptrs_dl4g_lo = HWVfDmaFec4GdlRespPtrLoRegVf, + .tail_ptrs_ul4g_hi = HWVfDmaFec4GulRespPtrHiRegVf, + .tail_ptrs_ul4g_lo = HWVfDmaFec4GulRespPtrLoRegVf, + .tail_ptrs_fft_hi = HWVfDmaFftRespPtrHiRegVf, + .tail_ptrs_fft_lo = HWVfDmaFftRespPtrLoRegVf, + .depth_log0_offset = HWVfQmgrGrpDepthLog20Vf, + .depth_log1_offset = HWVfQmgrGrpDepthLog21Vf, + .qman_group_func = HWVfQmgrGrpFunction0Vf, + .hi_mode = HWVfHiMsixVectorMapperVf, + .pmon_ctrl_a = HWVfPmACntrlRegVf, + .pmon_ctrl_b = HWVfPmBCntrlRegVf, + .pmon_ctrl_c = HWVfPmCCntrlRegVf, +}; + + /* Private data structure for each ACC200 device */ struct acc200_device { void *mmio_base; /**< Base address of MMIO registers (BAR0) */ diff --git a/drivers/baseband/acc200/acc200_vf_enum.h b/drivers/baseband/acc200/acc200_vf_enum.h new file mode 100644 index 0000000..616edb6 --- /dev/null +++ b/drivers/baseband/acc200/acc200_vf_enum.h @@ -0,0 +1,89 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#ifndef ACC200_VF_ENUM_H +#define ACC200_VF_ENUM_H + +/* + * ACC200 Register mapping on VF BAR0 + * This is automatically generated from RDL, format may change with new RDL + */ +enum { + HWVfQmgrIngressAq = 0x00000000, + HWVfHiVfToPfDbellVf = 0x00000800, + HWVfHiPfToVfDbellVf = 0x00000808, + HWVfHiInfoRingBaseLoVf = 0x00000810, + HWVfHiInfoRingBaseHiVf = 0x00000814, + HWVfHiInfoRingPointerVf = 0x00000818, + HWVfHiInfoRingIntWrEnVf = 0x00000820, + HWVfHiInfoRingPf2VfWrEnVf = 0x00000824, + HWVfHiMsixVectorMapperVf = 0x00000860, + HWVfDmaFec5GulDescBaseLoRegVf = 0x00000920, + HWVfDmaFec5GulDescBaseHiRegVf = 0x00000924, + HWVfDmaFec5GulRespPtrLoRegVf = 0x00000928, + HWVfDmaFec5GulRespPtrHiRegVf = 0x0000092C, + HWVfDmaFec5GdlDescBaseLoRegVf = 0x00000940, + HWVfDmaFec5GdlDescBaseHiRegVf = 0x00000944, + HWVfDmaFec5GdlRespPtrLoRegVf = 0x00000948, + HWVfDmaFec5GdlRespPtrHiRegVf = 0x0000094C, + 
HWVfDmaFec4GulDescBaseLoRegVf = 0x00000960, + HWVfDmaFec4GulDescBaseHiRegVf = 0x00000964, + HWVfDmaFec4GulRespPtrLoRegVf = 0x00000968, + HWVfDmaFec4GulRespPtrHiRegVf = 0x0000096C, + HWVfDmaFec4GdlDescBaseLoRegVf = 0x00000980, + HWVfDmaFec4GdlDescBaseHiRegVf = 0x00000984, + HWVfDmaFec4GdlRespPtrLoRegVf = 0x00000988, + HWVfDmaFec4GdlRespPtrHiRegVf = 0x0000098C, + HWVfDmaFftDescBaseLoRegVf = 0x000009A0, + HWVfDmaFftDescBaseHiRegVf = 0x000009A4, + HWVfDmaFftRespPtrLoRegVf = 0x000009A8, + HWVfDmaFftRespPtrHiRegVf = 0x000009AC, + HWVfQmgrAqResetVf = 0x00000E00, + HWVfQmgrRingSizeVf = 0x00000E04, + HWVfQmgrGrpDepthLog20Vf = 0x00000E08, + HWVfQmgrGrpDepthLog21Vf = 0x00000E0C, + HWVfQmgrGrpFunction0Vf = 0x00000E10, + HWVfQmgrGrpFunction1Vf = 0x00000E14, + HWVfPmACntrlRegVf = 0x00000F40, + HWVfPmACountVf = 0x00000F48, + HWVfPmAKCntLoVf = 0x00000F50, + HWVfPmAKCntHiVf = 0x00000F54, + HWVfPmADeltaCntLoVf = 0x00000F60, + HWVfPmADeltaCntHiVf = 0x00000F64, + HWVfPmBCntrlRegVf = 0x00000F80, + HWVfPmBCountVf = 0x00000F88, + HWVfPmBKCntLoVf = 0x00000F90, + HWVfPmBKCntHiVf = 0x00000F94, + HWVfPmBDeltaCntLoVf = 0x00000FA0, + HWVfPmBDeltaCntHiVf = 0x00000FA4, + HWVfPmCCntrlRegVf = 0x00000FC0, + HWVfPmCCountVf = 0x00000FC8, + HWVfPmCKCntLoVf = 0x00000FD0, + HWVfPmCKCntHiVf = 0x00000FD4, + HWVfPmCDeltaCntLoVf = 0x00000FE0, + HWVfPmCDeltaCntHiVf = 0x00000FE4 +}; + +/* TIP VF Interrupt numbers */ +enum { + ACC200_VF_INT_QMGR_AQ_OVERFLOW = 0, + ACC200_VF_INT_DOORBELL_PF_2_VF = 1, + ACC200_VF_INT_ILLEGAL_FORMAT = 2, + ACC200_VF_INT_QMGR_DISABLED_ACCESS = 3, + ACC200_VF_INT_QMGR_AQ_OVERTHRESHOLD = 4, + ACC200_VF_INT_DMA_DL_DESC_IRQ = 5, + ACC200_VF_INT_DMA_UL_DESC_IRQ = 6, + ACC200_VF_INT_DMA_FFT_DESC_IRQ = 7, + ACC200_VF_INT_DMA_UL5G_DESC_IRQ = 8, + ACC200_VF_INT_DMA_DL5G_DESC_IRQ = 9, + ACC200_VF_INT_DMA_MLD_DESC_IRQ = 10, +}; + +/* TIP VF2PF Comms */ +enum { + ACC200_VF2PF_STATUS_REQUEST = 0, + ACC200_VF2PF_USING_VF = 1, +}; + +#endif /* ACC200_VF_ENUM_H */ diff --git 
a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 4103e48..70b6cc5 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -34,6 +34,8 @@ acc200_dev_close(struct rte_bbdev *dev) { RTE_SET_USED(dev); + /* Ensure all in flight HW transactions are completed */ + usleep(ACC200_LONG_WAIT); return 0; } -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
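Note on the K0 constants added to acc200_pmd.h above: they are the numerators from 3GPP 38.212 Table 5.4.2.1-2, where the rate-matching start position is k0 = floor(num * Ncb / (N * Zc)) * Zc and k0 = 0 for rv 0. A minimal sketch of that computation follows. The helper name sketch_get_k0 and the SK_ macros are hypothetical local copies made for illustration; they are not part of this patch and may differ from how the driver computes k0.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the ACC200_N_ZC_x / ACC200_K0_x_y values (assumption:
 * same numeric values as in acc200_pmd.h, renamed to avoid any clash).
 */
#define SK_N_ZC_1 66 /* N = 66 Zc for BG 1 */
#define SK_N_ZC_2 50 /* N = 50 Zc for BG 2 */
#define SK_K0_1_1 17 /* rv 1, BG 1 */
#define SK_K0_1_2 13 /* rv 1, BG 2 */
#define SK_K0_2_1 33 /* rv 2, BG 1 */
#define SK_K0_2_2 25 /* rv 2, BG 2 */
#define SK_K0_3_1 56 /* rv 3, BG 1 */
#define SK_K0_3_2 43 /* rv 3, BG 2 */

/* Hypothetical helper: starting position k0 in the circular buffer */
static uint16_t
sketch_get_k0(uint16_t n_cb, uint16_t z_c, uint8_t bg, uint8_t rv)
{
	/* Numerators indexed by [bg - 1][rv]; rv 0 always yields k0 = 0 */
	static const uint32_t num[2][4] = {
		{0, SK_K0_1_1, SK_K0_2_1, SK_K0_3_1},
		{0, SK_K0_1_2, SK_K0_2_2, SK_K0_3_2},
	};
	uint32_t n_zc;

	assert((bg == 1 || bg == 2) && rv <= 3);
	/* Guard against a division by zero on an invalid lifting size */
	if (rv == 0 || z_c == 0)
		return 0;
	n_zc = (bg == 1) ? SK_N_ZC_1 : SK_N_ZC_2;
	/* floor(num * Ncb / (N * Zc)) * Zc, evaluated in 32-bit arithmetic */
	return (uint16_t)(((num[bg - 1][rv] * n_cb) / (n_zc * z_c)) * z_c);
}
```

When Ncb equals N * Zc (no limited buffer rate matching) the division is exact and k0 reduces to num * Zc; the integer division only matters for a shortened circular buffer.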
* [PATCH v1 03/10] baseband/acc200: add info get function 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 01/10] baseband/acc200: introduce PMD for ACC200 Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 02/10] baseband/acc200: add HW register definitions Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 04/10] baseband/acc200: add queue configuration Nicolas Chautru ` (7 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Add support for info_get to allow querying the device. No capability is exposed yet. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/acc200_pmd.h | 2 + drivers/baseband/acc200/rte_acc200_cfg.h | 94 ++++++++++++ drivers/baseband/acc200/rte_acc200_pmd.c | 256 +++++++++++++++++++++++++++++++ 3 files changed, 352 insertions(+) create mode 100644 drivers/baseband/acc200/rte_acc200_cfg.h diff --git a/drivers/baseband/acc200/acc200_pmd.h b/drivers/baseband/acc200/acc200_pmd.h index b420524..91e0798 100644 --- a/drivers/baseband/acc200/acc200_pmd.h +++ b/drivers/baseband/acc200/acc200_pmd.h @@ -7,6 +7,7 @@ #include "acc200_pf_enum.h" #include "acc200_vf_enum.h" +#include "rte_acc200_cfg.h" /* Helper macro for logging */ #define rte_bbdev_log(level, fmt, ...) 
\ @@ -619,6 +620,7 @@ struct acc200_registry_addr { struct acc200_device { void *mmio_base; /**< Base address of MMIO registers (BAR0) */ uint32_t ddr_size; /* Size in kB */ + struct rte_acc200_conf acc200_conf; /* ACC200 Initial configuration */ bool pf_device; /**< True if this is a PF ACC200 device */ bool configured; /**< True if this ACC200 device is configured */ }; diff --git a/drivers/baseband/acc200/rte_acc200_cfg.h b/drivers/baseband/acc200/rte_acc200_cfg.h new file mode 100644 index 0000000..fcccfbf --- /dev/null +++ b/drivers/baseband/acc200/rte_acc200_cfg.h @@ -0,0 +1,94 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#ifndef _RTE_ACC200_CFG_H_ +#define _RTE_ACC200_CFG_H_ + +/** + * @file rte_acc200_cfg.h + * + * Functions for configuring ACC200 HW, exposed directly to applications. + * Configuration related to encoding/decoding is done through the + * librte_bbdev library. + * + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + */ + +#include <stdint.h> +#include <stdbool.h> + +#ifdef __cplusplus +extern "C" { +#endif +/** Number of Virtual Functions ACC200 supports */ +#define RTE_ACC200_NUM_VFS 16 + +/** + * Definition of Queue Topology for ACC200 Configuration + * Some level of detail is abstracted out to expose a clean interface + * given that comprehensive flexibility is not required + */ +struct rte_acc200_queue_topology { + /** Number of QGroups in incremental order of priority */ + uint16_t num_qgroups; + /** + * All QGroups have the same number of AQs here. + * Note : Could be made a 16-array if more flexibility is really + * required + */ + uint16_t num_aqs_per_groups; + /** + * Depth of the AQs is the same for all QGroups here. 
Log2 Enum : 2^N + * Note : Could be made a 16-array if more flexibility is really + * required + */ + uint16_t aq_depth_log2; + /** + * Index of the first Queue Group Index - assuming contiguity + * Initialized as -1 + */ + int8_t first_qgroup_index; +}; + +/** + * Definition of Arbitration related parameters for ACC200 Configuration + */ +struct rte_acc200_arbitration { + /** Default Weight for VF Fairness Arbitration */ + uint16_t round_robin_weight; + uint32_t gbr_threshold1; /**< Guaranteed Bitrate Threshold 1 */ + uint32_t gbr_threshold2; /**< Guaranteed Bitrate Threshold 2 */ +}; + +/** + * Structure to pass ACC200 configuration. + * Note: all VF Bundles will have the same configuration. + */ +struct rte_acc200_conf { + bool pf_mode_en; /**< 1 if PF is used for dataplane, 0 for VFs */ + /** 1 if input '1' bit is represented by a positive LLR value, 0 if '1' + * bit is represented by a negative value. + */ + bool input_pos_llr_1_bit; + /** 1 if output '1' bit is represented by a positive value, 0 if '1' + * bit is represented by a negative value. 
+ */ + bool output_pos_llr_1_bit; + uint16_t num_vf_bundles; /**< Number of VF bundles to set up */ + /** Queue topology for each operation type */ + struct rte_acc200_queue_topology q_ul_4g; + struct rte_acc200_queue_topology q_dl_4g; + struct rte_acc200_queue_topology q_ul_5g; + struct rte_acc200_queue_topology q_dl_5g; + struct rte_acc200_queue_topology q_fft; + /** Arbitration configuration for each operation type */ + struct rte_acc200_arbitration arb_ul_4g[RTE_ACC200_NUM_VFS]; + struct rte_acc200_arbitration arb_dl_4g[RTE_ACC200_NUM_VFS]; + struct rte_acc200_arbitration arb_ul_5g[RTE_ACC200_NUM_VFS]; + struct rte_acc200_arbitration arb_dl_5g[RTE_ACC200_NUM_VFS]; + struct rte_acc200_arbitration arb_fft[RTE_ACC200_NUM_VFS]; +}; + +#endif /* _RTE_ACC200_CFG_H_ */ diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 70b6cc5..ce72654 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -29,6 +29,207 @@ RTE_LOG_REGISTER_DEFAULT(acc200_logtype, NOTICE); #endif +/* Read a register of an ACC200 device */ +static inline uint32_t +acc200_reg_read(struct acc200_device *d, uint32_t offset) +{ + + void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset); + uint32_t ret = *((volatile uint32_t *)(reg_addr)); + return rte_le_to_cpu_32(ret); +} + +/* Calculate the offset of the enqueue register */ +static inline uint32_t +queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id) +{ + if (pf_device) + return ((vf_id << 12) + (qgrp_id << 7) + (aq_id << 3) + + HWPfQmgrIngressAq); + else + return ((qgrp_id << 7) + (aq_id << 3) + + HWVfQmgrIngressAq); +} + +enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, NUM_ACC}; + +/* Return the queue topology for an accelerator type */ +static inline void +qtopFromAcc(struct rte_acc200_queue_topology **qtop, int acc_enum, + struct rte_acc200_conf *acc200_conf) +{ + struct rte_acc200_queue_topology *p_qtop; + p_qtop = NULL; + switch 
(acc_enum) { + case UL_4G: + p_qtop = &(acc200_conf->q_ul_4g); + break; + case UL_5G: + p_qtop = &(acc200_conf->q_ul_5g); + break; + case DL_4G: + p_qtop = &(acc200_conf->q_dl_4g); + break; + case DL_5G: + p_qtop = &(acc200_conf->q_dl_5g); + break; + case FFT: + p_qtop = &(acc200_conf->q_fft); + break; + default: + /* NOTREACHED */ + rte_bbdev_log(ERR, "Unexpected error evaluating qtopFromAcc %d", + acc_enum); + break; + } + *qtop = p_qtop; +} + +static void +initQTop(struct rte_acc200_conf *acc200_conf) +{ + acc200_conf->q_ul_4g.num_aqs_per_groups = 0; + acc200_conf->q_ul_4g.num_qgroups = 0; + acc200_conf->q_ul_4g.first_qgroup_index = -1; + acc200_conf->q_ul_5g.num_aqs_per_groups = 0; + acc200_conf->q_ul_5g.num_qgroups = 0; + acc200_conf->q_ul_5g.first_qgroup_index = -1; + acc200_conf->q_dl_4g.num_aqs_per_groups = 0; + acc200_conf->q_dl_4g.num_qgroups = 0; + acc200_conf->q_dl_4g.first_qgroup_index = -1; + acc200_conf->q_dl_5g.num_aqs_per_groups = 0; + acc200_conf->q_dl_5g.num_qgroups = 0; + acc200_conf->q_dl_5g.first_qgroup_index = -1; + acc200_conf->q_fft.num_aqs_per_groups = 0; + acc200_conf->q_fft.num_qgroups = 0; + acc200_conf->q_fft.first_qgroup_index = -1; +} + +static inline void +updateQtop(uint8_t acc, uint8_t qg, struct rte_acc200_conf *acc200_conf, + struct acc200_device *d) { + uint32_t reg; + struct rte_acc200_queue_topology *q_top = NULL; + qtopFromAcc(&q_top, acc, acc200_conf); + if (unlikely(q_top == NULL)) + return; + uint16_t aq; + q_top->num_qgroups++; + if (q_top->first_qgroup_index == -1) { + q_top->first_qgroup_index = qg; + /* Can be optimized to assume all are enabled by default */ + reg = acc200_reg_read(d, queue_offset(d->pf_device, + 0, qg, ACC200_NUM_AQS - 1)); + if (reg & ACC200_QUEUE_ENABLE) { + q_top->num_aqs_per_groups = ACC200_NUM_AQS; + return; + } + q_top->num_aqs_per_groups = 0; + for (aq = 0; aq < ACC200_NUM_AQS; aq++) { + reg = acc200_reg_read(d, queue_offset(d->pf_device, + 0, qg, aq)); + if (reg & ACC200_QUEUE_ENABLE) + 
q_top->num_aqs_per_groups++; + } + } +} + +/* Fetch configuration enabled for the PF/VF using MMIO Read (slow) */ +static inline void +fetch_acc200_config(struct rte_bbdev *dev) +{ + struct acc200_device *d = dev->data->dev_private; + struct rte_acc200_conf *acc200_conf = &d->acc200_conf; + const struct acc200_registry_addr *reg_addr; + uint8_t acc, qg; + uint32_t reg_aq, reg_len0, reg_len1, reg0, reg1; + uint32_t reg_mode, idx; + + /* No need to retrieve the configuration is already done */ + if (d->configured) + return; + + /* Choose correct registry addresses for the device type */ + if (d->pf_device) + reg_addr = &pf_reg_addr; + else + reg_addr = &vf_reg_addr; + + d->ddr_size = 0; + + /* Single VF Bundle by VF */ + acc200_conf->num_vf_bundles = 1; + initQTop(acc200_conf); + + struct rte_acc200_queue_topology *q_top = NULL; + int qman_func_id[ACC200_NUM_ACCS] = {ACC200_ACCMAP_0, ACC200_ACCMAP_1, + ACC200_ACCMAP_2, ACC200_ACCMAP_3, ACC200_ACCMAP_4}; + reg0 = acc200_reg_read(d, reg_addr->qman_group_func); + reg1 = acc200_reg_read(d, reg_addr->qman_group_func + 4); + for (qg = 0; qg < ACC200_NUM_QGRPS; qg++) { + reg_aq = acc200_reg_read(d, + queue_offset(d->pf_device, 0, qg, 0)); + if (reg_aq & ACC200_QUEUE_ENABLE) { + /* printf("Qg enabled %d %x\n", qg, reg_aq); */ + if (qg < ACC200_NUM_QGRPS_PER_WORD) + idx = (reg0 >> (qg * 4)) & 0x7; + else + idx = (reg1 >> ((qg - + ACC200_NUM_QGRPS_PER_WORD) * 4)) & 0x7; + if (idx < ACC200_NUM_ACCS) { + acc = qman_func_id[idx]; + updateQtop(acc, qg, acc200_conf, d); + } + } + } + + /* Check the depth of the AQs*/ + reg_len0 = acc200_reg_read(d, reg_addr->depth_log0_offset); + reg_len1 = acc200_reg_read(d, reg_addr->depth_log1_offset); + for (acc = 0; acc < NUM_ACC; acc++) { + qtopFromAcc(&q_top, acc, acc200_conf); + if (q_top->first_qgroup_index < ACC200_NUM_QGRPS_PER_WORD) + q_top->aq_depth_log2 = (reg_len0 >> + (q_top->first_qgroup_index * 4)) + & 0xF; + else + q_top->aq_depth_log2 = (reg_len1 >> + ((q_top->first_qgroup_index 
- + ACC200_NUM_QGRPS_PER_WORD) * 4)) + & 0xF; + } + + /* Read PF mode */ + if (d->pf_device) { + reg_mode = acc200_reg_read(d, HWPfHiPfMode); + acc200_conf->pf_mode_en = (reg_mode == ACC200_PF_VAL) ? 1 : 0; + } else { + reg_mode = acc200_reg_read(d, reg_addr->hi_mode); + acc200_conf->pf_mode_en = reg_mode & 1; + } + + rte_bbdev_log_debug( + "%s Config LLR SIGN IN/OUT %s %s QG %u %u %u %u %u AQ %u %u %u %u %u Len %u %u %u %u %u\n", + (d->pf_device) ? "PF" : "VF", + (acc200_conf->input_pos_llr_1_bit) ? "POS" : "NEG", + (acc200_conf->output_pos_llr_1_bit) ? "POS" : "NEG", + acc200_conf->q_ul_4g.num_qgroups, + acc200_conf->q_dl_4g.num_qgroups, + acc200_conf->q_ul_5g.num_qgroups, + acc200_conf->q_dl_5g.num_qgroups, + acc200_conf->q_fft.num_qgroups, + acc200_conf->q_ul_4g.num_aqs_per_groups, + acc200_conf->q_dl_4g.num_aqs_per_groups, + acc200_conf->q_ul_5g.num_aqs_per_groups, + acc200_conf->q_dl_5g.num_aqs_per_groups, + acc200_conf->q_fft.num_aqs_per_groups, + acc200_conf->q_ul_4g.aq_depth_log2, + acc200_conf->q_dl_4g.aq_depth_log2, + acc200_conf->q_ul_5g.aq_depth_log2, + acc200_conf->q_dl_5g.aq_depth_log2, + acc200_conf->q_fft.aq_depth_log2); +} + /* Free memory used for software rings */ static int acc200_dev_close(struct rte_bbdev *dev) @@ -39,9 +240,57 @@ return 0; } +/* Get ACC200 device info */ +static void +acc200_dev_info_get(struct rte_bbdev *dev, + struct rte_bbdev_driver_info *dev_info) +{ + struct acc200_device *d = dev->data->dev_private; + int i; + static const struct rte_bbdev_op_cap bbdev_capabilities[] = { + RTE_BBDEV_END_OF_CAPABILITIES_LIST() + }; + + static struct rte_bbdev_queue_conf default_queue_conf; + default_queue_conf.socket = dev->data->socket_id; + default_queue_conf.queue_size = ACC200_MAX_QUEUE_DEPTH; + + dev_info->driver_name = dev->device->driver->name; + + /* Read and save the populated config from ACC200 registers */ + fetch_acc200_config(dev); + + /* Exposed number of queues */ + dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; + 
dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; + dev_info->max_num_queues = 0; + for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_FFT; i++) + dev_info->max_num_queues += dev_info->num_queues[i]; + dev_info->queue_size_lim = ACC200_MAX_QUEUE_DEPTH; + dev_info->hardware_accelerated = true; + dev_info->max_dl_queue_priority = + d->acc200_conf.q_dl_4g.num_qgroups - 1; + dev_info->max_ul_queue_priority = + d->acc200_conf.q_ul_4g.num_qgroups - 1; + dev_info->default_queue_conf = default_queue_conf; + dev_info->cpu_flag_reqs = NULL; + dev_info->min_alignment = 1; + dev_info->capabilities = bbdev_capabilities; + dev_info->harq_buffer_size = 0; +} static const struct rte_bbdev_ops acc200_bbdev_ops = { .close = acc200_dev_close, + .info_get = acc200_dev_info_get, }; /* ACC200 PCI PF address map */ @@ -60,6 +309,13 @@ {.device_id = 0}, }; +/* Read flag value 0/1 from bitmap */ +static inline bool +check_bit(uint32_t bitmap, uint32_t bitmask) +{ + return bitmap & bitmask; +} + /* Initialization Function */ static void acc200_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
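Editorial aside on patch 03: fetch_acc200_config() reads a 4-bit field per queue group from two 32-bit mapping registers and keeps the low 3 bits as the accelerator index, while queue_offset() builds the enqueue register offset from shifted VF, queue group and atomic queue IDs. The standalone sketch below only illustrates that arithmetic and is not code from the patch. The names qgrp_func_idx and aq_rel_offset are hypothetical, the constants 16 and 8 are assumptions (8 follows from 4 bits per group in a 32-bit word), and the HWPfQmgrIngressAq / HWVfQmgrIngressAq base is left out because its value does not appear in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ACC200_NUM_QGRPS and
 * ACC200_NUM_QGRPS_PER_WORD are defined in headers not shown here.
 */
#define SKETCH_NUM_QGRPS 16
#define SKETCH_QGRPS_PER_WORD 8

/* Return the 3-bit function index of queue group qg.
 * Groups 0..7 live in reg0, groups 8..15 in reg1, 4 bits per group.
 */
static inline uint32_t
qgrp_func_idx(uint32_t reg0, uint32_t reg1, uint8_t qg)
{
	assert(qg < SKETCH_NUM_QGRPS);
	if (qg < SKETCH_QGRPS_PER_WORD)
		return (reg0 >> (qg * 4)) & 0x7;
	return (reg1 >> ((qg - SKETCH_QGRPS_PER_WORD) * 4)) & 0x7;
}

/* Offset of an atomic queue register relative to the ingress base:
 * VF ID in bits 12 and up, queue group in bits 7..10, AQ in bits 3..6.
 * The per-device base offset must still be added by the caller.
 */
static inline uint32_t
aq_rel_offset(uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id)
{
	return ((uint32_t)vf_id << 12) + ((uint32_t)qgrp_id << 7) +
			((uint32_t)aq_id << 3);
}
```

With reg0 = 0x43210000, queue groups 4 to 7 map to indices 1 to 4. Note that the top bit of each 4-bit field is masked off, so a field value of 0x8 decodes to index 0.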
* [PATCH v1 04/10] baseband/acc200: add queue configuration 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru ` (2 preceding siblings ...) 2022-07-08 0:01 ` [PATCH v1 03/10] baseband/acc200: add info get function Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 05/10] baseband/acc200: add LDPC processing functions Nicolas Chautru ` (6 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Adding functions to create and configure queues for the device. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/acc200_pmd.h | 62 ++++ drivers/baseband/acc200/rte_acc200_pmd.c | 506 ++++++++++++++++++++++++++++++- 2 files changed, 567 insertions(+), 1 deletion(-) diff --git a/drivers/baseband/acc200/acc200_pmd.h b/drivers/baseband/acc200/acc200_pmd.h index 91e0798..47ad00e 100644 --- a/drivers/baseband/acc200/acc200_pmd.h +++ b/drivers/baseband/acc200/acc200_pmd.h @@ -615,14 +615,76 @@ struct acc200_registry_addr { .pmon_ctrl_c = HWVfPmCCntrlRegVf, }; +/* Structure associated with each queue.
 */ +struct __rte_cache_aligned acc200_queue { + union acc200_dma_desc *ring_addr; /* Virtual address of sw ring */ + rte_iova_t ring_addr_iova; /* IOVA address of software ring */ + uint32_t sw_ring_head; /* software ring head */ + uint32_t sw_ring_tail; /* software ring tail */ + /* software ring size (descriptors, not bytes) */ + uint32_t sw_ring_depth; + /* mask used to wrap enqueued descriptors on the sw ring */ + uint32_t sw_ring_wrap_mask; + /* Virtual address of companion ring */ + struct acc200_ptrs *companion_ring_addr; + /* MMIO register used to enqueue descriptors */ + void *mmio_reg_enqueue; + uint8_t vf_id; /* VF ID (max = 63) */ + uint8_t qgrp_id; /* Queue Group ID */ + uint16_t aq_id; /* Atomic Queue ID */ + uint16_t aq_depth; /* Depth of atomic queue */ + uint32_t aq_enqueued; /* Count how many "batches" have been enqueued */ + uint32_t aq_dequeued; /* Count how many "batches" have been dequeued */ + uint32_t irq_enable; /* Enable ops dequeue interrupts if set to 1 */ + struct rte_mempool *fcw_mempool; /* FCW mempool */ + enum rte_bbdev_op_type op_type; /* Type of this Queue: TE or TD */ + /* Internal Buffers for loopback input */ + uint8_t *lb_in; + uint8_t *lb_out; + rte_iova_t lb_in_addr_iova; + rte_iova_t lb_out_addr_iova; + struct acc200_device *d; +}; /* Private data structure for each ACC200 device */ struct acc200_device { void *mmio_base; /**< Base address of MMIO registers (BAR0) */ + void *sw_rings_base; /* Base addr of un-aligned memory for sw rings */ + void *sw_rings; /* 64MB of 64MB-aligned memory for sw rings */ + rte_iova_t sw_rings_iova; /* IOVA address of sw_rings */ + /* Virtual address of the info memory routed to this function under + * operation, whether it is PF or VF.
+ * HW may DMA information data at this location asynchronously + */ + union acc200_info_ring_data *info_ring; + + union acc200_harq_layout_data *harq_layout; + /* Virtual Info Ring head */ + uint16_t info_ring_head; + /* Number of bytes available for each queue in device, depending on + * how many queues are enabled with configure() + */ + uint32_t sw_ring_size; uint32_t ddr_size; /* Size in kB */ + uint32_t *tail_ptrs; /* Base address of response tail pointer buffer */ + rte_iova_t tail_ptr_iova; /* IOVA address of tail pointers */ + /* Max number of entries available for each queue in device, depending + * on how many queues are enabled with configure() + */ + uint32_t sw_ring_max_depth; struct rte_acc200_conf acc200_conf; /* ACC200 Initial configuration */ + /* Bitmap capturing which Queues have already been assigned */ + uint16_t q_assigned_bit_map[ACC200_NUM_QGRPS]; bool pf_device; /**< True if this is a PF ACC200 device */ bool configured; /**< True if this ACC200 device is configured */ }; +/** + * Structure with details about RTE_BBDEV_EVENT_DEQUEUE event. It's passed to + * the callback function. 
+ */ +struct acc200_deq_intr_details { + uint16_t queue_id; +}; + #endif /* _RTE_ACC200_PMD_H_ */ diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index ce72654..ec082f1 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -29,6 +29,22 @@ RTE_LOG_REGISTER_DEFAULT(acc200_logtype, NOTICE); #endif +/* Write to MMIO register address */ +static inline void +mmio_write(void *addr, uint32_t value) +{ + *((volatile uint32_t *)(addr)) = rte_cpu_to_le_32(value); +} + +/* Write a register of a ACC200 device */ +static inline void +acc200_reg_write(struct acc200_device *d, uint32_t offset, uint32_t value) +{ + void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset); + mmio_write(reg_addr, value); + usleep(ACC200_LONG_WAIT); +} + /* Read a register of a ACC200 device */ static inline uint32_t acc200_reg_read(struct acc200_device *d, uint32_t offset) @@ -39,6 +55,22 @@ return rte_le_to_cpu_32(ret); } +/* Basic Implementation of Log2 for exact 2^N */ +static inline uint32_t +log2_basic(uint32_t value) +{ + return (value == 0) ? 
0 : rte_bsf32(value); +} + +/* Calculate memory alignment offset assuming alignment is 2^N */ +static inline uint32_t +calc_mem_alignment_offset(void *unaligned_virt_mem, uint32_t alignment) +{ + rte_iova_t unaligned_phy_mem = rte_malloc_virt2iova(unaligned_virt_mem); + return (uint32_t)(alignment - + (unaligned_phy_mem & (alignment-1))); +} + /* Calculate the offset of the enqueue register */ static inline uint32_t queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id) @@ -230,16 +262,484 @@ acc200_conf->q_fft.aq_depth_log2); } +static void +free_base_addresses(void **base_addrs, int size) +{ + int i; + for (i = 0; i < size; i++) + rte_free(base_addrs[i]); +} + +static inline uint32_t +get_desc_len(void) +{ + return sizeof(union acc200_dma_desc); +} + +/* Allocate the 2 * 64MB block for the sw rings */ +static int +alloc_2x64mb_sw_rings_mem(struct rte_bbdev *dev, struct acc200_device *d, + int socket) +{ + uint32_t sw_ring_size = ACC200_SIZE_64MBYTE; + d->sw_rings_base = rte_zmalloc_socket(dev->device->driver->name, + 2 * sw_ring_size, RTE_CACHE_LINE_SIZE, socket); + if (d->sw_rings_base == NULL) { + rte_bbdev_log(ERR, "Failed to allocate memory for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + return -ENOMEM; + } + uint32_t next_64mb_align_offset = calc_mem_alignment_offset( + d->sw_rings_base, ACC200_SIZE_64MBYTE); + d->sw_rings = RTE_PTR_ADD(d->sw_rings_base, next_64mb_align_offset); + d->sw_rings_iova = rte_malloc_virt2iova(d->sw_rings_base) + + next_64mb_align_offset; + d->sw_ring_size = ACC200_MAX_QUEUE_DEPTH * get_desc_len(); + d->sw_ring_max_depth = ACC200_MAX_QUEUE_DEPTH; + + return 0; +} + +/* Attempt to allocate minimised memory space for sw rings */ +static void +alloc_sw_rings_min_mem(struct rte_bbdev *dev, struct acc200_device *d, + uint16_t num_queues, int socket) +{ + rte_iova_t sw_rings_base_iova, next_64mb_align_addr_iova; + uint32_t next_64mb_align_offset; + rte_iova_t sw_ring_iova_end_addr; + void 
*base_addrs[ACC200_SW_RING_MEM_ALLOC_ATTEMPTS]; + void *sw_rings_base; + int i = 0; + uint32_t q_sw_ring_size = ACC200_MAX_QUEUE_DEPTH * get_desc_len(); + uint32_t dev_sw_ring_size = q_sw_ring_size * num_queues; + /* Free first in case this is a reconfiguration */ + rte_free(d->sw_rings_base); + + /* Find an aligned block of memory to store sw rings */ + while (i < ACC200_SW_RING_MEM_ALLOC_ATTEMPTS) { + /* + * sw_ring allocated memory is guaranteed to be aligned to + * q_sw_ring_size at the condition that the requested size is + * less than the page size + */ + sw_rings_base = rte_zmalloc_socket( + dev->device->driver->name, + dev_sw_ring_size, q_sw_ring_size, socket); + + if (sw_rings_base == NULL) { + rte_bbdev_log(ERR, + "Failed to allocate memory for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + break; + } + + sw_rings_base_iova = rte_malloc_virt2iova(sw_rings_base); + next_64mb_align_offset = calc_mem_alignment_offset( + sw_rings_base, ACC200_SIZE_64MBYTE); + next_64mb_align_addr_iova = sw_rings_base_iova + + next_64mb_align_offset; + sw_ring_iova_end_addr = sw_rings_base_iova + dev_sw_ring_size; + + /* Check if the end of the sw ring memory block is before the + * start of next 64MB aligned mem address + */ + if (sw_ring_iova_end_addr < next_64mb_align_addr_iova) { + d->sw_rings_iova = sw_rings_base_iova; + d->sw_rings = sw_rings_base; + d->sw_rings_base = sw_rings_base; + d->sw_ring_size = q_sw_ring_size; + d->sw_ring_max_depth = ACC200_MAX_QUEUE_DEPTH; + break; + } + /* Store the address of the unaligned mem block */ + base_addrs[i] = sw_rings_base; + i++; + } + + /* Free all unaligned blocks of mem allocated in the loop */ + free_base_addresses(base_addrs, i); +} + +/* Allocate 64MB memory used for all software rings */ +static int +acc200_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id) +{ + uint32_t phys_low, phys_high, value; + struct acc200_device *d = dev->data->dev_private; + const struct acc200_registry_addr 
*reg_addr; + + if (d->pf_device && !d->acc200_conf.pf_mode_en) { + rte_bbdev_log(NOTICE, + "%s has PF mode disabled. This PF can't be used.", + dev->data->name); + return -ENODEV; + } + if (!d->pf_device && d->acc200_conf.pf_mode_en) { + rte_bbdev_log(NOTICE, + "%s has PF mode enabled. This VF can't be used.", + dev->data->name); + return -ENODEV; + } + + alloc_sw_rings_min_mem(dev, d, num_queues, socket_id); + + /* If minimal memory space approach failed, then allocate + * the 2 * 64MB block for the sw rings + */ + if (d->sw_rings == NULL) + alloc_2x64mb_sw_rings_mem(dev, d, socket_id); + + if (d->sw_rings == NULL) { + rte_bbdev_log(NOTICE, + "Failure allocating sw_rings memory"); + return -ENODEV; + } + + /* Configure ACC200 with the base address for DMA descriptor rings + * Same descriptor rings used for UL and DL DMA Engines + * Note : Assuming only VF0 bundle is used for PF mode + */ + phys_high = (uint32_t)(d->sw_rings_iova >> 32); + phys_low = (uint32_t)(d->sw_rings_iova & ~(ACC200_SIZE_64MBYTE-1)); + + /* Choose correct registry addresses for the device type */ + if (d->pf_device) + reg_addr = &pf_reg_addr; + else + reg_addr = &vf_reg_addr; + + /* Read the populated cfg from ACC200 registers */ + fetch_acc200_config(dev); + + /* Start Pmon */ + for (value = 0; value <= 2; value++) { + acc200_reg_write(d, reg_addr->pmon_ctrl_a, value); + acc200_reg_write(d, reg_addr->pmon_ctrl_b, value); + acc200_reg_write(d, reg_addr->pmon_ctrl_c, value); + } + + /* Release AXI from PF */ + if (d->pf_device) + acc200_reg_write(d, HWPfDmaAxiControl, 1); + + acc200_reg_write(d, reg_addr->dma_ring_ul5g_hi, phys_high); + acc200_reg_write(d, reg_addr->dma_ring_ul5g_lo, phys_low); + acc200_reg_write(d, reg_addr->dma_ring_dl5g_hi, phys_high); + acc200_reg_write(d, reg_addr->dma_ring_dl5g_lo, phys_low); + acc200_reg_write(d, reg_addr->dma_ring_ul4g_hi, phys_high); + acc200_reg_write(d, reg_addr->dma_ring_ul4g_lo, phys_low); + acc200_reg_write(d, reg_addr->dma_ring_dl4g_hi, 
phys_high); + acc200_reg_write(d, reg_addr->dma_ring_dl4g_lo, phys_low); + acc200_reg_write(d, reg_addr->dma_ring_fft_hi, phys_high); + acc200_reg_write(d, reg_addr->dma_ring_fft_lo, phys_low); + /* + * Configure Ring Size to the max queue ring size + * (used for wrapping purpose) + */ + value = log2_basic(d->sw_ring_size / 64); + acc200_reg_write(d, reg_addr->ring_size, value); + + /* Configure tail pointer for use when SDONE enabled */ + if (d->tail_ptrs == NULL) + d->tail_ptrs = rte_zmalloc_socket( + dev->device->driver->name, + ACC200_NUM_QGRPS * ACC200_NUM_AQS * sizeof(uint32_t), + RTE_CACHE_LINE_SIZE, socket_id); + if (d->tail_ptrs == NULL) { + rte_bbdev_log(ERR, "Failed to allocate tail ptr for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + rte_free(d->sw_rings); + return -ENOMEM; + } + d->tail_ptr_iova = rte_malloc_virt2iova(d->tail_ptrs); + + phys_high = (uint32_t)(d->tail_ptr_iova >> 32); + phys_low = (uint32_t)(d->tail_ptr_iova); + acc200_reg_write(d, reg_addr->tail_ptrs_ul5g_hi, phys_high); + acc200_reg_write(d, reg_addr->tail_ptrs_ul5g_lo, phys_low); + acc200_reg_write(d, reg_addr->tail_ptrs_dl5g_hi, phys_high); + acc200_reg_write(d, reg_addr->tail_ptrs_dl5g_lo, phys_low); + acc200_reg_write(d, reg_addr->tail_ptrs_ul4g_hi, phys_high); + acc200_reg_write(d, reg_addr->tail_ptrs_ul4g_lo, phys_low); + acc200_reg_write(d, reg_addr->tail_ptrs_dl4g_hi, phys_high); + acc200_reg_write(d, reg_addr->tail_ptrs_dl4g_lo, phys_low); + acc200_reg_write(d, reg_addr->tail_ptrs_fft_hi, phys_high); + acc200_reg_write(d, reg_addr->tail_ptrs_fft_lo, phys_low); + + if (d->harq_layout == NULL) + d->harq_layout = rte_zmalloc_socket("HARQ Layout", + ACC200_HARQ_LAYOUT * sizeof(*d->harq_layout), + RTE_CACHE_LINE_SIZE, dev->data->socket_id); + if (d->harq_layout == NULL) { + rte_bbdev_log(ERR, "Failed to allocate harq_layout for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + rte_free(d->sw_rings); + return -ENOMEM; + } + + /* Mark as configured 
properly */ + d->configured = true; + + rte_bbdev_log_debug( + "ACC200 (%s) configured sw_rings = %p, sw_rings_iova = %#" + PRIx64, dev->data->name, d->sw_rings, d->sw_rings_iova); + + return 0; +} + /* Free memory used for software rings */ static int acc200_dev_close(struct rte_bbdev *dev) { - RTE_SET_USED(dev); + struct acc200_device *d = dev->data->dev_private; + if (d->sw_rings_base != NULL) { + rte_free(d->tail_ptrs); + rte_free(d->sw_rings_base); + rte_free(d->harq_layout); + d->sw_rings_base = NULL; + d->tail_ptrs = NULL; + d->harq_layout = NULL; + } /* Ensure all in flight HW transactions are completed */ usleep(ACC200_LONG_WAIT); return 0; } +/** + * Report a ACC200 queue index which is free + * Return 0 to 16k for a valid queue_idx or -1 when no queue is available + * Note : Only supporting VF0 Bundle for PF mode + */ +static int +acc200_find_free_queue_idx(struct rte_bbdev *dev, + const struct rte_bbdev_queue_conf *conf) +{ + struct acc200_device *d = dev->data->dev_private; + int op_2_acc[6] = {0, UL_4G, DL_4G, UL_5G, DL_5G, FFT}; + int acc = op_2_acc[conf->op_type]; + struct rte_acc200_queue_topology *qtop = NULL; + + qtopFromAcc(&qtop, acc, &(d->acc200_conf)); + if (qtop == NULL) + return -1; + /* Identify matching QGroup Index which are sorted in priority order */ + uint16_t group_idx = qtop->first_qgroup_index; + group_idx += conf->priority; + if (group_idx >= ACC200_NUM_QGRPS || + conf->priority >= qtop->num_qgroups) { + rte_bbdev_log(INFO, "Invalid Priority on %s, priority %u", + dev->data->name, conf->priority); + return -1; + } + /* Find a free AQ_idx */ + uint16_t aq_idx; + for (aq_idx = 0; aq_idx < qtop->num_aqs_per_groups; aq_idx++) { + if (((d->q_assigned_bit_map[group_idx] >> aq_idx) & 0x1) == 0) { + /* Mark the Queue as assigned */ + d->q_assigned_bit_map[group_idx] |= (1 << aq_idx); + /* Report the AQ Index */ + return (group_idx << ACC200_GRP_ID_SHIFT) + aq_idx; + } + } + rte_bbdev_log(INFO, "Failed to find free queue on %s, priority 
%u", + dev->data->name, conf->priority); + return -1; +} + +/* Setup ACC200 queue */ +static int +acc200_queue_setup(struct rte_bbdev *dev, uint16_t queue_id, + const struct rte_bbdev_queue_conf *conf) +{ + struct acc200_device *d = dev->data->dev_private; + struct acc200_queue *q; + int16_t q_idx; + + if (d == NULL) { + rte_bbdev_log(ERR, "Undefined device"); + return -ENODEV; + } + /* Allocate the queue data structure. */ + q = rte_zmalloc_socket(dev->device->driver->name, sizeof(*q), + RTE_CACHE_LINE_SIZE, conf->socket); + if (q == NULL) { + rte_bbdev_log(ERR, "Failed to allocate queue memory"); + return -ENOMEM; + } + + q->d = d; + q->ring_addr = RTE_PTR_ADD(d->sw_rings, (d->sw_ring_size * queue_id)); + q->ring_addr_iova = d->sw_rings_iova + (d->sw_ring_size * queue_id); + + /* Prepare the Ring with default descriptor format */ + union acc200_dma_desc *desc = NULL; + unsigned int desc_idx, b_idx; + int fcw_len = (conf->op_type == RTE_BBDEV_OP_LDPC_ENC ? + ACC200_FCW_LE_BLEN : (conf->op_type == RTE_BBDEV_OP_TURBO_DEC ? + ACC200_FCW_TD_BLEN : (conf->op_type == RTE_BBDEV_OP_LDPC_DEC ? 
+ ACC200_FCW_LD_BLEN : ACC200_FCW_FFT_BLEN))); + + for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) { + desc = q->ring_addr + desc_idx; + desc->req.word0 = ACC200_DMA_DESC_TYPE; + desc->req.word1 = 0; /**< Timestamp */ + desc->req.word2 = 0; + desc->req.word3 = 0; + uint64_t fcw_offset = (desc_idx << 8) + ACC200_DESC_FCW_OFFSET; + desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; + desc->req.data_ptrs[0].blen = fcw_len; + desc->req.data_ptrs[0].blkid = ACC200_DMA_BLKID_FCW; + desc->req.data_ptrs[0].last = 0; + desc->req.data_ptrs[0].dma_ext = 0; + for (b_idx = 1; b_idx < ACC200_DMA_MAX_NUM_POINTERS - 1; + b_idx++) { + desc->req.data_ptrs[b_idx].blkid = ACC200_DMA_BLKID_IN; + desc->req.data_ptrs[b_idx].last = 1; + desc->req.data_ptrs[b_idx].dma_ext = 0; + b_idx++; + desc->req.data_ptrs[b_idx].blkid = + ACC200_DMA_BLKID_OUT_ENC; + desc->req.data_ptrs[b_idx].last = 1; + desc->req.data_ptrs[b_idx].dma_ext = 0; + } + /* Preset some fields of LDPC FCW */ + desc->req.fcw_ld.FCWversion = ACC200_FCW_VER; + desc->req.fcw_ld.gain_i = 1; + desc->req.fcw_ld.gain_h = 1; + } + + q->lb_in = rte_zmalloc_socket(dev->device->driver->name, + RTE_CACHE_LINE_SIZE, + RTE_CACHE_LINE_SIZE, conf->socket); + if (q->lb_in == NULL) { + rte_bbdev_log(ERR, "Failed to allocate lb_in memory"); + rte_free(q); + return -ENOMEM; + } + q->lb_in_addr_iova = rte_malloc_virt2iova(q->lb_in); + q->lb_out = rte_zmalloc_socket(dev->device->driver->name, + RTE_CACHE_LINE_SIZE, + RTE_CACHE_LINE_SIZE, conf->socket); + if (q->lb_out == NULL) { + rte_bbdev_log(ERR, "Failed to allocate lb_out memory"); + rte_free(q->lb_in); + rte_free(q); + return -ENOMEM; + } + q->lb_out_addr_iova = rte_malloc_virt2iova(q->lb_out); + q->companion_ring_addr = rte_zmalloc_socket(dev->device->driver->name, + d->sw_ring_max_depth * sizeof(*q->companion_ring_addr), + RTE_CACHE_LINE_SIZE, conf->socket); + if (q->companion_ring_addr == NULL) { + rte_bbdev_log(ERR, "Failed to allocate companion_ring 
memory"); + rte_free(q->lb_in); + rte_free(q->lb_out); + rte_free(q); + return -ENOMEM; + } + + /* + * Software queue ring wraps synchronously with the HW when it reaches + * the boundary of the maximum allocated queue size, no matter what the + * sw queue size is. This wrapping is guarded by setting the wrap_mask + * to represent the maximum queue size as allocated at the time when + * the device has been setup (in configure()). + * + * The queue depth is set to the queue size value (conf->queue_size). + * This limits the occupancy of the queue at any point of time, so that + * the queue does not get swamped with enqueue requests. + */ + q->sw_ring_depth = conf->queue_size; + q->sw_ring_wrap_mask = d->sw_ring_max_depth - 1; + + q->op_type = conf->op_type; + + q_idx = acc200_find_free_queue_idx(dev, conf); + if (q_idx == -1) { + rte_free(q->companion_ring_addr); + rte_free(q->lb_in); + rte_free(q->lb_out); + rte_free(q); + return -1; + } + + q->qgrp_id = (q_idx >> ACC200_GRP_ID_SHIFT) & 0xF; + q->vf_id = (q_idx >> ACC200_VF_ID_SHIFT) & 0x3F; + q->aq_id = q_idx & 0xF; + q->aq_depth = 0; + if (conf->op_type == RTE_BBDEV_OP_TURBO_DEC) + q->aq_depth = (1 << d->acc200_conf.q_ul_4g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_TURBO_ENC) + q->aq_depth = (1 << d->acc200_conf.q_dl_4g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_LDPC_DEC) + q->aq_depth = (1 << d->acc200_conf.q_ul_5g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_LDPC_ENC) + q->aq_depth = (1 << d->acc200_conf.q_dl_5g.aq_depth_log2); + else if (conf->op_type == RTE_BBDEV_OP_FFT) + q->aq_depth = (1 << d->acc200_conf.q_fft.aq_depth_log2); + + q->mmio_reg_enqueue = RTE_PTR_ADD(d->mmio_base, + queue_offset(d->pf_device, + q->vf_id, q->qgrp_id, q->aq_id)); + + rte_bbdev_log_debug( + "Setup dev%u q%u: qgrp_id=%u, vf_id=%u, aq_id=%u, aq_depth=%u, mmio_reg_enqueue=%p base %p\n", + dev->data->dev_id, queue_id, q->qgrp_id, q->vf_id, + q->aq_id, q->aq_depth, q->mmio_reg_enqueue, + 
d->mmio_base); + + dev->data->queues[queue_id].queue_private = q; + return 0; +} + + +static int +acc200_queue_stop(struct rte_bbdev *dev, uint16_t queue_id) +{ + struct acc200_queue *q; + q = dev->data->queues[queue_id].queue_private; + rte_bbdev_log(INFO, "Queue Stop %d H/T/D %d %d %x OpType %d", + queue_id, q->sw_ring_head, q->sw_ring_tail, + q->sw_ring_depth, q->op_type); + /* ignore all operations in flight and clear counters */ + q->sw_ring_tail = q->sw_ring_head; + q->aq_enqueued = 0; + q->aq_dequeued = 0; + dev->data->queues[queue_id].queue_stats.enqueued_count = 0; + dev->data->queues[queue_id].queue_stats.dequeued_count = 0; + dev->data->queues[queue_id].queue_stats.enqueue_err_count = 0; + dev->data->queues[queue_id].queue_stats.dequeue_err_count = 0; + dev->data->queues[queue_id].queue_stats.enqueue_warn_count = 0; + dev->data->queues[queue_id].queue_stats.dequeue_warn_count = 0; + return 0; +} + +/* Release ACC200 queue */ +static int +acc200_queue_release(struct rte_bbdev *dev, uint16_t q_id) +{ + struct acc200_device *d = dev->data->dev_private; + struct acc200_queue *q = dev->data->queues[q_id].queue_private; + + if (q != NULL) { + /* Mark the Queue as un-assigned */ + d->q_assigned_bit_map[q->qgrp_id] &= (0xFFFFFFFF - + (1 << q->aq_id)); + rte_free(q->companion_ring_addr); + rte_free(q->lb_in); + rte_free(q->lb_out); + rte_free(q); + dev->data->queues[q_id].queue_private = NULL; + } + + return 0; +} + /* Get ACC200 device info */ static void acc200_dev_info_get(struct rte_bbdev *dev, @@ -289,8 +789,12 @@ } static const struct rte_bbdev_ops acc200_bbdev_ops = { + .setup_queues = acc200_setup_queues, .close = acc200_dev_close, .info_get = acc200_dev_info_get, + .queue_setup = acc200_queue_setup, + .queue_release = acc200_queue_release, + .queue_stop = acc200_queue_stop, }; /* ACC200 PCI PF address map */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
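Editorial aside on patch 04: alloc_sw_rings_min_mem() keeps an allocation only when the whole block ends before the next 64MB-aligned IOVA, using the offset returned by calc_mem_alignment_offset(). The sketch below reproduces that arithmetic on plain integers so it can be checked without DPDK. It is an illustration under assumptions, not the driver code: align_offset and fits_before_64mb_boundary are hypothetical names, and SKETCH_SIZE_64MB is assumed to equal ACC200_SIZE_64MBYTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match ACC200_SIZE_64MBYTE (64 * 1024 * 1024 bytes). */
#define SKETCH_SIZE_64MB (64u * 1024u * 1024u)

/* Distance from addr to the next alignment boundary strictly above it.
 * alignment must be a power of two. An already aligned address yields
 * the full alignment rather than 0, as in the formula used by the patch.
 */
static inline uint32_t
align_offset(uint64_t addr, uint32_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (uint32_t)(alignment - (addr & (alignment - 1)));
}

/* True when [base, base + size) ends strictly before the next 64MB
 * boundary, mirroring the strict "<" comparison in the patch.
 */
static inline bool
fits_before_64mb_boundary(uint64_t base, uint32_t size)
{
	uint64_t next_boundary = base + align_offset(base, SKETCH_SIZE_64MB);

	return base + size < next_boundary;
}
```

One consequence of the strict comparison is that a block ending exactly on a 64MB boundary is rejected, even though it does not cross the boundary.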
* [PATCH v1 05/10] baseband/acc200: add LDPC processing functions 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru ` (3 preceding siblings ...) 2022-07-08 0:01 ` [PATCH v1 04/10] baseband/acc200: add queue configuration Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 06/10] baseband/acc200: add LTE " Nicolas Chautru ` (5 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Adding LDPC encode and decode processing functions. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 2116 +++++++++++++++++++++++++++++- 1 file changed, 2112 insertions(+), 4 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index ec082f1..42cf2c8 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -697,15 +697,50 @@ return 0; } +static inline void +acc200_print_op(struct rte_bbdev_dec_op *op, enum rte_bbdev_op_type op_type, + uint16_t index) +{ + if (op == NULL) + return; + if (op_type == RTE_BBDEV_OP_LDPC_DEC) + rte_bbdev_log(INFO, + " Op 5GUL %d %d %d %d %d %d %d %d %d %d %d %d", + index, + op->ldpc_dec.basegraph, op->ldpc_dec.z_c, + op->ldpc_dec.n_cb, op->ldpc_dec.q_m, + op->ldpc_dec.n_filler, op->ldpc_dec.cb_params.e, + op->ldpc_dec.op_flags, op->ldpc_dec.rv_index, + op->ldpc_dec.iter_max, op->ldpc_dec.iter_count, + op->ldpc_dec.harq_combined_input.length + ); + else if (op_type == RTE_BBDEV_OP_LDPC_ENC) { + struct rte_bbdev_enc_op *op_dl = (struct rte_bbdev_enc_op *) op; + rte_bbdev_log(INFO, + " Op 5GDL %d %d %d %d %d %d %d %d %d", + index, + op_dl->ldpc_enc.basegraph, op_dl->ldpc_enc.z_c, + op_dl->ldpc_enc.n_cb, op_dl->ldpc_enc.q_m, + op_dl->ldpc_enc.n_filler, 
op_dl->ldpc_enc.cb_params.e, + op_dl->ldpc_enc.op_flags, op_dl->ldpc_enc.rv_index + ); + } +} static int acc200_queue_stop(struct rte_bbdev *dev, uint16_t queue_id) { struct acc200_queue *q; + struct rte_bbdev_dec_op *op; + uint16_t i; q = dev->data->queues[queue_id].queue_private; rte_bbdev_log(INFO, "Queue Stop %d H/T/D %d %d %x OpType %d", queue_id, q->sw_ring_head, q->sw_ring_tail, q->sw_ring_depth, q->op_type); + for (i = 0; i < q->sw_ring_depth; ++i) { + op = (q->ring_addr + i)->req.op_addr; + acc200_print_op(op, q->op_type, i); + } /* ignore all operations in flight and clear counters */ q->sw_ring_tail = q->sw_ring_head; q->aq_enqueued = 0; @@ -748,6 +783,43 @@ struct acc200_device *d = dev->data->dev_private; int i; static const struct rte_bbdev_op_cap bbdev_capabilities[] = { + { + .type = RTE_BBDEV_OP_LDPC_ENC, + .cap.ldpc_enc = { + .capability_flags = + RTE_BBDEV_LDPC_RATE_MATCH | + RTE_BBDEV_LDPC_CRC_24B_ATTACH | + RTE_BBDEV_LDPC_INTERLEAVER_BYPASS, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + } + }, + { + .type = RTE_BBDEV_OP_LDPC_DEC, + .cap.ldpc_dec = { + .capability_flags = + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP | + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK | + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE | + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE | + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE | + RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS | + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER | + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION | + RTE_BBDEV_LDPC_LLR_COMPRESSION, + .llr_size = 8, + .llr_decimals = 1, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_hard_out = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_soft_out = 0, + } + }, RTE_BBDEV_END_OF_CAPABILITIES_LIST() }; @@ -764,13 +836,15 @@ dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0; dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0; - 
dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = 0; - dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = d->acc200_conf.q_ul_5g.num_aqs_per_groups * + d->acc200_conf.q_ul_5g.num_qgroups; + dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc200_conf.q_dl_5g.num_aqs_per_groups * + d->acc200_conf.q_dl_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = 0; dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc200_conf.q_ul_5g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc200_conf.q_dl_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; dev_info->max_num_queues = 0; for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_FFT; i++) @@ -820,6 +894,2036 @@ return bitmap & bitmask; } +static inline char * +mbuf_append(struct rte_mbuf *m_head, struct rte_mbuf *m, uint16_t len) +{ + if (unlikely(len > rte_pktmbuf_tailroom(m))) + return NULL; + + char *tail = (char *)m->buf_addr + m->data_off + m->data_len; + m->data_len = (uint16_t)(m->data_len + len); + m_head->pkt_len = (m_head->pkt_len + len); + return tail; +} + +/* Compute value of k0. + * Based on 3GPP 38.212 Table 5.4.2.1-2 + * Starting position of different redundancy versions, k0 + */ +static inline uint16_t +get_k0(uint16_t n_cb, uint16_t z_c, uint8_t bg, uint8_t rv_index) +{ + if (rv_index == 0) + return 0; + uint16_t n = (bg == 1 ? ACC200_N_ZC_1 : ACC200_N_ZC_2) * z_c; + if (n_cb == n) { + if (rv_index == 1) + return (bg == 1 ? ACC200_K0_1_1 : ACC200_K0_1_2) * z_c; + else if (rv_index == 2) + return (bg == 1 ? ACC200_K0_2_1 : ACC200_K0_2_2) * z_c; + else + return (bg == 1 ? 
ACC200_K0_3_1 : ACC200_K0_3_2) * z_c; + } + /* LBRM case - includes a division by N */ + if (unlikely(z_c == 0)) + return 0; + if (rv_index == 1) + return (((bg == 1 ? ACC200_K0_1_1 : ACC200_K0_1_2) * n_cb) + / n) * z_c; + else if (rv_index == 2) + return (((bg == 1 ? ACC200_K0_2_1 : ACC200_K0_2_2) * n_cb) + / n) * z_c; + else + return (((bg == 1 ? ACC200_K0_3_1 : ACC200_K0_3_2) * n_cb) + / n) * z_c; +} + +/* Fill in a frame control word for LDPC encoding. */ +static inline void +acc200_fcw_le_fill(const struct rte_bbdev_enc_op *op, + struct acc200_fcw_le *fcw, int num_cb, uint32_t default_e) +{ + fcw->qm = op->ldpc_enc.q_m; + fcw->nfiller = op->ldpc_enc.n_filler; + fcw->BG = (op->ldpc_enc.basegraph - 1); + fcw->Zc = op->ldpc_enc.z_c; + fcw->ncb = op->ldpc_enc.n_cb; + fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_enc.basegraph, + op->ldpc_enc.rv_index); + fcw->rm_e = (default_e == 0) ? op->ldpc_enc.cb_params.e : default_e; + fcw->crc_select = check_bit(op->ldpc_enc.op_flags, + RTE_BBDEV_LDPC_CRC_24B_ATTACH); + fcw->bypass_intlv = check_bit(op->ldpc_enc.op_flags, + RTE_BBDEV_LDPC_INTERLEAVER_BYPASS); + fcw->mcb_count = num_cb; +} + +/* Convert offset to harq index for harq_layout structure */ +static inline uint32_t hq_index(uint32_t offset) +{ + return (offset >> ACC200_HARQ_OFFSET_SHIFT) & ACC200_HARQ_OFFSET_MASK; +} + +/* Fill in a frame control word for LDPC decoding. 
*/ +static inline void +acc200_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc200_fcw_ld *fcw, + union acc200_harq_layout_data *harq_layout) +{ + uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset; + uint32_t harq_index; + uint32_t l; + bool harq_prun = false; + + fcw->qm = op->ldpc_dec.q_m; + fcw->nfiller = op->ldpc_dec.n_filler; + fcw->BG = (op->ldpc_dec.basegraph - 1); + fcw->Zc = op->ldpc_dec.z_c; + fcw->ncb = op->ldpc_dec.n_cb; + fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_dec.basegraph, + op->ldpc_dec.rv_index); + if (op->ldpc_dec.code_block_mode == RTE_BBDEV_CODE_BLOCK) + fcw->rm_e = op->ldpc_dec.cb_params.e; + else + fcw->rm_e = (op->ldpc_dec.tb_params.r < + op->ldpc_dec.tb_params.cab) ? + op->ldpc_dec.tb_params.ea : + op->ldpc_dec.tb_params.eb; + + if (unlikely(check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE) && + (op->ldpc_dec.harq_combined_input.length == 0))) { + rte_bbdev_log(WARNING, "Null HARQ input size provided"); + /* Disable HARQ input in that case to carry forward */ + op->ldpc_dec.op_flags ^= RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE; + } + if (unlikely(fcw->rm_e == 0)) { + rte_bbdev_log(WARNING, "Null E input provided"); + fcw->rm_e = 2; + } + + fcw->hcin_en = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE); + fcw->hcout_en = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE); + fcw->crc_select = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK); + fcw->bypass_dec = 0; + fcw->bypass_intlv = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS); + if (op->ldpc_dec.q_m == 1) { + fcw->bypass_intlv = 1; + fcw->qm = 2; + } + fcw->hcin_decomp_mode = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION); + fcw->hcout_comp_mode = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION); + fcw->llr_pack_mode = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_LLR_COMPRESSION); + harq_index = 
hq_index(op->ldpc_dec.harq_combined_output.offset); +#ifdef ACC200_EXT_MEM + /* Limit cases when HARQ pruning is valid */ + harq_prun = ((op->ldpc_dec.harq_combined_output.offset % + ACC200_HARQ_OFFSET) == 0); +#endif + if (fcw->hcin_en > 0) { + harq_in_length = op->ldpc_dec.harq_combined_input.length; + if (fcw->hcin_decomp_mode > 0) + harq_in_length = harq_in_length * 8 / 6; + harq_in_length = RTE_MIN(harq_in_length, op->ldpc_dec.n_cb + - op->ldpc_dec.n_filler); + harq_in_length = RTE_ALIGN_CEIL(harq_in_length, 64); + if ((harq_layout[harq_index].offset > 0) & harq_prun) { + rte_bbdev_log_debug("HARQ IN offset unexpected for now\n"); + fcw->hcin_size0 = harq_layout[harq_index].size0; + fcw->hcin_offset = harq_layout[harq_index].offset; + fcw->hcin_size1 = harq_in_length - + harq_layout[harq_index].offset; + } else { + fcw->hcin_size0 = harq_in_length; + fcw->hcin_offset = 0; + fcw->hcin_size1 = 0; + } + } else { + fcw->hcin_size0 = 0; + fcw->hcin_offset = 0; + fcw->hcin_size1 = 0; + } + + fcw->itmax = op->ldpc_dec.iter_max; + fcw->itstop = check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE); + fcw->cnu_algo = ACC200_ALGO_MSA; + fcw->synd_precoder = fcw->itstop; + /* + * These are all implicitly set + * fcw->synd_post = 0; + * fcw->so_en = 0; + * fcw->so_bypass_rm = 0; + * fcw->so_bypass_intlv = 0; + * fcw->dec_convllr = 0; + * fcw->hcout_convllr = 0; + * fcw->hcout_size1 = 0; + * fcw->so_it = 0; + * fcw->hcout_offset = 0; + * fcw->negstop_th = 0; + * fcw->negstop_it = 0; + * fcw->negstop_en = 0; + * fcw->gain_i = 1; + * fcw->gain_h = 1; + */ + if (fcw->hcout_en > 0) { + parity_offset = (op->ldpc_dec.basegraph == 1 ? 20 : 8) + * op->ldpc_dec.z_c - op->ldpc_dec.n_filler; + k0_p = (fcw->k0 > parity_offset) ? 
+ fcw->k0 - op->ldpc_dec.n_filler : fcw->k0; + ncb_p = fcw->ncb - op->ldpc_dec.n_filler; + l = k0_p + fcw->rm_e; + harq_out_length = (uint16_t) fcw->hcin_size0; + harq_out_length = RTE_MIN(RTE_MAX(harq_out_length, l), ncb_p); + harq_out_length = RTE_ALIGN_CEIL(harq_out_length, 64); + if ((k0_p > fcw->hcin_size0 + ACC200_HARQ_OFFSET_THRESHOLD) && + harq_prun) { + fcw->hcout_size0 = (uint16_t) fcw->hcin_size0; + fcw->hcout_offset = k0_p & 0xFFC0; + fcw->hcout_size1 = harq_out_length - fcw->hcout_offset; + } else { + fcw->hcout_size0 = harq_out_length; + fcw->hcout_size1 = 0; + fcw->hcout_offset = 0; + } + harq_layout[harq_index].offset = fcw->hcout_offset; + harq_layout[harq_index].size0 = fcw->hcout_size0; + } else { + fcw->hcout_size0 = 0; + fcw->hcout_size1 = 0; + fcw->hcout_offset = 0; + } + + fcw->tb_crc_select = 0; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) + fcw->tb_crc_select = 2; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK)) + fcw->tb_crc_select = 1; +} + +/** + * Fills descriptor with data pointers of one block type. + * + * @param desc + * Pointer to DMA descriptor. + * @param input + * Pointer to the input mbuf pointer. It may be updated to point to the + * next segment in the scatter-gather case. + * @param offset + * Input offset in the rte_mbuf structure, used to calculate the point + * where the data starts. + * @param cb_len + * Length of the code block currently being processed. + * @param seg_total_left + * Number of bytes still left in the segment (mbuf) for further + * processing. + * @param op_flags + * Store information about device capabilities + * @param next_triplet + * Index for ACC200 DMA Descriptor triplet + * @param scattergather + * Flag to support scatter-gather for the mbuf + * + * @return + * Index of the next triplet on success, negative value if the lengths of + * pkt and processed cb do not match. 
+ * + */ +static inline int +acc200_dma_fill_blk_type_in(struct acc200_dma_req_desc *desc, + struct rte_mbuf **input, uint32_t *offset, uint32_t cb_len, + uint32_t *seg_total_left, int next_triplet, + bool scattergather) +{ + uint32_t part_len; + struct rte_mbuf *m = *input; + if (scattergather) + part_len = (*seg_total_left < cb_len) ? + *seg_total_left : cb_len; + else + part_len = cb_len; + cb_len -= part_len; + *seg_total_left -= part_len; + + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(m, *offset); + desc->data_ptrs[next_triplet].blen = part_len; + desc->data_ptrs[next_triplet].blkid = ACC200_DMA_BLKID_IN; + desc->data_ptrs[next_triplet].last = 0; + desc->data_ptrs[next_triplet].dma_ext = 0; + *offset += part_len; + next_triplet++; + + while (cb_len > 0) { + if (next_triplet < ACC200_DMA_MAX_NUM_POINTERS_IN && m->next != NULL) { + + m = m->next; + *seg_total_left = rte_pktmbuf_data_len(m); + part_len = (*seg_total_left < cb_len) ? + *seg_total_left : + cb_len; + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(m, 0); + desc->data_ptrs[next_triplet].blen = part_len; + desc->data_ptrs[next_triplet].blkid = + ACC200_DMA_BLKID_IN; + desc->data_ptrs[next_triplet].last = 0; + desc->data_ptrs[next_triplet].dma_ext = 0; + cb_len -= part_len; + *seg_total_left -= part_len; + /* Initializing offset for next segment (mbuf) */ + *offset = part_len; + next_triplet++; + } else { + rte_bbdev_log(ERR, + "Some data still left for processing: " + "data_left: %u, next_triplet: %u, next_mbuf: %p", + cb_len, next_triplet, m->next); + return -EINVAL; + } + } + /* Storing new mbuf as it could be changed in scatter-gather case*/ + *input = m; + + return next_triplet; +} + +/* Fills descriptor with data pointers of one block type. 
+ * Returns index of next triplet + */ +static inline int +acc200_dma_fill_blk_type(struct acc200_dma_req_desc *desc, + struct rte_mbuf *mbuf, uint32_t offset, + uint32_t len, int next_triplet, int blk_id) +{ + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(mbuf, offset); + desc->data_ptrs[next_triplet].blen = len; + desc->data_ptrs[next_triplet].blkid = blk_id; + desc->data_ptrs[next_triplet].last = 0; + desc->data_ptrs[next_triplet].dma_ext = 0; + next_triplet++; + + return next_triplet; +} + +static inline void +acc200_header_init(struct acc200_dma_req_desc *desc) +{ + desc->word0 = ACC200_DMA_DESC_TYPE; + desc->word1 = 0; /**< Timestamp could be disabled */ + desc->word2 = 0; + desc->word3 = 0; + desc->numCBs = 1; +} + +#ifdef RTE_LIBRTE_BBDEV_DEBUG +/* Check if any input data is unexpectedly left for processing */ +static inline int +check_mbuf_total_left(uint32_t mbuf_total_left) +{ + if (mbuf_total_left == 0) + return 0; + rte_bbdev_log(ERR, + "Some data still left for processing: mbuf_total_left = %u", + mbuf_total_left); + return -EINVAL; +} +#endif + +static inline int +acc200_dma_desc_le_fill(struct rte_bbdev_enc_op *op, + struct acc200_dma_req_desc *desc, struct rte_mbuf **input, + struct rte_mbuf *output, uint32_t *in_offset, + uint32_t *out_offset, uint32_t *out_length, + uint32_t *mbuf_total_left, uint32_t *seg_total_left) +{ + int next_triplet = 1; /* FCW already done */ + uint16_t K, in_length_in_bits, in_length_in_bytes; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + + acc200_header_init(desc); + K = (enc->basegraph == 1 ? 
22 : 10) * enc->z_c; + in_length_in_bits = K - enc->n_filler; + if ((enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) || + (enc->op_flags & RTE_BBDEV_LDPC_CRC_24B_ATTACH)) + in_length_in_bits -= 24; + in_length_in_bytes = in_length_in_bits >> 3; + + if (unlikely((*mbuf_total_left == 0) || + (*mbuf_total_left < in_length_in_bytes))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, in_length_in_bytes); + return -1; + } + + next_triplet = acc200_dma_fill_blk_type_in(desc, input, in_offset, + in_length_in_bytes, + seg_total_left, next_triplet, + check_bit(op->ldpc_enc.op_flags, + RTE_BBDEV_LDPC_ENC_SCATTER_GATHER)); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= in_length_in_bytes; + + /* Set output length */ + /* Integer round up division by 8 */ + *out_length = (enc->cb_params.e + 7) >> 3; + + next_triplet = acc200_dma_fill_blk_type(desc, output, *out_offset, + *out_length, next_triplet, ACC200_DMA_BLKID_OUT_ENC); + op->ldpc_enc.output.length += *out_length; + *out_offset += *out_length; + desc->data_ptrs[next_triplet - 1].last = 1; + desc->data_ptrs[next_triplet - 1].dma_ext = 0; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline int +acc200_dma_desc_ld_fill(struct rte_bbdev_dec_op *op, + struct acc200_dma_req_desc *desc, + struct rte_mbuf **input, struct rte_mbuf *h_output, + uint32_t *in_offset, uint32_t *h_out_offset, + uint32_t *h_out_length, uint32_t *mbuf_total_left, + uint32_t *seg_total_left, + struct acc200_fcw_ld *fcw) +{ + struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec; + int next_triplet = 1; /* FCW already done */ + uint32_t input_length; + uint16_t output_length, crc24_overlap = 0; + uint16_t sys_cols, K, h_p_size, 
h_np_size; + bool h_comp = check_bit(dec->op_flags, + RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION); + + acc200_header_init(desc); + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP)) + crc24_overlap = 24; + + /* Compute some LDPC BG lengths */ + input_length = fcw->rm_e; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_LLR_COMPRESSION)) + input_length = (input_length * 3 + 3) / 4; + sys_cols = (dec->basegraph == 1) ? 22 : 10; + K = sys_cols * dec->z_c; + output_length = K - dec->n_filler - crc24_overlap; + + if (unlikely((*mbuf_total_left == 0) || + (*mbuf_total_left < input_length))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, input_length); + return -1; + } + + next_triplet = acc200_dma_fill_blk_type_in(desc, input, + in_offset, input_length, + seg_total_left, next_triplet, + check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER)); + + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) { + if (op->ldpc_dec.harq_combined_input.data == 0) { + rte_bbdev_log(ERR, "HARQ input is not defined"); + return -1; + } + h_p_size = fcw->hcin_size0 + fcw->hcin_size1; + if (h_comp) + h_p_size = (h_p_size * 3 + 3) / 4; + if (op->ldpc_dec.harq_combined_input.data == 0) { + rte_bbdev_log(ERR, "HARQ input is not defined"); + return -1; + } + acc200_dma_fill_blk_type( + desc, + op->ldpc_dec.harq_combined_input.data, + op->ldpc_dec.harq_combined_input.offset, + h_p_size, + next_triplet, + ACC200_DMA_BLKID_IN_HARQ); + next_triplet++; + } + + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= input_length; + + next_triplet = acc200_dma_fill_blk_type(desc, h_output, + *h_out_offset, output_length >> 3, next_triplet, + ACC200_DMA_BLKID_OUT_HARD); 
+ + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) { + if (op->ldpc_dec.harq_combined_output.data == 0) { + rte_bbdev_log(ERR, "HARQ output is not defined"); + return -1; + } + + /* Pruned size of the HARQ */ + h_p_size = fcw->hcout_size0 + fcw->hcout_size1; + /* Non-Pruned size of the HARQ */ + h_np_size = fcw->hcout_offset > 0 ? + fcw->hcout_offset + fcw->hcout_size1 : + h_p_size; + if (h_comp) { + h_np_size = (h_np_size * 3 + 3) / 4; + h_p_size = (h_p_size * 3 + 3) / 4; + } + dec->harq_combined_output.length = h_np_size; + acc200_dma_fill_blk_type( + desc, + dec->harq_combined_output.data, + dec->harq_combined_output.offset, + h_p_size, + next_triplet, + ACC200_DMA_BLKID_OUT_HARQ); + + next_triplet++; + } + + *h_out_length = output_length >> 3; + dec->hard_output.length += *h_out_length; + *h_out_offset += *h_out_length; + desc->data_ptrs[next_triplet - 1].last = 1; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline void +acc200_dma_desc_ld_update(struct rte_bbdev_dec_op *op, + struct acc200_dma_req_desc *desc, + struct rte_mbuf *input, struct rte_mbuf *h_output, + uint32_t *in_offset, uint32_t *h_out_offset, + uint32_t *h_out_length, + union acc200_harq_layout_data *harq_layout) +{ + int next_triplet = 1; /* FCW already done */ + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(input, *in_offset); + next_triplet++; + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) { + struct rte_bbdev_op_data hi = op->ldpc_dec.harq_combined_input; + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(hi.data, hi.offset); + next_triplet++; + } + + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(h_output, *h_out_offset); + *h_out_length = desc->data_ptrs[next_triplet].blen; + next_triplet++; + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) { + /* Adjust based on previous operation */ + 
struct rte_bbdev_dec_op *prev_op = desc->op_addr; + op->ldpc_dec.harq_combined_output.length = + prev_op->ldpc_dec.harq_combined_output.length; + uint32_t harq_idx = hq_index( + op->ldpc_dec.harq_combined_output.offset); + uint32_t prev_harq_idx = hq_index( + prev_op->ldpc_dec.harq_combined_output.offset); + harq_layout[harq_idx].val = harq_layout[prev_harq_idx].val; + struct rte_bbdev_op_data ho = + op->ldpc_dec.harq_combined_output; + desc->data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(ho.data, ho.offset); + next_triplet++; + } + + op->ldpc_dec.hard_output.length += *h_out_length; + desc->op_addr = op; +} + + +/* Enqueue a number of operations to HW and update software rings */ +static inline void +acc200_dma_enqueue(struct acc200_queue *q, uint16_t n, + struct rte_bbdev_stats *queue_stats) +{ + union acc200_enqueue_reg_fmt enq_req; +#ifdef RTE_BBDEV_OFFLOAD_COST + uint64_t start_time = 0; + queue_stats->acc_offload_cycles = 0; +#else + RTE_SET_USED(queue_stats); +#endif + + enq_req.val = 0; + /* Setting offset, 100b for 256 DMA Desc */ + enq_req.addr_offset = ACC200_DESC_OFFSET; + + /* Split ops into batches */ + do { + union acc200_dma_desc *desc; + uint16_t enq_batch_size; + uint64_t offset; + rte_iova_t req_elem_addr; + + enq_batch_size = RTE_MIN(n, MAX_ENQ_BATCH_SIZE); + + /* Set flag on last descriptor in a batch */ + desc = q->ring_addr + ((q->sw_ring_head + enq_batch_size - 1) & + q->sw_ring_wrap_mask); + desc->req.last_desc_in_batch = 1; + + /* Calculate the 1st descriptor's address */ + offset = ((q->sw_ring_head & q->sw_ring_wrap_mask) * + sizeof(union acc200_dma_desc)); + req_elem_addr = q->ring_addr_iova + offset; + + /* Fill enqueue struct */ + enq_req.num_elem = enq_batch_size; + /* low 6 bits are not needed */ + enq_req.req_elem_addr = (uint32_t)(req_elem_addr >> 6); + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "Req sdone", desc, sizeof(*desc)); +#endif + rte_bbdev_log_debug( + "Enqueue %u reqs (phys %#"PRIx64") to reg 
%p\n", + enq_batch_size, + req_elem_addr, + (void *)q->mmio_reg_enqueue); + + rte_wmb(); + +#ifdef RTE_BBDEV_OFFLOAD_COST + /* Start time measurement for enqueue function offload. */ + start_time = rte_rdtsc_precise(); +#endif + rte_bbdev_log(DEBUG, "Debug : MMIO Enqueue"); + mmio_write(q->mmio_reg_enqueue, enq_req.val); + +#ifdef RTE_BBDEV_OFFLOAD_COST + queue_stats->acc_offload_cycles += + rte_rdtsc_precise() - start_time; +#endif + + q->aq_enqueued++; + q->sw_ring_head += enq_batch_size; + n -= enq_batch_size; + + } while (n); + + +} + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + +/* Validates LDPC encoder parameters */ +static inline int +validate_ldpc_enc_op(struct rte_bbdev_enc_op *op) +{ + struct rte_bbdev_op_ldpc_enc *ldpc_enc = &op->ldpc_enc; + + /* Check Zc is valid value */ + if ((ldpc_enc->z_c > 384) || (ldpc_enc->z_c < 2)) { + rte_bbdev_log(ERR, + "Zc (%u) is out of range", + ldpc_enc->z_c); + return -1; + } + if (ldpc_enc->z_c > 256) { + if ((ldpc_enc->z_c % 32) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_enc->z_c); + return -1; + } + } else if (ldpc_enc->z_c > 128) { + if ((ldpc_enc->z_c % 16) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_enc->z_c); + return -1; + } + } else if (ldpc_enc->z_c > 64) { + if ((ldpc_enc->z_c % 8) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_enc->z_c); + return -1; + } + } else if (ldpc_enc->z_c > 32) { + if ((ldpc_enc->z_c % 4) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_enc->z_c); + return -1; + } + } else if (ldpc_enc->z_c > 16) { + if ((ldpc_enc->z_c % 2) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_enc->z_c); + return -1; + } + } + return 0; +} + +/* Validates LDPC decoder parameters */ +static inline int +validate_ldpc_dec_op(struct rte_bbdev_dec_op *op) +{ + struct rte_bbdev_op_ldpc_dec *ldpc_dec = &op->ldpc_dec; + /* Check Zc is valid value */ + if ((ldpc_dec->z_c > 384) || (ldpc_dec->z_c < 2)) { + rte_bbdev_log(ERR, + "Zc (%u) is out of range", + ldpc_dec->z_c); + return -1; + } + if 
(ldpc_dec->z_c > 256) { + if ((ldpc_dec->z_c % 32) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_dec->z_c); + return -1; + } + } else if (ldpc_dec->z_c > 128) { + if ((ldpc_dec->z_c % 16) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_dec->z_c); + return -1; + } + } else if (ldpc_dec->z_c > 64) { + if ((ldpc_dec->z_c % 8) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_dec->z_c); + return -1; + } + } else if (ldpc_dec->z_c > 32) { + if ((ldpc_dec->z_c % 4) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_dec->z_c); + return -1; + } + } else if (ldpc_dec->z_c > 16) { + if ((ldpc_dec->z_c % 2) != 0) { + rte_bbdev_log(ERR, "Invalid Zc %d", ldpc_dec->z_c); + return -1; + } + } + return 0; +} + +#endif + +/* Enqueue one encode operations for ACC200 device in CB mode + * multiplexed on the same descriptor + */ +static inline int +enqueue_ldpc_enc_n_op_cb(struct acc200_queue *q, struct rte_bbdev_enc_op **ops, + uint16_t total_enqueued_descs, int16_t num) +{ + union acc200_dma_desc *desc = NULL; + uint32_t out_length; + struct rte_mbuf *output_head, *output; + int i, next_triplet; + uint16_t in_length_in_bytes; + struct rte_bbdev_op_ldpc_enc *enc = &ops[0]->ldpc_enc; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (validate_ldpc_enc_op(ops[0]) == -1) { + rte_bbdev_log(ERR, "LDPC encoder validation failed"); + return -EINVAL; + } +#endif + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc200_fcw_le_fill(ops[0], &desc->req.fcw_le, num, 0); + + /** This could be done at polling */ + acc200_header_init(&desc->req); + desc->req.numCBs = num; + + in_length_in_bytes = ops[0]->ldpc_enc.input.data->data_len; + out_length = (enc->cb_params.e + 7) >> 3; + desc->req.m2dlen = 1 + num; + desc->req.d2mlen = num; + next_triplet = 1; + + for (i = 0; i < num; i++) { + desc->req.data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset(ops[i]->ldpc_enc.input.data, 0); + 
desc->req.data_ptrs[next_triplet].blen = in_length_in_bytes; + next_triplet++; + desc->req.data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset( + ops[i]->ldpc_enc.output.data, 0); + desc->req.data_ptrs[next_triplet].blen = out_length; + next_triplet++; + ops[i]->ldpc_enc.output.length = out_length; + output_head = output = ops[i]->ldpc_enc.output.data; + mbuf_append(output_head, output, out_length); + output->data_len = out_length; + } + + desc->req.op_addr = ops[0]; + /* Keep track of pointers even when multiplexed in single descriptor */ + struct acc200_ptrs *context_ptrs = q->companion_ring_addr + desc_idx; + for (i = 0; i < num; i++) + context_ptrs->ptr[i].op_addr = ops[i]; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_le, + sizeof(desc->req.fcw_le) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + /* num CBs (one per op) were successfully prepared to enqueue */ + return num; +} + +/* Enqueue one encode operation for ACC200 device for a partial TB; + * all code blocks have the same configuration, multiplexed on one descriptor + */ +static inline void +enqueue_ldpc_enc_part_tb(struct acc200_queue *q, struct rte_bbdev_enc_op *op, + uint16_t total_enqueued_descs, int16_t num_cbs, uint32_t e, + uint16_t in_len_B, uint32_t out_len_B, uint32_t *in_offset, + uint32_t *out_offset) +{ + + union acc200_dma_desc *desc = NULL; + struct rte_mbuf *output_head, *output; + int i, next_triplet; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc200_fcw_le_fill(op, &desc->req.fcw_le, num_cbs, e); + + /** This could be done at polling */ + acc200_header_init(&desc->req); + desc->req.numCBs = num_cbs; + + desc->req.m2dlen = 1 + num_cbs; + desc->req.d2mlen = num_cbs; + next_triplet = 1; + + for (i = 0; i < num_cbs; i++) { + desc->req.data_ptrs[next_triplet].address = + 
rte_pktmbuf_iova_offset(enc->input.data, + *in_offset); + *in_offset += in_len_B; + desc->req.data_ptrs[next_triplet].blen = in_len_B; + next_triplet++; + desc->req.data_ptrs[next_triplet].address = + rte_pktmbuf_iova_offset( + enc->output.data, *out_offset); + *out_offset += out_len_B; + desc->req.data_ptrs[next_triplet].blen = out_len_B; + next_triplet++; + enc->output.length += out_len_B; + output_head = output = enc->output.data; + mbuf_append(output_head, output, out_len_B); + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_le, + sizeof(desc->req.fcw_le) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + +} + +/* Enqueue one encode operations for ACC200 device in CB mode */ +static inline int +enqueue_ldpc_enc_one_op_cb(struct acc200_queue *q, struct rte_bbdev_enc_op *op, + uint16_t total_enqueued_cbs) +{ + union acc200_dma_desc *desc = NULL; + int ret; + uint32_t in_offset, out_offset, out_length, mbuf_total_left, + seg_total_left; + struct rte_mbuf *input, *output_head, *output; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (validate_ldpc_enc_op(op) == -1) { + rte_bbdev_log(ERR, "LDPC encoder validation failed"); + return -EINVAL; + } +#endif + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc200_fcw_le_fill(op, &desc->req.fcw_le, 1, 0); + + input = op->ldpc_enc.input.data; + output_head = output = op->ldpc_enc.output.data; + in_offset = op->ldpc_enc.input.offset; + out_offset = op->ldpc_enc.output.offset; + out_length = 0; + mbuf_total_left = op->ldpc_enc.input.length; + seg_total_left = rte_pktmbuf_data_len(op->ldpc_enc.input.data) + - in_offset; + + ret = acc200_dma_desc_le_fill(op, &desc->req, &input, output, + &in_offset, &out_offset, &out_length, &mbuf_total_left, + &seg_total_left); + + if (unlikely(ret < 0)) + return ret; + + mbuf_append(output_head, output, out_length); + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, 
"FCW", &desc->req.fcw_le, + sizeof(desc->req.fcw_le) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); + + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + /* One CB (one op) was successfully prepared to enqueue */ + return 1; +} + +/* Enqueue one encode operations for ACC200 device in TB mode. + * returns the number of descs used + */ +static inline int +enqueue_ldpc_enc_one_op_tb(struct acc200_queue *q, struct rte_bbdev_enc_op *op, + uint16_t enq_descs, uint8_t cbs_in_tb) +{ + uint8_t num_a, num_b; + uint16_t desc_idx; + uint8_t r = op->ldpc_enc.tb_params.r; + uint8_t cab = op->ldpc_enc.tb_params.cab; + union acc200_dma_desc *desc; + uint16_t init_enq_descs = enq_descs; + uint16_t input_len_B = ((op->ldpc_enc.basegraph == 1 ? 22 : 10) * + op->ldpc_enc.z_c) >> 3; + if (check_bit(op->ldpc_enc.op_flags, RTE_BBDEV_LDPC_CRC_24B_ATTACH)) + input_len_B -= 3; + + if (r < cab) { + num_a = cab - r; + num_b = cbs_in_tb - cab; + } else { + num_a = 0; + num_b = cbs_in_tb - r; + } + uint32_t in_offset = 0, out_offset = 0; + + while (num_a > 0) { + uint32_t e = op->ldpc_enc.tb_params.ea; + uint32_t out_len_B = (e + 7) >> 3; + uint8_t enq = RTE_MIN(num_a, ACC200_MUX_5GDL_DESC); + num_a -= enq; + enqueue_ldpc_enc_part_tb(q, op, enq_descs, enq, e, input_len_B, + out_len_B, &in_offset, &out_offset); + enq_descs++; + } + while (num_b > 0) { + uint32_t e = op->ldpc_enc.tb_params.eb; + uint32_t out_len_B = (e + 7) >> 3; + uint8_t enq = RTE_MIN(num_b, ACC200_MUX_5GDL_DESC); + num_b -= enq; + enqueue_ldpc_enc_part_tb(q, op, enq_descs, enq, e, input_len_B, + out_len_B, &in_offset, &out_offset); + enq_descs++; + } + + uint16_t return_descs = enq_descs - init_enq_descs; + /* Keep total number of CBs in first TB */ + desc_idx = ((q->sw_ring_head + init_enq_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + desc->req.cbs_in_tb = return_descs; /** Actual number of descriptors */ + desc->req.op_addr = op; + + /* Set SDone on last CB 
descriptor for TB mode. */ + desc_idx = ((q->sw_ring_head + enq_descs - 1) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + desc->req.op_addr = op; + return return_descs; +} + +/** Enqueue one decode operations for ACC200 device in CB mode */ +static inline int +enqueue_ldpc_dec_one_op_cb(struct acc200_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs, bool same_op) +{ + int ret, hq_len; + if (op->ldpc_dec.cb_params.e == 0) + return -EINVAL; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (validate_ldpc_dec_op(op) == -1) { + rte_bbdev_log(ERR, "LDPC decoder validation failed"); + return -EINVAL; + } +#endif + + union acc200_dma_desc *desc; + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + struct rte_mbuf *input, *h_output_head, *h_output; + uint32_t in_offset, h_out_offset, mbuf_total_left, h_out_length = 0; + input = op->ldpc_dec.input.data; + h_output_head = h_output = op->ldpc_dec.hard_output.data; + in_offset = op->ldpc_dec.input.offset; + h_out_offset = op->ldpc_dec.hard_output.offset; + mbuf_total_left = op->ldpc_dec.input.length; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(input == NULL)) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + return -EFAULT; + } +#endif + union acc200_harq_layout_data *harq_layout = q->d->harq_layout; + + if (same_op) { + union acc200_dma_desc *prev_desc; + desc_idx = ((q->sw_ring_head + total_enqueued_cbs - 1) + & q->sw_ring_wrap_mask); + prev_desc = q->ring_addr + desc_idx; + uint8_t *prev_ptr = (uint8_t *) prev_desc; + uint8_t *new_ptr = (uint8_t *) desc; + /* Copy first 4 words and BDESCs */ + rte_memcpy(new_ptr, prev_ptr, ACC200_5GUL_SIZE_0); + rte_memcpy(new_ptr + ACC200_5GUL_OFFSET_0, + prev_ptr + ACC200_5GUL_OFFSET_0, + ACC200_5GUL_SIZE_1); + desc->req.op_addr = prev_desc->req.op_addr; + /* Copy FCW */ + rte_memcpy(new_ptr + ACC200_DESC_FCW_OFFSET, + 
prev_ptr + ACC200_DESC_FCW_OFFSET, + ACC200_FCW_LD_BLEN); + acc200_dma_desc_ld_update(op, &desc->req, input, h_output, + &in_offset, &h_out_offset, + &h_out_length, harq_layout); + } else { + struct acc200_fcw_ld *fcw; + uint32_t seg_total_left; + fcw = &desc->req.fcw_ld; + acc200_fcw_ld_fill(op, fcw, harq_layout); + + /* Special handling when using mbuf or not */ + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER)) + seg_total_left = rte_pktmbuf_data_len(input) + - in_offset; + else + seg_total_left = fcw->rm_e; + + ret = acc200_dma_desc_ld_fill(op, &desc->req, &input, h_output, + &in_offset, &h_out_offset, + &h_out_length, &mbuf_total_left, + &seg_total_left, fcw); + if (unlikely(ret < 0)) + return ret; + } + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + if (op->ldpc_dec.harq_combined_output.length > 0) { + /* Push the HARQ output into host memory */ + struct rte_mbuf *hq_output_head, *hq_output; + hq_output_head = op->ldpc_dec.harq_combined_output.data; + hq_output = op->ldpc_dec.harq_combined_output.data; + hq_len = op->ldpc_dec.harq_combined_output.length; + if (unlikely(!mbuf_append(hq_output_head, hq_output, + hq_len))) { + rte_bbdev_log(ERR, "HARQ output mbuf issue %d %d\n", + hq_output->buf_len, + hq_len); + return -1; + } + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_ld, + sizeof(desc->req.fcw_ld) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + /* One CB (one op) was successfully prepared to enqueue */ + return 1; +} + + +/* Enqueue one decode operations for ACC200 device in TB mode */ +static inline int +enqueue_ldpc_dec_one_op_tb(struct acc200_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) +{ + union acc200_dma_desc *desc = NULL; + union acc200_dma_desc *desc_first = NULL; + int ret; + uint8_t r, c; + uint32_t in_offset, h_out_offset, + h_out_length, mbuf_total_left, seg_total_left; + struct 
rte_mbuf *input, *h_output_head, *h_output; + uint16_t current_enqueued_cbs = 0; + uint16_t sys_cols, trail_len = 0; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (validate_ldpc_dec_op(op) == -1) { + rte_bbdev_log(ERR, "LDPC decoder validation failed"); + return -EINVAL; + } +#endif + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + desc_first = desc; + uint64_t fcw_offset = (desc_idx << 8) + ACC200_DESC_FCW_OFFSET; + union acc200_harq_layout_data *harq_layout = q->d->harq_layout; + acc200_fcw_ld_fill(op, &desc->req.fcw_ld, harq_layout); + + input = op->ldpc_dec.input.data; + h_output_head = h_output = op->ldpc_dec.hard_output.data; + in_offset = op->ldpc_dec.input.offset; + h_out_offset = op->ldpc_dec.hard_output.offset; + h_out_length = 0; + mbuf_total_left = op->ldpc_dec.input.length; + c = op->ldpc_dec.tb_params.c; + r = op->ldpc_dec.tb_params.r; + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) { + sys_cols = (op->ldpc_dec.basegraph == 1) ? 
22 : 10; + trail_len = sys_cols * op->ldpc_dec.z_c - + op->ldpc_dec.n_filler - 24; + } + + while (mbuf_total_left > 0 && r < c) { + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER)) + seg_total_left = rte_pktmbuf_data_len(input) + - in_offset; + else + seg_total_left = op->ldpc_dec.input.length; + /* Set up DMA descriptor */ + desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + fcw_offset = (desc_idx << 8) + ACC200_DESC_FCW_OFFSET; + desc->req.data_ptrs[0].address = q->ring_addr_iova + + fcw_offset; + desc->req.data_ptrs[0].blen = ACC200_FCW_LD_BLEN; + rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld, + ACC200_FCW_LD_BLEN); + desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len; + + ret = acc200_dma_desc_ld_fill(op, &desc->req, &input, + h_output, &in_offset, &h_out_offset, + &h_out_length, + &mbuf_total_left, &seg_total_left, + &desc->req.fcw_ld); + + if (unlikely(ret < 0)) + return ret; + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + + /* Set total number of CBs in TB */ + desc->req.cbs_in_tb = cbs_in_tb; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_td, + sizeof(desc->req.fcw_td) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_DEC_SCATTER_GATHER) + && (seg_total_left == 0)) { + /* Go to the next mbuf */ + input = input->next; + in_offset = 0; + h_output = h_output->next; + h_out_offset = 0; + } + total_enqueued_cbs++; + current_enqueued_cbs++; + r++; + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + /* Set SDone on last CB descriptor for TB mode */ + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + return current_enqueued_cbs; +} + +/* Calculates number of CBs in processed encoder TB based on 'r' and input + * length. 
+ */ +static inline uint8_t +get_num_cbs_in_tb_ldpc_enc(struct rte_bbdev_op_ldpc_enc *ldpc_enc) +{ + uint8_t c, r, crc24_bits = 0; + uint16_t k = (ldpc_enc->basegraph == 1 ? 22 : 10) * ldpc_enc->z_c + - ldpc_enc->n_filler; + uint8_t cbs_in_tb = 0; + int32_t length; + + length = ldpc_enc->input.length; + r = ldpc_enc->tb_params.r; + c = ldpc_enc->tb_params.c; + crc24_bits = 0; + if (check_bit(ldpc_enc->op_flags, RTE_BBDEV_LDPC_CRC_24B_ATTACH)) + crc24_bits = 24; + while (length > 0 && r < c) { + length -= (k - crc24_bits) >> 3; + r++; + cbs_in_tb++; + } + + return cbs_in_tb; +} + +/* Calculates number of CBs in processed encoder TB based on 'r' and input + * length. + */ +static inline uint8_t +get_num_cbs_in_tb_enc(struct rte_bbdev_op_turbo_enc *turbo_enc) +{ + uint8_t c, c_neg, r, crc24_bits = 0; + uint16_t k, k_neg, k_pos; + uint8_t cbs_in_tb = 0; + int32_t length; + + length = turbo_enc->input.length; + r = turbo_enc->tb_params.r; + c = turbo_enc->tb_params.c; + c_neg = turbo_enc->tb_params.c_neg; + k_neg = turbo_enc->tb_params.k_neg; + k_pos = turbo_enc->tb_params.k_pos; + crc24_bits = 0; + if (check_bit(turbo_enc->op_flags, RTE_BBDEV_TURBO_CRC_24B_ATTACH)) + crc24_bits = 24; + while (length > 0 && r < c) { + k = (r < c_neg) ? k_neg : k_pos; + length -= (k - crc24_bits) >> 3; + r++; + cbs_in_tb++; + } + + return cbs_in_tb; +} + +/* Calculates number of CBs in processed decoder TB based on 'r' and input + * length. + */ +static inline uint16_t +get_num_cbs_in_tb_dec(struct rte_bbdev_op_turbo_dec *turbo_dec) +{ + uint8_t c, c_neg, r = 0; + uint16_t kw, k, k_neg, k_pos, cbs_in_tb = 0; + int32_t length; + + length = turbo_dec->input.length; + r = turbo_dec->tb_params.r; + c = turbo_dec->tb_params.c; + c_neg = turbo_dec->tb_params.c_neg; + k_neg = turbo_dec->tb_params.k_neg; + k_pos = turbo_dec->tb_params.k_pos; + while (length > 0 && r < c) { + k = (r < c_neg) ? 
k_neg : k_pos; + kw = RTE_ALIGN_CEIL(k + 4, 32) * 3; + length -= kw; + r++; + cbs_in_tb++; + } + + return cbs_in_tb; +} + +/* Calculates number of CBs in processed decoder TB based on 'r' and input + * length. + */ +static inline uint16_t +get_num_cbs_in_tb_ldpc_dec(struct rte_bbdev_op_ldpc_dec *ldpc_dec) +{ + uint16_t r, cbs_in_tb = 0; + int32_t length = ldpc_dec->input.length; + r = ldpc_dec->tb_params.r; + while (length > 0 && r < ldpc_dec->tb_params.c) { + length -= (r < ldpc_dec->tb_params.cab) ? + ldpc_dec->tb_params.ea : + ldpc_dec->tb_params.eb; + r++; + cbs_in_tb++; + } + return cbs_in_tb; +} + +static inline void +acc200_enqueue_status(struct rte_bbdev_queue_data *q_data, + enum rte_bbdev_enqueue_status status) +{ + q_data->enqueue_status = status; + q_data->queue_stats.enqueue_status_count[status]++; + rte_bbdev_log(WARNING, "Enqueue Status: %s %#"PRIx64"", + rte_bbdev_enqueue_status_str(status), + q_data->queue_stats.enqueue_status_count[status]); +} + +static inline void +acc200_enqueue_invalid(struct rte_bbdev_queue_data *q_data) +{ + acc200_enqueue_status(q_data, RTE_BBDEV_ENQ_STATUS_INVALID_OP); +} + +static inline void +acc200_enqueue_ring_full(struct rte_bbdev_queue_data *q_data) +{ + acc200_enqueue_status(q_data, RTE_BBDEV_ENQ_STATUS_RING_FULL); +} + +static inline void +acc200_enqueue_queue_full(struct rte_bbdev_queue_data *q_data) +{ + acc200_enqueue_status(q_data, RTE_BBDEV_ENQ_STATUS_QUEUE_FULL); +} + +/* Number of available descriptor in ring to enqueue */ +static uint32_t +acc200_ring_avail_enq(struct acc200_queue *q) +{ + return (q->sw_ring_depth - 1 + q->sw_ring_tail - q->sw_ring_head) % q->sw_ring_depth; +} + +/* Number of available descriptor in ring to dequeue */ +static uint32_t +acc200_ring_avail_deq(struct acc200_queue *q) +{ + return (q->sw_ring_depth + q->sw_ring_head - q->sw_ring_tail) % q->sw_ring_depth; +} + +/* Check we can mux encode operations with common FCW */ +static inline int16_t +check_mux(struct rte_bbdev_enc_op 
**ops, uint16_t num) { + uint16_t i; + if (num <= 1) + return 1; + for (i = 1; i < num; ++i) { + /* Only mux compatible code blocks */ + if (memcmp((uint8_t *)(&ops[i]->ldpc_enc) + ACC200_ENC_OFFSET, + (uint8_t *)(&ops[0]->ldpc_enc) + + ACC200_ENC_OFFSET, + ACC200_CMP_ENC_SIZE) != 0) + return i; + } + /* Avoid multiplexing small inbound size frames */ + int Kp = (ops[0]->ldpc_enc.basegraph == 1 ? 22 : 10) * + ops[0]->ldpc_enc.z_c - ops[0]->ldpc_enc.n_filler; + if (Kp <= ACC200_LIMIT_DL_MUX_BITS) + return 1; + return num; +} + +/** Enqueue encode operations for ACC200 device in CB mode. */ +static inline uint16_t +acc200_enqueue_ldpc_enc_cb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + int32_t avail = acc200_ring_avail_enq(q); + uint16_t i = 0; + union acc200_dma_desc *desc; + int ret, desc_idx = 0; + int16_t enq, left = num; + + while (left > 0) { + if (unlikely(avail < 1)) { + acc200_enqueue_ring_full(q_data); + break; + } + avail--; + enq = RTE_MIN(left, ACC200_MUX_5GDL_DESC); + enq = check_mux(&ops[i], enq); + if (enq > 1) { + ret = enqueue_ldpc_enc_n_op_cb(q, &ops[i], + desc_idx, enq); + if (ret < 0) { + acc200_enqueue_invalid(q_data); + break; + } + i += enq; + } else { + ret = enqueue_ldpc_enc_one_op_cb(q, ops[i], desc_idx); + if (ret < 0) { + acc200_enqueue_invalid(q_data); + break; + } + i++; + } + desc_idx++; + left = num - i; + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + desc_idx - 1) + & q->sw_ring_wrap_mask); + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + acc200_dma_enqueue(q, desc_idx, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + +/* Enqueue LDPC encode operations for ACC200 device in TB mode. 
*/ +static uint16_t +acc200_enqueue_ldpc_enc_tb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + int32_t avail = acc200_ring_avail_enq(q); + uint16_t i, enqueued_descs = 0; + uint8_t cbs_in_tb; + int descs_used; + + for (i = 0; i < num; ++i) { + cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc); + /* Check if there are available space for further processing */ + if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) { + acc200_enqueue_ring_full(q_data); + break; + } + + descs_used = enqueue_ldpc_enc_one_op_tb(q, ops[i], + enqueued_descs, cbs_in_tb); + if (descs_used < 0) { + acc200_enqueue_invalid(q_data); + break; + } + enqueued_descs += descs_used; + avail -= descs_used; + } + if (unlikely(enqueued_descs == 0)) + return 0; /* Nothing to enqueue */ + + acc200_dma_enqueue(q, enqueued_descs, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + +/* Check room in AQ for the enqueues batches into Qmgr */ +static int32_t +acc200_aq_avail(struct rte_bbdev_queue_data *q_data, uint16_t num_ops) +{ + struct acc200_queue *q = q_data->queue_private; + int32_t aq_avail = q->aq_depth - + ((q->aq_enqueued - q->aq_dequeued + + ACC200_MAX_QUEUE_DEPTH) % ACC200_MAX_QUEUE_DEPTH) + - (num_ops >> 7); + if (aq_avail <= 0) + acc200_enqueue_queue_full(q_data); + return aq_avail; +} + +/* Enqueue encode operations for ACC200 device. 
*/ +static uint16_t +acc200_enqueue_ldpc_enc(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + int32_t aq_avail = acc200_ring_avail_enq(q); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + if (ops[0]->ldpc_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + return acc200_enqueue_ldpc_enc_tb(q_data, ops, num); + else + return acc200_enqueue_ldpc_enc_cb(q_data, ops, num); +} + +/* Check we can mux encode operations with common FCW */ +static inline bool +cmp_ldpc_dec_op(struct rte_bbdev_dec_op **ops) { + /* Only mux compatible code blocks */ + if (memcmp((uint8_t *)(&ops[0]->ldpc_dec) + ACC200_DEC_OFFSET, + (uint8_t *)(&ops[1]->ldpc_dec) + + ACC200_DEC_OFFSET, ACC200_CMP_DEC_SIZE) != 0) { + return false; + } else + return true; +} + + +/* Enqueue decode operations for ACC200 device in TB mode */ +static uint16_t +acc200_enqueue_ldpc_dec_tb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + int32_t avail = acc200_ring_avail_enq(q); + uint16_t i, enqueued_cbs = 0; + uint8_t cbs_in_tb; + int ret; + + for (i = 0; i < num; ++i) { + cbs_in_tb = get_num_cbs_in_tb_ldpc_dec(&ops[i]->ldpc_dec); + /* Check if there are available space for further processing */ + if (unlikely((avail - cbs_in_tb < 0) || + (cbs_in_tb == 0))) + break; + avail -= cbs_in_tb; + + ret = enqueue_ldpc_dec_one_op_tb(q, ops[i], + enqueued_cbs, cbs_in_tb); + if (ret <= 0) + break; + enqueued_cbs += ret; + } + + acc200_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + return i; +} + +/* Enqueue decode operations for ACC200 device in CB mode */ +static uint16_t +acc200_enqueue_ldpc_dec_cb(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc200_queue *q = 
q_data->queue_private; + int32_t avail = acc200_ring_avail_enq(q); + uint16_t i; + union acc200_dma_desc *desc; + int ret; + bool same_op = false; + for (i = 0; i < num; ++i) { + /* Check if there are available space for further processing */ + if (unlikely(avail < 1)) { + acc200_enqueue_ring_full(q_data); + break; + } + avail -= 1; +#ifdef ACC200_DESC_OPTIMIZATION + if (i > 0) + same_op = cmp_ldpc_dec_op(&ops[i-1]); +#endif + rte_bbdev_log(INFO, "Op %d %d %d %d %d %d %d %d %d %d %d %d\n", + i, ops[i]->ldpc_dec.op_flags, ops[i]->ldpc_dec.rv_index, + ops[i]->ldpc_dec.iter_max, ops[i]->ldpc_dec.iter_count, + ops[i]->ldpc_dec.basegraph, ops[i]->ldpc_dec.z_c, + ops[i]->ldpc_dec.n_cb, ops[i]->ldpc_dec.q_m, + ops[i]->ldpc_dec.n_filler, ops[i]->ldpc_dec.cb_params.e, + same_op); + ret = enqueue_ldpc_dec_one_op_cb(q, ops[i], i, same_op); + if (ret < 0) { + acc200_enqueue_invalid(q_data); + break; + } + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + i - 1) + & q->sw_ring_wrap_mask); + + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + acc200_dma_enqueue(q, i, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + return i; +} + +/* Enqueue decode operations for ACC200 device. 
*/ +static uint16_t +acc200_enqueue_ldpc_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + int32_t aq_avail = acc200_aq_avail(q_data, num); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + if (ops[0]->ldpc_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + return acc200_enqueue_ldpc_dec_tb(q_data, ops, num); + else + return acc200_enqueue_ldpc_dec_cb(q_data, ops, num); +} + + +/* Dequeue one encode operation from ACC200 device in CB mode + */ +static inline int +dequeue_enc_one_op_cb(struct acc200_queue *q, struct rte_bbdev_enc_op **ref_op, + uint16_t *dequeued_ops, uint32_t *aq_dequeued, + uint16_t *dequeued_descs) +{ + union acc200_dma_desc *desc, atom_desc; + union acc200_dma_rsp_desc rsp; + struct rte_bbdev_enc_op *op; + int i; + int desc_idx = ((q->sw_ring_tail + *dequeued_descs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC200_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x", desc, rsp.val); + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + + op->status |= ((rsp.input_err) + ? (1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ?
(1 << RTE_BBDEV_DRV_ERROR) : 0); + + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC200_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; /*Reserved bits */ + desc->rsp.add_info_1 = 0; /*Reserved bits */ + + ref_op[0] = op; + struct acc200_ptrs *context_ptrs = q->companion_ring_addr + desc_idx; + for (i = 1 ; i < desc->req.numCBs; i++) + ref_op[i] = context_ptrs->ptr[i].op_addr; + + /* One op was successfully dequeued */ + (*dequeued_descs)++; + *dequeued_ops += desc->req.numCBs; + return desc->req.numCBs; +} + +/* Dequeue one LDPC encode operations from ACC200 device in TB mode + * That operation may cover multiple descriptors + */ +static inline int +dequeue_enc_one_op_tb(struct acc200_queue *q, struct rte_bbdev_enc_op **ref_op, + uint16_t *dequeued_ops, uint32_t *aq_dequeued, + uint16_t *dequeued_descs) +{ + union acc200_dma_desc *desc, *last_desc, atom_desc; + union acc200_dma_rsp_desc rsp; + struct rte_bbdev_enc_op *op; + uint8_t i = 0; + uint16_t current_dequeued_descs = 0, descs_in_tb; + + desc = q->ring_addr + ((q->sw_ring_tail + *dequeued_descs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC200_FDONE)) + return -1; + + /* Get number of CBs in dequeued TB */ + descs_in_tb = desc->req.cbs_in_tb; + /* Get last CB */ + last_desc = q->ring_addr + ((q->sw_ring_tail + + *dequeued_descs + descs_in_tb - 1) + & q->sw_ring_wrap_mask); + /* Check if last CB in TB is ready to dequeue (and thus + * the whole TB) - checking sdone bit. If not return. 
+ */ + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, + __ATOMIC_RELAXED); + if (!(atom_desc.rsp.val & ACC200_SDONE)) + return -1; + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + + while (i < descs_in_tb) { + desc = q->ring_addr + ((q->sw_ring_tail + + *dequeued_descs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x", desc, + rsp.val); + + op->status |= ((rsp.input_err) + ? (1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC200_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + (*dequeued_descs)++; + current_dequeued_descs++; + i++; + } + + *ref_op = op; + (*dequeued_ops)++; + return current_dequeued_descs; +} + +/* Dequeue one decode operation from ACC200 device in CB mode */ +static inline int +dequeue_dec_one_op_cb(struct rte_bbdev_queue_data *q_data, + struct acc200_queue *q, struct rte_bbdev_dec_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc200_dma_desc *desc, atom_desc; + union acc200_dma_rsp_desc rsp; + struct rte_bbdev_dec_op *op; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC200_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x\n", desc, rsp.val); + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + op->status |= ((rsp.input_err) + ? 
(1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + if (op->status != 0) { + /* These errors are not expected */ + q_data->queue_stats.dequeue_err_count++; + } + + /* CRC invalid if error exists */ + if (!op->status) + op->status |= rsp.crc_status << RTE_BBDEV_CRC_ERROR; + op->turbo_dec.iter_count = (uint8_t) rsp.iter_cnt; + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC200_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + *ref_op = op; + + /* One CB (op) was successfully dequeued */ + return 1; +} + +/* Dequeue one decode operations from ACC200 device in CB mode */ +static inline int +dequeue_ldpc_dec_one_op_cb(struct rte_bbdev_queue_data *q_data, + struct acc200_queue *q, struct rte_bbdev_dec_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc200_dma_desc *desc, atom_desc; + union acc200_dma_rsp_desc rsp; + struct rte_bbdev_dec_op *op; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC200_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. 
desc %p: %x %x %x\n", desc, + rsp.val, desc->rsp.add_info_0, + desc->rsp.add_info_1); + + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR; + op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR; + op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR; + if (op->status != 0) + q_data->queue_stats.dequeue_err_count++; + + op->status |= rsp.crc_status << RTE_BBDEV_CRC_ERROR; + if (op->ldpc_dec.hard_output.length > 0 && !rsp.synd_ok) + op->status |= 1 << RTE_BBDEV_SYNDROME_ERROR; + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK) || + check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK)) { + if (desc->rsp.add_info_1 != 0) + op->status |= 1 << RTE_BBDEV_CRC_ERROR; + } + + op->ldpc_dec.iter_count = (uint8_t) rsp.iter_cnt; + + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + + desc->rsp.val = ACC200_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + + *ref_op = op; + + /* One CB (op) was successfully dequeued */ + return 1; +} + +/* Dequeue one decode operations from ACC200 device in TB mode. 
*/ +static inline int +dequeue_dec_one_op_tb(struct acc200_queue *q, struct rte_bbdev_dec_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc200_dma_desc *desc, *last_desc, atom_desc; + union acc200_dma_rsp_desc rsp; + struct rte_bbdev_dec_op *op; + uint8_t cbs_in_tb = 1, cb_idx = 0; + uint32_t tb_crc_check = 0; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC200_FDONE)) + return -1; + + /* Dequeue */ + op = desc->req.op_addr; + + /* Get number of CBs in dequeued TB */ + cbs_in_tb = desc->req.cbs_in_tb; + /* Get last CB */ + last_desc = q->ring_addr + ((q->sw_ring_tail + + dequeued_cbs + cbs_in_tb - 1) + & q->sw_ring_wrap_mask); + /* Check if last CB in TB is ready to dequeue (and thus + * the whole TB) - checking sdone bit. If not return. + */ + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, + __ATOMIC_RELAXED); + if (!(atom_desc.rsp.val & ACC200_SDONE)) + return -1; + + /* Clearing status, it will be set based on response */ + op->status = 0; + + /* Read remaining CBs if exists */ + while (cb_idx < cbs_in_tb) { + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + rsp.val = atom_desc.rsp.val; + rte_bbdev_log_debug("Resp. desc %p: %x %x %x", desc, + rsp.val, desc->rsp.add_info_0, + desc->rsp.add_info_1); + + op->status |= ((rsp.input_err) + ? (1 << RTE_BBDEV_DATA_ERROR) : 0); + op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0); + op->status |= ((rsp.fcw_err) ? 
(1 << RTE_BBDEV_DRV_ERROR) : 0); + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) + tb_crc_check ^= desc->rsp.add_info_1; + + /* CRC invalid if error exists */ + if (!op->status) + op->status |= rsp.crc_status << RTE_BBDEV_CRC_ERROR; + op->turbo_dec.iter_count = RTE_MAX((uint8_t) rsp.iter_cnt, + op->turbo_dec.iter_count); + + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC200_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + desc->rsp.add_info_1 = 0; + dequeued_cbs++; + cb_idx++; + } + + if (check_bit(op->ldpc_dec.op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK)) { + rte_bbdev_log_debug("TB-CRC Check %x\n", tb_crc_check); + if (tb_crc_check > 0) + op->status |= 1 << RTE_BBDEV_CRC_ERROR; + } + + *ref_op = op; + + return cb_idx; +} + +/* Dequeue LDPC encode operations from ACC200 device. */ +static uint16_t +acc200_dequeue_ldpc_enc(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + uint32_t avail = acc200_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i, dequeued_ops = 0, dequeued_descs = 0; + int ret; + struct rte_bbdev_enc_op *op; + if (avail == 0) + return 0; + op = (q->ring_addr + (q->sw_ring_tail & + q->sw_ring_wrap_mask))->req.op_addr; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == NULL || q == NULL || op == NULL)) + return 0; +#endif + int cbm = op->ldpc_enc.code_block_mode; + + for (i = 0; i < avail; i++) { + if (cbm == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_enc_one_op_tb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + else + ret = dequeue_enc_one_op_cb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + if (ret < 0) + break; + if (dequeued_ops >= num) + break; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_descs; + + /* 
Update dequeue stats */ + q_data->queue_stats.dequeued_count += dequeued_ops; + + return dequeued_ops; +} + +/* Dequeue decode operations from ACC200 device. */ +static uint16_t +acc200_dequeue_ldpc_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + uint16_t dequeue_num; + uint32_t avail = acc200_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i; + uint16_t dequeued_cbs = 0; + struct rte_bbdev_dec_op *op; + int ret; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == NULL || q == NULL)) + return 0; +#endif + + dequeue_num = RTE_MIN(avail, num); + + for (i = 0; i < dequeue_num; ++i) { + op = (q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask))->req.op_addr; + if (op->ldpc_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_dec_one_op_tb(q, &ops[i], dequeued_cbs, + &aq_dequeued); + else + ret = dequeue_ldpc_dec_one_op_cb( + q_data, q, &ops[i], dequeued_cbs, + &aq_dequeued); + + if (ret <= 0) + break; + dequeued_cbs += ret; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_cbs; + + /* Update dequeue stats */ + q_data->queue_stats.dequeued_count += i; + + return i; +} + /* Initialization Function */ static void acc200_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) @@ -827,6 +2931,10 @@ struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device); dev->dev_ops = &acc200_bbdev_ops; + dev->enqueue_ldpc_enc_ops = acc200_enqueue_ldpc_enc; + dev->enqueue_ldpc_dec_ops = acc200_enqueue_ldpc_dec; + dev->dequeue_ldpc_enc_ops = acc200_dequeue_ldpc_enc; + dev->dequeue_ldpc_dec_ops = acc200_dequeue_ldpc_dec; ((struct acc200_device *) dev->data->dev_private)->pf_device = !strcmp(drv->driver.name, -- 1.8.3.1
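Note on the ring accounting used by the enqueue and dequeue paths in the patch above: acc200_ring_avail_enq() and acc200_ring_avail_deq() derive the free and in-flight descriptor counts from sw_ring_head, sw_ring_tail and sw_ring_depth. The standalone sketch below restates that arithmetic outside the driver so it can be checked in isolation. The struct ring_sketch type and the ring_avail_* names are hypothetical and not part of the PMD; the sketch also assumes a power-of-two depth and free-running 32-bit counters, which the patch suggests through sw_ring_wrap_mask but does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three ring fields the helpers read. */
struct ring_sketch {
	uint32_t depth; /* number of descriptors, assumed a power of two */
	uint32_t head;  /* free-running enqueue counter */
	uint32_t tail;  /* free-running dequeue counter */
};

/* Descriptors that can still be enqueued; the formula reserves one slot. */
static uint32_t
ring_avail_enq(const struct ring_sketch *r)
{
	/* Assumption: depth is a non-zero power of two. */
	assert(r->depth != 0 && (r->depth & (r->depth - 1)) == 0);
	return (r->depth - 1 + r->tail - r->head) % r->depth;
}

/* Descriptors that were enqueued and are not yet dequeued. */
static uint32_t
ring_avail_deq(const struct ring_sketch *r)
{
	/* Assumption: depth is a non-zero power of two. */
	assert(r->depth != 0 && (r->depth & (r->depth - 1)) == 0);
	return (r->depth + r->head - r->tail) % r->depth;
}
```

With depth 8, head 5 and tail 2, this gives 4 descriptors available to enqueue and 3 to dequeue. The two values always sum to depth - 1, since the enqueue formula never hands out the last slot.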
* [PATCH v1 06/10] baseband/acc200: add LTE processing functions 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru ` (4 preceding siblings ...) 2022-07-08 0:01 ` [PATCH v1 05/10] baseband/acc200: add LDPC processing functions Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 07/10] baseband/acc200: add support for FFT operations Nicolas Chautru ` (4 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Add functions and capability for 4G FEC Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 1244 +++++++++++++++++++++++++++++- 1 file changed, 1235 insertions(+), 9 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 42cf2c8..003a2a3 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -784,6 +784,46 @@ int i; static const struct rte_bbdev_op_cap bbdev_capabilities[] = { { + .type = RTE_BBDEV_OP_TURBO_DEC, + .cap.turbo_dec = { + .capability_flags = + RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE | + RTE_BBDEV_TURBO_CRC_TYPE_24B | + RTE_BBDEV_TURBO_EQUALIZER | + RTE_BBDEV_TURBO_SOFT_OUT_SATURATE | + RTE_BBDEV_TURBO_HALF_ITERATION_EVEN | + RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH | + RTE_BBDEV_TURBO_SOFT_OUTPUT | + RTE_BBDEV_TURBO_EARLY_TERMINATION | + RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN | + RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT | + RTE_BBDEV_TURBO_MAP_DEC | + RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP | + RTE_BBDEV_TURBO_DEC_SCATTER_GATHER, + .max_llr_modulus = INT8_MAX, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_hard_out = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_soft_out = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + } + }, + { + .type = 
RTE_BBDEV_OP_TURBO_ENC, + .cap.turbo_enc = { + .capability_flags = + RTE_BBDEV_TURBO_CRC_24B_ATTACH | + RTE_BBDEV_TURBO_RV_INDEX_BYPASS | + RTE_BBDEV_TURBO_RATE_MATCH | + RTE_BBDEV_TURBO_ENC_SCATTER_GATHER, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + } + }, + { .type = RTE_BBDEV_OP_LDPC_ENC, .cap.ldpc_enc = { .capability_flags = @@ -834,15 +874,17 @@ /* Exposed number of queues */ dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; - dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = 0; - dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_DEC] = d->acc200_conf.q_ul_4g.num_aqs_per_groups * + d->acc200_conf.q_ul_4g.num_qgroups; + dev_info->num_queues[RTE_BBDEV_OP_TURBO_ENC] = d->acc200_conf.q_dl_4g.num_aqs_per_groups * + d->acc200_conf.q_dl_4g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_LDPC_DEC] = d->acc200_conf.q_ul_5g.num_aqs_per_groups * d->acc200_conf.q_ul_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc200_conf.q_dl_5g.num_aqs_per_groups * d->acc200_conf.q_dl_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = 0; - dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc200_conf.q_ul_4g.num_qgroups; + dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc200_conf.q_dl_4g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc200_conf.q_ul_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc200_conf.q_dl_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; @@ -906,6 +948,58 @@ return tail; } +/* Fill in a frame control word for turbo encoding. 
*/ +static inline void +acc200_fcw_te_fill(const struct rte_bbdev_enc_op *op, struct acc200_fcw_te *fcw) +{ + fcw->code_block_mode = op->turbo_enc.code_block_mode; + if (fcw->code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + fcw->k_neg = op->turbo_enc.tb_params.k_neg; + fcw->k_pos = op->turbo_enc.tb_params.k_pos; + fcw->c_neg = op->turbo_enc.tb_params.c_neg; + fcw->c = op->turbo_enc.tb_params.c; + fcw->ncb_neg = op->turbo_enc.tb_params.ncb_neg; + fcw->ncb_pos = op->turbo_enc.tb_params.ncb_pos; + + if (check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_RATE_MATCH)) { + fcw->bypass_rm = 0; + fcw->cab = op->turbo_enc.tb_params.cab; + fcw->ea = op->turbo_enc.tb_params.ea; + fcw->eb = op->turbo_enc.tb_params.eb; + } else { + /* E is set to the encoding output size when RM is + * bypassed. + */ + fcw->bypass_rm = 1; + fcw->cab = fcw->c_neg; + fcw->ea = 3 * fcw->k_neg + 12; + fcw->eb = 3 * fcw->k_pos + 12; + } + } else { /* For CB mode */ + fcw->k_pos = op->turbo_enc.cb_params.k; + fcw->ncb_pos = op->turbo_enc.cb_params.ncb; + + if (check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_RATE_MATCH)) { + fcw->bypass_rm = 0; + fcw->eb = op->turbo_enc.cb_params.e; + } else { + /* E is set to the encoding output size when RM is + * bypassed. + */ + fcw->bypass_rm = 1; + fcw->eb = 3 * fcw->k_pos + 12; + } + } + + fcw->bypass_rv_idx1 = check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_RV_INDEX_BYPASS); + fcw->code_block_crc = check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_CRC_24B_ATTACH); + fcw->rv_idx1 = op->turbo_enc.rv_index; +} + /* Compute value of k0. * Based on 3GPP 38.212 Table 5.4.2.1-2 * Starting position of different redundancy versions, k0 @@ -958,6 +1052,70 @@ fcw->mcb_count = num_cb; } +/* Fill in a frame control word for turbo decoding. 
*/ +static inline void +acc200_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc200_fcw_td *fcw) +{ + fcw->fcw_ver = 1; + fcw->num_maps = ACC200_FCW_TD_AUTOMAP; + fcw->bypass_sb_deint = !check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE); + if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + /* FIXME for TB block */ + fcw->k_pos = op->turbo_dec.tb_params.k_pos; + fcw->k_neg = op->turbo_dec.tb_params.k_neg; + } else { + fcw->k_pos = op->turbo_dec.cb_params.k; + fcw->k_neg = op->turbo_dec.cb_params.k; + } + fcw->c = 1; + fcw->c_neg = 1; + if (check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT)) { + fcw->soft_output_en = 1; + fcw->sw_soft_out_dis = 0; + fcw->sw_et_cont = check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH); + fcw->sw_soft_out_saturation = check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SOFT_OUT_SATURATE); + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_EQUALIZER)) { + fcw->bypass_teq = 0; + fcw->ea = op->turbo_dec.cb_params.e; + fcw->eb = op->turbo_dec.cb_params.e; + if (op->turbo_dec.rv_index == 0) + fcw->k0_start_col = ACC200_FCW_TD_RVIDX_0; + else if (op->turbo_dec.rv_index == 1) + fcw->k0_start_col = ACC200_FCW_TD_RVIDX_1; + else if (op->turbo_dec.rv_index == 2) + fcw->k0_start_col = ACC200_FCW_TD_RVIDX_2; + else + fcw->k0_start_col = ACC200_FCW_TD_RVIDX_3; + } else { + fcw->bypass_teq = 1; + fcw->eb = 64; /* avoid undefined value */ + } + } else { + fcw->soft_output_en = 0; + fcw->sw_soft_out_dis = 1; + fcw->bypass_teq = 0; + } + + fcw->code_block_mode = 1; /* FIXME */ + fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_CRC_TYPE_24B); + + fcw->ext_td_cold_reg_en = 1; + fcw->raw_decoder_input_on = 0; + fcw->max_iter = RTE_MAX((uint8_t) op->turbo_dec.iter_max, 2); + fcw->min_iter = 2; + fcw->half_iter_on = !check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_HALF_ITERATION_EVEN); + + fcw->early_stop_en = 
check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_EARLY_TERMINATION) & !fcw->soft_output_en; + fcw->ext_scale = 0xF; +} + /* Convert offset to harq index for harq_layout structure */ static inline uint32_t hq_index(uint32_t offset) { @@ -1240,6 +1398,89 @@ static inline uint32_t hq_index(uint32_t offset) #endif static inline int +acc200_dma_desc_te_fill(struct rte_bbdev_enc_op *op, + struct acc200_dma_req_desc *desc, struct rte_mbuf **input, + struct rte_mbuf *output, uint32_t *in_offset, + uint32_t *out_offset, uint32_t *out_length, + uint32_t *mbuf_total_left, uint32_t *seg_total_left, uint8_t r) +{ + int next_triplet = 1; /* FCW already done */ + uint32_t e, ea, eb, length; + uint16_t k, k_neg, k_pos; + uint8_t cab, c_neg; + + desc->word0 = ACC200_DMA_DESC_TYPE; + desc->word1 = 0; /**< Timestamp could be disabled */ + desc->word2 = 0; + desc->word3 = 0; + desc->numCBs = 1; + + if (op->turbo_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + ea = op->turbo_enc.tb_params.ea; + eb = op->turbo_enc.tb_params.eb; + cab = op->turbo_enc.tb_params.cab; + k_neg = op->turbo_enc.tb_params.k_neg; + k_pos = op->turbo_enc.tb_params.k_pos; + c_neg = op->turbo_enc.tb_params.c_neg; + e = (r < cab) ? ea : eb; + k = (r < c_neg) ? 
k_neg : k_pos; + } else { + e = op->turbo_enc.cb_params.e; + k = op->turbo_enc.cb_params.k; + } + + if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_CRC_24B_ATTACH)) + length = (k - 24) >> 3; + else + length = k >> 3; + + if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < length))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, length); + return -1; + } + + next_triplet = acc200_dma_fill_blk_type_in(desc, input, in_offset, + length, seg_total_left, next_triplet, + check_bit(op->turbo_enc.op_flags, + RTE_BBDEV_TURBO_ENC_SCATTER_GATHER)); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= length; + + /* Set output length */ + if (check_bit(op->turbo_enc.op_flags, RTE_BBDEV_TURBO_RATE_MATCH)) + /* Integer round up division by 8 */ + *out_length = (e + 7) >> 3; + else + *out_length = (k >> 3) * 3 + 2; + + next_triplet = acc200_dma_fill_blk_type(desc, output, *out_offset, + *out_length, next_triplet, ACC200_DMA_BLKID_OUT_ENC); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + op->turbo_enc.output.length += *out_length; + *out_offset += *out_length; + desc->data_ptrs[next_triplet - 1].last = 1; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static inline int acc200_dma_desc_le_fill(struct rte_bbdev_enc_op *op, struct acc200_dma_req_desc *desc, struct rte_mbuf **input, struct rte_mbuf *output, uint32_t *in_offset, @@ -1299,6 +1540,122 @@ static inline uint32_t hq_index(uint32_t offset) } static inline int +acc200_dma_desc_td_fill(struct rte_bbdev_dec_op *op, + struct acc200_dma_req_desc *desc, struct rte_mbuf **input, + struct rte_mbuf 
*h_output, struct rte_mbuf *s_output, + uint32_t *in_offset, uint32_t *h_out_offset, + uint32_t *s_out_offset, uint32_t *h_out_length, + uint32_t *s_out_length, uint32_t *mbuf_total_left, + uint32_t *seg_total_left, uint8_t r) +{ + int next_triplet = 1; /* FCW already done */ + uint16_t k; + uint16_t crc24_overlap = 0; + uint32_t e, kw; + + desc->word0 = ACC200_DMA_DESC_TYPE; + desc->word1 = 0; /**< Timestamp could be disabled */ + desc->word2 = 0; + desc->word3 = 0; + desc->numCBs = 1; + + if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + k = (r < op->turbo_dec.tb_params.c_neg) + ? op->turbo_dec.tb_params.k_neg + : op->turbo_dec.tb_params.k_pos; + e = (r < op->turbo_dec.tb_params.cab) + ? op->turbo_dec.tb_params.ea + : op->turbo_dec.tb_params.eb; + } else { + k = op->turbo_dec.cb_params.k; + e = op->turbo_dec.cb_params.e; + } + + if ((op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + && !check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP)) + crc24_overlap = 24; + + /* Calculates circular buffer size. + * According to 3gpp 36.212 section 5.1.4.2 + * Kw = 3 * Kpi, + * where: + * Kpi = nCol * nRow + * where nCol is 32 and nRow can be calculated from: + * D =< nCol * nRow + * where D is the size of each output from turbo encoder block (k + 4). 
+ */ + kw = RTE_ALIGN_CEIL(k + 4, 32) * 3; + + if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < kw))) { + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u", + *mbuf_total_left, kw); + return -1; + } + + next_triplet = acc200_dma_fill_blk_type_in(desc, input, in_offset, kw, + seg_total_left, next_triplet, + check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_DEC_SCATTER_GATHER)); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + desc->data_ptrs[next_triplet - 1].last = 1; + desc->m2dlen = next_triplet; + *mbuf_total_left -= kw; + *h_out_length = ((k - crc24_overlap) >> 3); + next_triplet = acc200_dma_fill_blk_type( + desc, h_output, *h_out_offset, + *h_out_length, next_triplet, ACC200_DMA_BLKID_OUT_HARD); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + + op->turbo_dec.hard_output.length += *h_out_length; + *h_out_offset += *h_out_length; + + /* Soft output */ + if (check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT)) { + if (op->turbo_dec.soft_output.data == 0) { + rte_bbdev_log(ERR, "Soft output is not defined"); + return -1; + } + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_EQUALIZER)) + *s_out_length = e; + else + *s_out_length = (k * 3) + 12; + + next_triplet = acc200_dma_fill_blk_type(desc, s_output, + *s_out_offset, *s_out_length, next_triplet, + ACC200_DMA_BLKID_OUT_SOFT); + if (unlikely(next_triplet < 0)) { + rte_bbdev_log(ERR, + "Mismatch between data to process and mbuf data length in bbdev_op: %p", + op); + return -1; + } + + op->turbo_dec.soft_output.length += *s_out_length; + *s_out_offset += *s_out_length; + } + + desc->data_ptrs[next_triplet - 1].last = 1; + desc->d2mlen = next_triplet - desc->m2dlen; + + desc->op_addr = op; + + return 0; +} + +static 
inline int acc200_dma_desc_ld_fill(struct rte_bbdev_dec_op *op, struct acc200_dma_req_desc *desc, struct rte_mbuf **input, struct rte_mbuf *h_output, @@ -1545,6 +1902,144 @@ static inline uint32_t hq_index(uint32_t offset) } #ifdef RTE_LIBRTE_BBDEV_DEBUG +/* Validates turbo encoder parameters */ +static inline int +validate_enc_op(struct rte_bbdev_enc_op *op) +{ + struct rte_bbdev_op_turbo_enc *turbo_enc = &op->turbo_enc; + struct rte_bbdev_op_enc_turbo_cb_params *cb = NULL; + struct rte_bbdev_op_enc_turbo_tb_params *tb = NULL; + uint16_t kw, kw_neg, kw_pos; + + if (op->mempool == NULL) { + rte_bbdev_log(ERR, "Invalid mempool pointer"); + return -1; + } + if (turbo_enc->input.data == NULL) { + rte_bbdev_log(ERR, "Invalid input pointer"); + return -1; + } + if (turbo_enc->output.data == NULL) { + rte_bbdev_log(ERR, "Invalid output pointer"); + return -1; + } + if (turbo_enc->rv_index > 3) { + rte_bbdev_log(ERR, + "rv_index (%u) is out of range 0 <= value <= 3", + turbo_enc->rv_index); + return -1; + } + if (turbo_enc->code_block_mode != RTE_BBDEV_TRANSPORT_BLOCK && + turbo_enc->code_block_mode != RTE_BBDEV_CODE_BLOCK) { + rte_bbdev_log(ERR, + "code_block_mode (%u) is out of range 0 <= value <= 1", + turbo_enc->code_block_mode); + return -1; + } + + if (turbo_enc->code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + tb = &turbo_enc->tb_params; + if ((tb->k_neg < RTE_BBDEV_TURBO_MIN_CB_SIZE + || tb->k_neg > RTE_BBDEV_TURBO_MAX_CB_SIZE) + && tb->c_neg > 0) { + rte_bbdev_log(ERR, + "k_neg (%u) is out of range %u <= value <= %u", + tb->k_neg, RTE_BBDEV_TURBO_MIN_CB_SIZE, + RTE_BBDEV_TURBO_MAX_CB_SIZE); + return -1; + } + if (tb->k_pos < RTE_BBDEV_TURBO_MIN_CB_SIZE + || tb->k_pos > RTE_BBDEV_TURBO_MAX_CB_SIZE) { + rte_bbdev_log(ERR, + "k_pos (%u) is out of range %u <= value <= %u", + tb->k_pos, RTE_BBDEV_TURBO_MIN_CB_SIZE, + RTE_BBDEV_TURBO_MAX_CB_SIZE); + return -1; + } + if (tb->c_neg > (RTE_BBDEV_TURBO_MAX_CODE_BLOCKS - 1)) + rte_bbdev_log(ERR, + "c_neg (%u) is out of 
range 0 <= value <= %u", + tb->c_neg, + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS - 1); + if (tb->c < 1 || tb->c > RTE_BBDEV_TURBO_MAX_CODE_BLOCKS) { + rte_bbdev_log(ERR, + "c (%u) is out of range 1 <= value <= %u", + tb->c, RTE_BBDEV_TURBO_MAX_CODE_BLOCKS); + return -1; + } + if (tb->cab > tb->c) { + rte_bbdev_log(ERR, + "cab (%u) is greater than c (%u)", + tb->cab, tb->c); + return -1; + } + if ((tb->ea < RTE_BBDEV_TURBO_MIN_CB_SIZE || (tb->ea % 2)) + && tb->r < tb->cab) { + rte_bbdev_log(ERR, + "ea (%u) is less than %u or it is not even", + tb->ea, RTE_BBDEV_TURBO_MIN_CB_SIZE); + return -1; + } + if ((tb->eb < RTE_BBDEV_TURBO_MIN_CB_SIZE || (tb->eb % 2)) + && tb->c > tb->cab) { + rte_bbdev_log(ERR, + "eb (%u) is less than %u or it is not even", + tb->eb, RTE_BBDEV_TURBO_MIN_CB_SIZE); + return -1; + } + + kw_neg = 3 * RTE_ALIGN_CEIL(tb->k_neg + 4, + RTE_BBDEV_TURBO_C_SUBBLOCK); + if (tb->ncb_neg < tb->k_neg || tb->ncb_neg > kw_neg) { + rte_bbdev_log(ERR, + "ncb_neg (%u) is out of range (%u) k_neg <= value <= (%u) kw_neg", + tb->ncb_neg, tb->k_neg, kw_neg); + return -1; + } + + kw_pos = 3 * RTE_ALIGN_CEIL(tb->k_pos + 4, + RTE_BBDEV_TURBO_C_SUBBLOCK); + if (tb->ncb_pos < tb->k_pos || tb->ncb_pos > kw_pos) { + rte_bbdev_log(ERR, + "ncb_pos (%u) is out of range (%u) k_pos <= value <= (%u) kw_pos", + tb->ncb_pos, tb->k_pos, kw_pos); + return -1; + } + if (tb->r > (tb->c - 1)) { + rte_bbdev_log(ERR, + "r (%u) is greater than c - 1 (%u)", + tb->r, tb->c - 1); + return -1; + } + } else { + cb = &turbo_enc->cb_params; + if (cb->k < RTE_BBDEV_TURBO_MIN_CB_SIZE + || cb->k > RTE_BBDEV_TURBO_MAX_CB_SIZE) { + rte_bbdev_log(ERR, + "k (%u) is out of range %u <= value <= %u", + cb->k, RTE_BBDEV_TURBO_MIN_CB_SIZE, + RTE_BBDEV_TURBO_MAX_CB_SIZE); + return -1; + } + + if (cb->e < RTE_BBDEV_TURBO_MIN_CB_SIZE || (cb->e % 2)) { + rte_bbdev_log(ERR, + "e (%u) is less than %u or it is not even", + cb->e, RTE_BBDEV_TURBO_MIN_CB_SIZE); + return -1; + } + + kw = RTE_ALIGN_CEIL(cb->k + 4, 
RTE_BBDEV_TURBO_C_SUBBLOCK) * 3;
+		if (cb->ncb < cb->k || cb->ncb > kw) {
+			rte_bbdev_log(ERR,
+					"ncb (%u) is out of range (%u) k <= value <= (%u) kw",
+					cb->ncb, cb->k, kw);
+			return -1;
+		}
+	}
+
+	return 0;
+}
 
 /* Validates LDPC encoder parameters */
 static inline int
@@ -1631,6 +2126,59 @@ static inline uint32_t hq_index(uint32_t offset)
 
 #endif
 
+/* Enqueue one encode operation for ACC200 device in CB mode */
+static inline int
+enqueue_enc_one_op_cb(struct acc200_queue *q, struct rte_bbdev_enc_op *op,
+		uint16_t total_enqueued_cbs)
+{
+	union acc200_dma_desc *desc = NULL;
+	int ret;
+	uint32_t in_offset, out_offset, out_length, mbuf_total_left,
+		seg_total_left;
+	struct rte_mbuf *input, *output_head, *output;
+
+#ifdef RTE_LIBRTE_BBDEV_DEBUG
+	/* Validate op structure */
+	if (validate_enc_op(op) == -1) {
+		rte_bbdev_log(ERR, "Turbo encoder validation failed");
+		return -EINVAL;
+	}
+#endif
+
+	uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs)
+			& q->sw_ring_wrap_mask);
+	desc = q->ring_addr + desc_idx;
+	acc200_fcw_te_fill(op, &desc->req.fcw_te);
+
+	input = op->turbo_enc.input.data;
+	output_head = output = op->turbo_enc.output.data;
+	in_offset = op->turbo_enc.input.offset;
+	out_offset = op->turbo_enc.output.offset;
+	out_length = 0;
+	mbuf_total_left = op->turbo_enc.input.length;
+	seg_total_left = rte_pktmbuf_data_len(op->turbo_enc.input.data)
+			- in_offset;
+
+	ret = acc200_dma_desc_te_fill(op, &desc->req, &input, output,
+			&in_offset, &out_offset, &out_length, &mbuf_total_left,
+			&seg_total_left, 0);
+
+	if (unlikely(ret < 0))
+		return ret;
+
+	mbuf_append(output_head, output, out_length);
+
+#ifdef RTE_LIBRTE_BBDEV_DEBUG
+	rte_memdump(stderr, "FCW", &desc->req.fcw_te,
+			sizeof(desc->req.fcw_te) - 8);
+	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
+	if (check_mbuf_total_left(mbuf_total_left) != 0)
+		return -EINVAL;
+#endif
+	/* One CB (one op) was successfully prepared to enqueue */
+	return 1;
+}
+
 /* Enqueue one encode operations
for ACC200 device in CB mode * multiplexed on the same descriptor */ @@ -1807,12 +2355,98 @@ static inline uint32_t hq_index(uint32_t offset) return 1; } -/* Enqueue one encode operations for ACC200 device in TB mode. - * returns the number of descs used - */ + +/* Enqueue one encode operations for ACC200 device in TB mode. */ static inline int -enqueue_ldpc_enc_one_op_tb(struct acc200_queue *q, struct rte_bbdev_enc_op *op, - uint16_t enq_descs, uint8_t cbs_in_tb) +enqueue_enc_one_op_tb(struct acc200_queue *q, struct rte_bbdev_enc_op *op, + uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) +{ + union acc200_dma_desc *desc = NULL; + int ret; + uint8_t r, c; + uint32_t in_offset, out_offset, out_length, mbuf_total_left, + seg_total_left; + struct rte_mbuf *input, *output_head, *output; + uint16_t current_enqueued_cbs = 0; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + /* Validate op structure */ + if (validate_enc_op(op) == -1) { + rte_bbdev_log(ERR, "Turbo encoder validation failed"); + return -EINVAL; + } +#endif + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + uint64_t fcw_offset = (desc_idx << 8) + ACC200_DESC_FCW_OFFSET; + acc200_fcw_te_fill(op, &desc->req.fcw_te); + + input = op->turbo_enc.input.data; + output_head = output = op->turbo_enc.output.data; + in_offset = op->turbo_enc.input.offset; + out_offset = op->turbo_enc.output.offset; + out_length = 0; + mbuf_total_left = op->turbo_enc.input.length; + + c = op->turbo_enc.tb_params.c; + r = op->turbo_enc.tb_params.r; + + while (mbuf_total_left > 0 && r < c) { + seg_total_left = rte_pktmbuf_data_len(input) - in_offset; + /* Set up DMA descriptor */ + desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; + desc->req.data_ptrs[0].blen = ACC200_FCW_TE_BLEN; + + ret = acc200_dma_desc_te_fill(op, &desc->req, &input, output, + &in_offset, &out_offset, 
&out_length, + &mbuf_total_left, &seg_total_left, r); + if (unlikely(ret < 0)) + return ret; + mbuf_append(output_head, output, out_length); + + /* Set total number of CBs in TB */ + desc->req.cbs_in_tb = cbs_in_tb; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_te, + sizeof(desc->req.fcw_te) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + if (seg_total_left == 0) { + /* Go to the next mbuf */ + input = input->next; + in_offset = 0; + output = output->next; + out_offset = 0; + } + + total_enqueued_cbs++; + current_enqueued_cbs++; + r++; + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + + /* Set SDone on last CB descriptor for TB mode. */ + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + return current_enqueued_cbs; +} + +/* Enqueue one encode operations for ACC200 device in TB mode. + * returns the number of descs used + */ +static inline int +enqueue_ldpc_enc_one_op_tb(struct acc200_queue *q, struct rte_bbdev_enc_op *op, + uint16_t enq_descs, uint8_t cbs_in_tb) { uint8_t num_a, num_b; uint16_t desc_idx; @@ -1871,6 +2505,213 @@ static inline uint32_t hq_index(uint32_t offset) return return_descs; } +#ifdef RTE_LIBRTE_BBDEV_DEBUG +/* Validates turbo decoder parameters */ +static inline int +validate_dec_op(struct rte_bbdev_dec_op *op) +{ + struct rte_bbdev_op_turbo_dec *turbo_dec = &op->turbo_dec; + struct rte_bbdev_op_dec_turbo_cb_params *cb = NULL; + struct rte_bbdev_op_dec_turbo_tb_params *tb = NULL; + + if (op->mempool == NULL) { + rte_bbdev_log(ERR, "Invalid mempool pointer"); + return -1; + } + if (turbo_dec->input.data == NULL) { + rte_bbdev_log(ERR, "Invalid input pointer"); + return -1; + } + if (turbo_dec->hard_output.data == NULL) { + rte_bbdev_log(ERR, "Invalid hard_output pointer"); + return -1; + } + if (check_bit(turbo_dec->op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT) && + turbo_dec->soft_output.data == NULL) { + 
rte_bbdev_log(ERR, "Invalid soft_output pointer"); + return -1; + } + if (turbo_dec->rv_index > 3) { + rte_bbdev_log(ERR, + "rv_index (%u) is out of range 0 <= value <= 3", + turbo_dec->rv_index); + return -1; + } + if (turbo_dec->iter_min < 1) { + rte_bbdev_log(ERR, + "iter_min (%u) is less than 1", + turbo_dec->iter_min); + return -1; + } + if (turbo_dec->iter_max <= 2) { + rte_bbdev_log(ERR, + "iter_max (%u) is less than or equal to 2", + turbo_dec->iter_max); + return -1; + } + if (turbo_dec->iter_min > turbo_dec->iter_max) { + rte_bbdev_log(ERR, + "iter_min (%u) is greater than iter_max (%u)", + turbo_dec->iter_min, turbo_dec->iter_max); + return -1; + } + if (turbo_dec->code_block_mode != RTE_BBDEV_TRANSPORT_BLOCK && + turbo_dec->code_block_mode != RTE_BBDEV_CODE_BLOCK) { + rte_bbdev_log(ERR, + "code_block_mode (%u) is out of range 0 <= value <= 1", + turbo_dec->code_block_mode); + return -1; + } + + if (turbo_dec->code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) { + tb = &turbo_dec->tb_params; + if ((tb->k_neg < RTE_BBDEV_TURBO_MIN_CB_SIZE + || tb->k_neg > RTE_BBDEV_TURBO_MAX_CB_SIZE) + && tb->c_neg > 0) { + rte_bbdev_log(ERR, + "k_neg (%u) is out of range %u <= value <= %u", + tb->k_neg, RTE_BBDEV_TURBO_MIN_CB_SIZE, + RTE_BBDEV_TURBO_MAX_CB_SIZE); + return -1; + } + if ((tb->k_pos < RTE_BBDEV_TURBO_MIN_CB_SIZE + || tb->k_pos > RTE_BBDEV_TURBO_MAX_CB_SIZE) + && tb->c > tb->c_neg) { + rte_bbdev_log(ERR, + "k_pos (%u) is out of range %u <= value <= %u", + tb->k_pos, RTE_BBDEV_TURBO_MIN_CB_SIZE, + RTE_BBDEV_TURBO_MAX_CB_SIZE); + return -1; + } + if (tb->c_neg > (RTE_BBDEV_TURBO_MAX_CODE_BLOCKS - 1)) + rte_bbdev_log(ERR, + "c_neg (%u) is out of range 0 <= value <= %u", + tb->c_neg, + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS - 1); + if (tb->c < 1 || tb->c > RTE_BBDEV_TURBO_MAX_CODE_BLOCKS) { + rte_bbdev_log(ERR, + "c (%u) is out of range 1 <= value <= %u", + tb->c, RTE_BBDEV_TURBO_MAX_CODE_BLOCKS); + return -1; + } + if (tb->cab > tb->c) { + rte_bbdev_log(ERR, + "cab (%u) is 
greater than c (%u)", + tb->cab, tb->c); + return -1; + } + if (check_bit(turbo_dec->op_flags, RTE_BBDEV_TURBO_EQUALIZER) && + (tb->ea < RTE_BBDEV_TURBO_MIN_CB_SIZE + || (tb->ea % 2)) + && tb->cab > 0) { + rte_bbdev_log(ERR, + "ea (%u) is less than %u or it is not even", + tb->ea, RTE_BBDEV_TURBO_MIN_CB_SIZE); + return -1; + } + if (check_bit(turbo_dec->op_flags, RTE_BBDEV_TURBO_EQUALIZER) && + (tb->eb < RTE_BBDEV_TURBO_MIN_CB_SIZE + || (tb->eb % 2)) + && tb->c > tb->cab) { + rte_bbdev_log(ERR, + "eb (%u) is less than %u or it is not even", + tb->eb, RTE_BBDEV_TURBO_MIN_CB_SIZE); + } + } else { + cb = &turbo_dec->cb_params; + if (cb->k < RTE_BBDEV_TURBO_MIN_CB_SIZE + || cb->k > RTE_BBDEV_TURBO_MAX_CB_SIZE) { + rte_bbdev_log(ERR, + "k (%u) is out of range %u <= value <= %u", + cb->k, RTE_BBDEV_TURBO_MIN_CB_SIZE, + RTE_BBDEV_TURBO_MAX_CB_SIZE); + return -1; + } + if (check_bit(turbo_dec->op_flags, RTE_BBDEV_TURBO_EQUALIZER) && + (cb->e < RTE_BBDEV_TURBO_MIN_CB_SIZE || + (cb->e % 2))) { + rte_bbdev_log(ERR, + "e (%u) is less than %u or it is not even", + cb->e, RTE_BBDEV_TURBO_MIN_CB_SIZE); + return -1; + } + } + + return 0; +} +#endif + +/** Enqueue one decode operations for ACC200 device in CB mode */ +static inline int +enqueue_dec_one_op_cb(struct acc200_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs) +{ + union acc200_dma_desc *desc = NULL; + int ret; + uint32_t in_offset, h_out_offset, s_out_offset, s_out_length, + h_out_length, mbuf_total_left, seg_total_left; + struct rte_mbuf *input, *h_output_head, *h_output, + *s_output_head, *s_output; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + /* Validate op structure */ + if (validate_dec_op(op) == -1) { + rte_bbdev_log(ERR, "Turbo decoder validation failed"); + return -EINVAL; + } +#endif + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + acc200_fcw_td_fill(op, &desc->req.fcw_td); + + input = op->turbo_dec.input.data; + 
h_output_head = h_output = op->turbo_dec.hard_output.data; + s_output_head = s_output = op->turbo_dec.soft_output.data; + in_offset = op->turbo_dec.input.offset; + h_out_offset = op->turbo_dec.hard_output.offset; + s_out_offset = op->turbo_dec.soft_output.offset; + h_out_length = s_out_length = 0; + mbuf_total_left = op->turbo_dec.input.length; + seg_total_left = rte_pktmbuf_data_len(input) - in_offset; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(input == NULL)) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + return -EFAULT; + } +#endif + + /* Set up DMA descriptor */ + desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + + ret = acc200_dma_desc_td_fill(op, &desc->req, &input, h_output, + s_output, &in_offset, &h_out_offset, &s_out_offset, + &h_out_length, &s_out_length, &mbuf_total_left, + &seg_total_left, 0); + + if (unlikely(ret < 0)) + return ret; + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + + /* Soft output */ + if (check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT)) + mbuf_append(s_output_head, s_output, s_out_length); + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_td, + sizeof(desc->req.fcw_td)); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + /* One CB (one op) was successfully prepared to enqueue */ + return 1; +} + /** Enqueue one decode operations for ACC200 device in CB mode */ static inline int enqueue_ldpc_dec_one_op_cb(struct acc200_queue *q, struct rte_bbdev_dec_op *op, @@ -2084,6 +2925,108 @@ static inline uint32_t hq_index(uint32_t offset) return current_enqueued_cbs; } +/* Enqueue one decode operations for ACC200 device in TB mode */ +static inline int +enqueue_dec_one_op_tb(struct acc200_queue *q, struct rte_bbdev_dec_op *op, + uint16_t total_enqueued_cbs, uint8_t cbs_in_tb) +{ + union acc200_dma_desc *desc = NULL; + int ret; + uint8_t r, c; + uint32_t in_offset, h_out_offset, s_out_offset, s_out_length, + 
h_out_length, mbuf_total_left, seg_total_left; + struct rte_mbuf *input, *h_output_head, *h_output, + *s_output_head, *s_output; + uint16_t current_enqueued_cbs = 0; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + /* Validate op structure */ + if (validate_dec_op(op) == -1) { + rte_bbdev_log(ERR, "Turbo decoder validation failed"); + return -EINVAL; + } +#endif + + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + uint64_t fcw_offset = (desc_idx << 8) + ACC200_DESC_FCW_OFFSET; + acc200_fcw_td_fill(op, &desc->req.fcw_td); + + input = op->turbo_dec.input.data; + h_output_head = h_output = op->turbo_dec.hard_output.data; + s_output_head = s_output = op->turbo_dec.soft_output.data; + in_offset = op->turbo_dec.input.offset; + h_out_offset = op->turbo_dec.hard_output.offset; + s_out_offset = op->turbo_dec.soft_output.offset; + h_out_length = s_out_length = 0; + mbuf_total_left = op->turbo_dec.input.length; + c = op->turbo_dec.tb_params.c; + r = op->turbo_dec.tb_params.r; + + while (mbuf_total_left > 0 && r < c) { + + seg_total_left = rte_pktmbuf_data_len(input) - in_offset; + + /* Set up DMA descriptor */ + desc = q->ring_addr + ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc->req.data_ptrs[0].address = q->ring_addr_iova + fcw_offset; + desc->req.data_ptrs[0].blen = ACC200_FCW_TD_BLEN; + ret = acc200_dma_desc_td_fill(op, &desc->req, &input, + h_output, s_output, &in_offset, &h_out_offset, + &s_out_offset, &h_out_length, &s_out_length, + &mbuf_total_left, &seg_total_left, r); + + if (unlikely(ret < 0)) + return ret; + + /* Hard output */ + mbuf_append(h_output_head, h_output, h_out_length); + + /* Soft output */ + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SOFT_OUTPUT)) + mbuf_append(s_output_head, s_output, s_out_length); + + /* Set total number of CBs in TB */ + desc->req.cbs_in_tb = cbs_in_tb; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_td, 
+ sizeof(desc->req.fcw_td) - 8); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + + if (seg_total_left == 0) { + /* Go to the next mbuf */ + input = input->next; + in_offset = 0; + h_output = h_output->next; + h_out_offset = 0; + + if (check_bit(op->turbo_dec.op_flags, + RTE_BBDEV_TURBO_SOFT_OUTPUT)) { + s_output = s_output->next; + s_out_offset = 0; + } + } + + total_enqueued_cbs++; + current_enqueued_cbs++; + r++; + } + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (check_mbuf_total_left(mbuf_total_left) != 0) + return -EINVAL; +#endif + /* Set SDone on last CB descriptor for TB mode */ + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + + return current_enqueued_cbs; +} + /* Calculates number of CBs in processed encoder TB based on 'r' and input * length. */ @@ -2230,6 +3173,49 @@ static inline uint32_t hq_index(uint32_t offset) return (q->sw_ring_depth + q->sw_ring_head - q->sw_ring_tail) % q->sw_ring_depth; } +/* Enqueue encode operations for ACC200 device in CB mode. 
*/
+static uint16_t
+acc200_enqueue_enc_cb(struct rte_bbdev_queue_data *q_data,
+		struct rte_bbdev_enc_op **ops, uint16_t num)
+{
+	struct acc200_queue *q = q_data->queue_private;
+	int32_t avail = acc200_ring_avail_enq(q);
+	uint16_t i;
+	union acc200_dma_desc *desc;
+	int ret;
+
+	for (i = 0; i < num; ++i) {
+		/* Check if there is available space for further processing */
+		if (unlikely(avail - 1 < 0)) {
+			acc200_enqueue_ring_full(q_data);
+			break;
+		}
+		avail -= 1;
+
+		ret = enqueue_enc_one_op_cb(q, ops[i], i);
+		if (ret < 0) {
+			acc200_enqueue_invalid(q_data);
+			break;
+		}
+	}
+
+	if (unlikely(i == 0))
+		return 0; /* Nothing to enqueue */
+
+	/* Set SDone in last CB in enqueued ops for CB mode */
+	desc = q->ring_addr + ((q->sw_ring_head + i - 1)
+			& q->sw_ring_wrap_mask);
+	desc->req.sdone_enable = 1;
+	desc->req.irq_enable = q->irq_enable;
+
+	acc200_dma_enqueue(q, i, &q_data->queue_stats);
+
+	/* Update stats */
+	q_data->queue_stats.enqueued_count += i;
+	q_data->queue_stats.enqueue_err_count += num - i;
+	return i;
+}
+
 /* Check we can mux encode operations with common FCW */
 static inline int16_t
 check_mux(struct rte_bbdev_enc_op **ops, uint16_t num) {
@@ -2310,6 +3296,45 @@ static inline uint32_t hq_index(uint32_t offset)
 	return i;
 }
 
+/* Enqueue encode operations for ACC200 device in TB mode.
*/
+static uint16_t
+acc200_enqueue_enc_tb(struct rte_bbdev_queue_data *q_data,
+		struct rte_bbdev_enc_op **ops, uint16_t num)
+{
+	struct acc200_queue *q = q_data->queue_private;
+	int32_t avail = acc200_ring_avail_enq(q);
+	uint16_t i, enqueued_cbs = 0;
+	uint8_t cbs_in_tb;
+	int ret;
+
+	for (i = 0; i < num; ++i) {
+		cbs_in_tb = get_num_cbs_in_tb_enc(&ops[i]->turbo_enc);
+		/* Check if there is available space for further processing */
+		if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
+			acc200_enqueue_ring_full(q_data);
+			break;
+		}
+		avail -= cbs_in_tb;
+
+		ret = enqueue_enc_one_op_tb(q, ops[i], enqueued_cbs, cbs_in_tb);
+		if (ret <= 0) {
+			acc200_enqueue_invalid(q_data);
+			break;
+		}
+		enqueued_cbs += ret;
+	}
+	if (unlikely(enqueued_cbs == 0))
+		return 0; /* Nothing to enqueue */
+
+	acc200_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats);
+
+	/* Update stats */
+	q_data->queue_stats.enqueued_count += i;
+	q_data->queue_stats.enqueue_err_count += num - i;
+
+	return i;
+}
+
 /* Enqueue LDPC encode operations for ACC200 device in TB mode. */
 static uint16_t
 acc200_enqueue_ldpc_enc_tb(struct rte_bbdev_queue_data *q_data,
@@ -2366,6 +3391,20 @@ static inline uint32_t hq_index(uint32_t offset)
 
 /* Enqueue encode operations for ACC200 device. */
 static uint16_t
+acc200_enqueue_enc(struct rte_bbdev_queue_data *q_data,
+		struct rte_bbdev_enc_op **ops, uint16_t num)
+{
+	int32_t aq_avail = acc200_aq_avail(q_data, num);
+	if (unlikely((aq_avail <= 0) || (num == 0)))
+		return 0;
+	if (ops[0]->turbo_enc.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK)
+		return acc200_enqueue_enc_tb(q_data, ops, num);
+	else
+		return acc200_enqueue_enc_cb(q_data, ops, num);
+}
+
+/* Enqueue encode operations for ACC200 device.
*/
+static uint16_t
 acc200_enqueue_ldpc_enc(struct rte_bbdev_queue_data *q_data,
 		struct rte_bbdev_enc_op **ops, uint16_t num)
 {
@@ -2379,6 +3418,47 @@ static inline uint32_t hq_index(uint32_t offset)
 		return acc200_enqueue_ldpc_enc_cb(q_data, ops, num);
 }
 
+
+/* Enqueue decode operations for ACC200 device in CB mode */
+static uint16_t
+acc200_enqueue_dec_cb(struct rte_bbdev_queue_data *q_data,
+		struct rte_bbdev_dec_op **ops, uint16_t num)
+{
+	struct acc200_queue *q = q_data->queue_private;
+	int32_t avail = acc200_ring_avail_enq(q);
+	uint16_t i;
+	union acc200_dma_desc *desc;
+	int ret;
+
+	for (i = 0; i < num; ++i) {
+		/* Check if there is available space for further processing */
+		if (unlikely(avail - 1 < 0))
+			break;
+		avail -= 1;
+
+		ret = enqueue_dec_one_op_cb(q, ops[i], i);
+		if (ret < 0)
+			break;
+	}
+
+	if (unlikely(i == 0))
+		return 0; /* Nothing to enqueue */
+
+	/* Set SDone in last CB in enqueued ops for CB mode */
+	desc = q->ring_addr + ((q->sw_ring_head + i - 1)
+			& q->sw_ring_wrap_mask);
+	desc->req.sdone_enable = 1;
+	desc->req.irq_enable = q->irq_enable;
+
+	acc200_dma_enqueue(q, i, &q_data->queue_stats);
+
+	/* Update stats */
+	q_data->queue_stats.enqueued_count += i;
+	q_data->queue_stats.enqueue_err_count += num - i;
+
+	return i;
+}
+
 /* Check we can mux encode operations with common FCW */
 static inline bool
 cmp_ldpc_dec_op(struct rte_bbdev_dec_op **ops) {
@@ -2480,6 +3560,58 @@ static inline uint32_t hq_index(uint32_t offset)
 	return i;
 }
 
+
+/* Enqueue decode operations for ACC200 device in TB mode */
+static uint16_t
+acc200_enqueue_dec_tb(struct rte_bbdev_queue_data *q_data,
+		struct rte_bbdev_dec_op **ops, uint16_t num)
+{
+	struct acc200_queue *q = q_data->queue_private;
+	int32_t avail = acc200_ring_avail_enq(q);
+	uint16_t i, enqueued_cbs = 0;
+	uint8_t cbs_in_tb;
+	int ret;
+
+	for (i = 0; i < num; ++i) {
+		cbs_in_tb = get_num_cbs_in_tb_dec(&ops[i]->turbo_dec);
+		/* Check if there is available space for further processing */
+		if
(unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) { + acc200_enqueue_ring_full(q_data); + break; + } + avail -= cbs_in_tb; + + ret = enqueue_dec_one_op_tb(q, ops[i], enqueued_cbs, cbs_in_tb); + if (ret <= 0) { + acc200_enqueue_invalid(q_data); + break; + } + enqueued_cbs += ret; + } + + acc200_dma_enqueue(q, enqueued_cbs, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + + return i; +} + +/* Enqueue decode operations for ACC200 device. */ +static uint16_t +acc200_enqueue_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + int32_t aq_avail = acc200_aq_avail(q_data, num); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + if (ops[0]->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + return acc200_enqueue_dec_tb(q_data, ops, num); + else + return acc200_enqueue_dec_cb(q_data, ops, num); +} + /* Enqueue decode operations for ACC200 device. */ static uint16_t acc200_enqueue_ldpc_dec(struct rte_bbdev_queue_data *q_data, @@ -2833,6 +3965,51 @@ static inline uint32_t hq_index(uint32_t offset) return cb_idx; } +/* Dequeue encode operations from ACC200 device. 
*/ +static uint16_t +acc200_dequeue_enc(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + uint32_t avail = acc200_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i, dequeued_ops = 0, dequeued_descs = 0; + int ret; + struct rte_bbdev_enc_op *op; + if (avail == 0) + return 0; + op = (q->ring_addr + (q->sw_ring_tail & + q->sw_ring_wrap_mask))->req.op_addr; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == NULL || q == NULL || op == NULL)) + return 0; +#endif + int cbm = op->turbo_enc.code_block_mode; + + for (i = 0; i < num; i++) { + if (cbm == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_enc_one_op_tb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + else + ret = dequeue_enc_one_op_cb(q, &ops[dequeued_ops], + &dequeued_ops, &aq_dequeued, + &dequeued_descs); + if (ret < 0) + break; + if (dequeued_ops >= num) + break; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_descs; + + /* Update enqueue stats */ + q_data->queue_stats.dequeued_count += dequeued_ops; + + return dequeued_ops; +} + /* Dequeue LDPC encode operations from ACC200 device. */ static uint16_t acc200_dequeue_ldpc_enc(struct rte_bbdev_queue_data *q_data, @@ -2880,6 +4057,51 @@ static inline uint32_t hq_index(uint32_t offset) /* Dequeue decode operations from ACC200 device. */ static uint16_t +acc200_dequeue_dec(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + uint16_t dequeue_num; + uint32_t avail = acc200_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + uint16_t i; + uint16_t dequeued_cbs = 0; + struct rte_bbdev_dec_op *op; + int ret; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == 0 && q == NULL)) + return 0; +#endif + + dequeue_num = (avail < num) ? 
avail : num; + + for (i = 0; i < dequeue_num; ++i) { + op = (q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask))->req.op_addr; + if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) + ret = dequeue_dec_one_op_tb(q, &ops[i], dequeued_cbs, + &aq_dequeued); + else + ret = dequeue_dec_one_op_cb(q_data, q, &ops[i], + dequeued_cbs, &aq_dequeued); + + if (ret <= 0) + break; + dequeued_cbs += ret; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_cbs; + + /* Update enqueue stats */ + q_data->queue_stats.dequeued_count += i; + + return i; +} + +/* Dequeue decode operations from ACC200 device. */ +static uint16_t acc200_dequeue_ldpc_dec(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t num) { @@ -2931,6 +4153,10 @@ static inline uint32_t hq_index(uint32_t offset) struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device); dev->dev_ops = &acc200_bbdev_ops; + dev->enqueue_enc_ops = acc200_enqueue_enc; + dev->enqueue_dec_ops = acc200_enqueue_dec; + dev->dequeue_enc_ops = acc200_dequeue_enc; + dev->dequeue_dec_ops = acc200_dequeue_dec; dev->enqueue_ldpc_enc_ops = acc200_enqueue_ldpc_enc; dev->enqueue_ldpc_dec_ops = acc200_enqueue_ldpc_dec; dev->dequeue_ldpc_enc_ops = acc200_dequeue_ldpc_enc; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
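Editor's note on the ring arithmetic used by the enqueue and dequeue paths above: descriptors are located with expressions such as `(q->sw_ring_head + i - 1) & q->sw_ring_wrap_mask`. The following is a minimal sketch of that indexing, assuming the ring depth is a power of two so that the mask equals depth minus one. `struct ring_state`, `ring_desc_index()` and `ring_last_desc_index()` are hypothetical names used only for illustration; they are not part of the driver.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's queue state; only the two
 * fields needed for index arithmetic are modelled here.
 */
struct ring_state {
	uint32_t sw_ring_head;      /* free-running head counter */
	uint32_t sw_ring_wrap_mask; /* ring depth - 1 (depth assumed power of two) */
};

/* Index of the descriptor `offset` slots past the head, wrapped to the ring. */
static inline uint32_t
ring_desc_index(const struct ring_state *q, uint32_t offset)
{
	return (q->sw_ring_head + offset) & q->sw_ring_wrap_mask;
}

/* Index of the last descriptor in a batch of `n` enqueued ops (n >= 1).
 * This is the slot where the patch sets sdone_enable and irq_enable.
 */
static inline uint32_t
ring_last_desc_index(const struct ring_state *q, uint32_t n)
{
	return ring_desc_index(q, n - 1);
}
```

With a 1024-entry ring and the head at 1022, a batch of four ops wraps around and its last descriptor lands at index 1.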
* [PATCH v1 07/10] baseband/acc200: add support for FFT operations 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru ` (5 preceding siblings ...) 2022-07-08 0:01 ` [PATCH v1 06/10] baseband/acc200: add LTE " Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 08/10] baseband/acc200: support interrupt Nicolas Chautru ` (3 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Add functions and capability for FFT processing Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 272 ++++++++++++++++++++++++++++++- 1 file changed, 270 insertions(+), 2 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 003a2a3..36c5561 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -860,6 +860,21 @@ .num_buffers_soft_out = 0, } }, + { + .type = RTE_BBDEV_OP_FFT, + .cap.fft = { + .capability_flags = + RTE_BBDEV_FFT_WINDOWING | + RTE_BBDEV_FFT_CS_ADJUSTMENT | + RTE_BBDEV_FFT_DFT_BYPASS | + RTE_BBDEV_FFT_IDFT_BYPASS | + RTE_BBDEV_FFT_WINDOWING_BYPASS, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + } + }, RTE_BBDEV_END_OF_CAPABILITIES_LIST() }; @@ -882,12 +897,13 @@ d->acc200_conf.q_ul_5g.num_qgroups; dev_info->num_queues[RTE_BBDEV_OP_LDPC_ENC] = d->acc200_conf.q_dl_5g.num_aqs_per_groups * d->acc200_conf.q_dl_5g.num_qgroups; - dev_info->num_queues[RTE_BBDEV_OP_FFT] = 0; + dev_info->num_queues[RTE_BBDEV_OP_FFT] = d->acc200_conf.q_fft.num_aqs_per_groups * + d->acc200_conf.q_fft.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc200_conf.q_ul_4g.num_qgroups; 
dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc200_conf.q_dl_4g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc200_conf.q_ul_5g.num_qgroups; dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc200_conf.q_dl_5g.num_qgroups; - dev_info->queue_priority[RTE_BBDEV_OP_FFT] = 0; + dev_info->queue_priority[RTE_BBDEV_OP_FFT] = d->acc200_conf.q_fft.num_qgroups; dev_info->max_num_queues = 0; for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_FFT; i++) dev_info->max_num_queues += dev_info->num_queues[i]; @@ -2124,6 +2140,21 @@ static inline uint32_t hq_index(uint32_t offset) return 0; } + +/* Validates FFT op parameters */ +static inline int +validate_fft_op(struct rte_bbdev_fft_op *op) +{ + struct rte_bbdev_op_fft *fft = &op->fft; + struct rte_mbuf *input; + input = fft->base_input.data; + if (unlikely(input == NULL)) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + return -EFAULT; + } + return 0; +} + #endif /* Enqueue one encode operations for ACC200 device in CB mode */ @@ -4146,6 +4177,241 @@ static inline uint32_t hq_index(uint32_t offset) return i; } +/* Fill in a frame control word for FFT processing. 
*/ +static inline void +acc200_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc200_fcw_fft *fcw) +{ + fcw->in_frame_size = op->fft.input_sequence_size; + fcw->leading_pad_size = op->fft.input_leading_padding; + fcw->out_frame_size = op->fft.output_sequence_size; + fcw->leading_depad_size = op->fft.output_leading_depadding; + fcw->cs_window_sel = op->fft.window_index[0] + + (op->fft.window_index[1] << 8) + + (op->fft.window_index[2] << 16) + + (op->fft.window_index[3] << 24); + fcw->cs_window_sel2 = op->fft.window_index[4] + + (op->fft.window_index[5] << 8); + fcw->cs_enable_bmap = op->fft.cs_bitmap; + fcw->num_antennas = op->fft.num_antennas_log2; + fcw->idft_size = op->fft.idft_log2; + fcw->dft_size = op->fft.dft_log2; + fcw->cs_offset = op->fft.cs_time_adjustment; + fcw->idft_shift = op->fft.idft_shift; + fcw->dft_shift = op->fft.dft_shift; + fcw->cs_multiplier = op->fft.ncs_reciprocal; + if (check_bit(op->fft.op_flags, + RTE_BBDEV_FFT_IDFT_BYPASS)) { + if (check_bit(op->fft.op_flags, + RTE_BBDEV_FFT_WINDOWING_BYPASS)) + fcw->bypass = 2; + else + fcw->bypass = 1; + } else if (check_bit(op->fft.op_flags, + RTE_BBDEV_FFT_DFT_BYPASS)) + fcw->bypass = 3; + else + fcw->bypass = 0; +} + +static inline int +acc200_dma_desc_fft_fill(struct rte_bbdev_fft_op *op, + struct acc200_dma_req_desc *desc, + struct rte_mbuf *input, struct rte_mbuf *output, + uint32_t *in_offset, uint32_t *out_offset) +{ + /* FCW already done */ + acc200_header_init(desc); + desc->data_ptrs[1].address = + rte_pktmbuf_iova_offset(input, *in_offset); + desc->data_ptrs[1].blen = op->fft.input_sequence_size * 4; + desc->data_ptrs[1].blkid = ACC200_DMA_BLKID_IN; + desc->data_ptrs[1].last = 1; + desc->data_ptrs[1].dma_ext = 0; + desc->data_ptrs[2].address = + rte_pktmbuf_iova_offset(output, *out_offset); + desc->data_ptrs[2].blen = op->fft.output_sequence_size * 4; + desc->data_ptrs[2].blkid = ACC200_DMA_BLKID_OUT_HARD; + desc->data_ptrs[2].last = 1; + desc->data_ptrs[2].dma_ext = 0; + desc->m2dlen = 
2; + desc->d2mlen = 1; + desc->ib_ant_offset = op->fft.input_sequence_size; + desc->num_ant = op->fft.num_antennas_log2 - 3; + int num_cs = 0, i; + for (i = 0; i < 12; i++) + if (check_bit(op->fft.cs_bitmap, 1 << i)) + num_cs++; + desc->num_cs = num_cs; + desc->ob_cyc_offset = op->fft.output_sequence_size; + desc->ob_ant_offset = op->fft.output_sequence_size * num_cs; + desc->op_addr = op; + return 0; +} + + +/* Enqueue one FFT operation for ACC200 device */ +static inline int +enqueue_fft_one_op(struct acc200_queue *q, struct rte_bbdev_fft_op *op, + uint16_t total_enqueued_cbs) +{ +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (validate_fft_op(op) == -EFAULT) { + rte_bbdev_log(ERR, "FFT op validation failed"); + return -EINVAL; + } +#endif + union acc200_dma_desc *desc; + uint16_t desc_idx = ((q->sw_ring_head + total_enqueued_cbs) + & q->sw_ring_wrap_mask); + desc = q->ring_addr + desc_idx; + struct rte_mbuf *input, *output; + uint32_t in_offset, out_offset; + input = op->fft.base_input.data; + output = op->fft.base_output.data; + in_offset = op->fft.base_input.offset; + out_offset = op->fft.base_output.offset; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(input == NULL)) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + return -EFAULT; + } +#endif + struct acc200_fcw_fft *fcw; + fcw = &desc->req.fcw_fft; + acc200_fcw_fft_fill(op, fcw); + acc200_dma_desc_fft_fill(op, &desc->req, input, output, + &in_offset, &out_offset); +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "FCW", &desc->req.fcw_fft, + sizeof(desc->req.fcw_fft)); + rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc)); +#endif + return 1; +} + +/* Enqueue FFT operations for ACC200 device.
*/ +static uint16_t +acc200_enqueue_fft(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_fft_op **ops, uint16_t num) +{ + int32_t aq_avail = acc200_aq_avail(q_data, num); + if (unlikely((aq_avail <= 0) || (num == 0))) + return 0; + struct acc200_queue *q = q_data->queue_private; + int32_t avail = acc200_ring_avail_enq(q); + uint16_t i; + union acc200_dma_desc *desc; + int ret; + for (i = 0; i < num; ++i) { + /* Check if there are available space for further processing */ + if (unlikely(avail < 1)) + break; + avail -= 1; + ret = enqueue_fft_one_op(q, ops[i], i); + if (ret < 0) + break; + } + + if (unlikely(i == 0)) + return 0; /* Nothing to enqueue */ + + /* Set SDone in last CB in enqueued ops for CB mode*/ + desc = q->ring_addr + ((q->sw_ring_head + i - 1) + & q->sw_ring_wrap_mask); + + desc->req.sdone_enable = 1; + desc->req.irq_enable = q->irq_enable; + acc200_dma_enqueue(q, i, &q_data->queue_stats); + + /* Update stats */ + q_data->queue_stats.enqueued_count += i; + q_data->queue_stats.enqueue_err_count += num - i; + return i; +} + + +/* Dequeue one FFT operations from ACC200 device */ +static inline int +dequeue_fft_one_op(struct rte_bbdev_queue_data *q_data, + struct acc200_queue *q, struct rte_bbdev_fft_op **ref_op, + uint16_t dequeued_cbs, uint32_t *aq_dequeued) +{ + union acc200_dma_desc *desc, atom_desc; + union acc200_dma_rsp_desc rsp; + struct rte_bbdev_fft_op *op; + + desc = q->ring_addr + ((q->sw_ring_tail + dequeued_cbs) + & q->sw_ring_wrap_mask); + atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, + __ATOMIC_RELAXED); + + /* Check fdone bit */ + if (!(atom_desc.rsp.val & ACC200_FDONE)) + return -1; + + rsp.val = atom_desc.rsp.val; +#ifdef RTE_LIBRTE_BBDEV_DEBUG + rte_memdump(stderr, "Resp", &desc->rsp.val, + sizeof(desc->rsp.val)); +#endif + /* Dequeue */ + op = desc->req.op_addr; + + /* Clearing status, it will be set based on response */ + op->status = 0; + op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR; + op->status |= 
rsp.dma_err << RTE_BBDEV_DRV_ERROR; + op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR; + if (op->status != 0) + q_data->queue_stats.dequeue_err_count++; + + /* Check if this is the last desc in batch (Atomic Queue) */ + if (desc->req.last_desc_in_batch) { + (*aq_dequeued)++; + desc->req.last_desc_in_batch = 0; + } + desc->rsp.val = ACC200_DMA_DESC_TYPE; + desc->rsp.add_info_0 = 0; + *ref_op = op; + /* One CB (op) was successfully dequeued */ + return 1; +} + + +/* Dequeue FFT operations from ACC200 device. */ +static uint16_t +acc200_dequeue_fft(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_fft_op **ops, uint16_t num) +{ + struct acc200_queue *q = q_data->queue_private; + uint16_t dequeue_num, i, dequeued_cbs = 0; + uint32_t avail = acc200_ring_avail_deq(q); + uint32_t aq_dequeued = 0; + int ret; + +#ifdef RTE_LIBRTE_BBDEV_DEBUG + if (unlikely(ops == 0 && q == NULL)) + return 0; +#endif + + dequeue_num = RTE_MIN(avail, num); + + for (i = 0; i < dequeue_num; ++i) { + ret = dequeue_fft_one_op( + q_data, q, &ops[i], dequeued_cbs, + &aq_dequeued); + if (ret <= 0) + break; + dequeued_cbs += ret; + } + + q->aq_dequeued += aq_dequeued; + q->sw_ring_tail += dequeued_cbs; + /* Update enqueue stats */ + q_data->queue_stats.dequeued_count += i; + return i; +} + /* Initialization Function */ static void acc200_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) @@ -4161,6 +4427,8 @@ static inline uint32_t hq_index(uint32_t offset) dev->enqueue_ldpc_dec_ops = acc200_enqueue_ldpc_dec; dev->dequeue_ldpc_enc_ops = acc200_dequeue_ldpc_enc; dev->dequeue_ldpc_dec_ops = acc200_dequeue_ldpc_dec; + dev->enqueue_fft_ops = acc200_enqueue_fft; + dev->dequeue_fft_ops = acc200_dequeue_fft; ((struct acc200_device *) dev->data->dev_private)->pf_device = !strcmp(drv->driver.name, -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
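Editor's note on the FFT frame control word: acc200_fcw_fft_fill() maps the op flags to a small bypass code, and acc200_dma_desc_fft_fill() counts the enabled cyclic shifts in the low 12 bits of the bitmap. The sketch below restates both steps in isolation. The `FFT_*` flag values and the function names are assumptions for illustration only; the real RTE_BBDEV_FFT_* constants come from the bbdev API extension this series depends on.

```c
#include <stdint.h>

/* Hypothetical flag values, chosen only so the sketch compiles on its own. */
#define FFT_DFT_BYPASS        (1u << 0)
#define FFT_IDFT_BYPASS       (1u << 1)
#define FFT_WINDOWING_BYPASS  (1u << 2)

/* Bypass code as selected in the patch:
 * 0 = no bypass, 1 = IDFT bypass, 2 = IDFT and windowing bypass,
 * 3 = DFT bypass (only when IDFT bypass is not requested).
 */
static inline uint8_t
fft_bypass_mode(uint32_t op_flags)
{
	if (op_flags & FFT_IDFT_BYPASS)
		return (op_flags & FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (op_flags & FFT_DFT_BYPASS)
		return 3;
	return 0;
}

/* Count the enabled cyclic shifts among the 12 low bits of the bitmap. */
static inline int
fft_count_cs(uint16_t cs_bitmap)
{
	int i, num_cs = 0;

	for (i = 0; i < 12; i++)
		if (cs_bitmap & (1u << i))
			num_cs++;
	return num_cs;
}
```

Note that in this ordering a windowing bypass requested without IDFT bypass has no effect on the code, which mirrors the if/else chain in the patch.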
* [PATCH v1 08/10] baseband/acc200: support interrupt 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru ` (6 preceding siblings ...) 2022-07-08 0:01 ` [PATCH v1 07/10] baseband/acc200: add support for FFT operations Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 09/10] baseband/acc200: add device status and vf2pf comms Nicolas Chautru ` (2 subsequent siblings) 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Add capability flags and functions for MSI/MSI-X interrupts and the underlying information ring. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 370 ++++++++++++++++++++++++++++++- 1 file changed, 368 insertions(+), 2 deletions(-) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 36c5561..ecfbc7a 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -363,6 +363,217 @@ free_base_addresses(base_addrs, i); } +/* + * Find queue_id of a device queue based on details from the Info Ring. + * If a queue isn't found UINT16_MAX is returned.
+ */ +static inline uint16_t +get_queue_id_from_ring_info(struct rte_bbdev_data *data, + const union acc200_info_ring_data ring_data) +{ + uint16_t queue_id; + + for (queue_id = 0; queue_id < data->num_queues; ++queue_id) { + struct acc200_queue *acc200_q = + data->queues[queue_id].queue_private; + if (acc200_q != NULL && acc200_q->aq_id == ring_data.aq_id && + acc200_q->qgrp_id == ring_data.qg_id && + acc200_q->vf_id == ring_data.vf_id) + return queue_id; + } + + return UINT16_MAX; +} + +/* Checks PF Info Ring to find the interrupt cause and handles it accordingly */ +static inline void +acc200_check_ir(struct acc200_device *acc200_dev) +{ + volatile union acc200_info_ring_data *ring_data; + uint16_t info_ring_head = acc200_dev->info_ring_head; + if (acc200_dev->info_ring == NULL) + return; + + ring_data = acc200_dev->info_ring + (acc200_dev->info_ring_head & + ACC200_INFO_RING_MASK); + + while (ring_data->valid) { + if ((ring_data->int_nb < ACC200_PF_INT_DMA_DL_DESC_IRQ) || ( + ring_data->int_nb > + ACC200_PF_INT_DMA_DL5G_DESC_IRQ)) { + rte_bbdev_log(WARNING, "InfoRing: ITR:%d Info:0x%x", + ring_data->int_nb, ring_data->detailed_info); + /* Initialize Info Ring entry and move forward */ + ring_data->val = 0; + } + info_ring_head++; + ring_data = acc200_dev->info_ring + + (info_ring_head & ACC200_INFO_RING_MASK); + } +} + +/* Checks PF Info Ring to find the interrupt cause and handles it accordingly */ +static inline void +acc200_pf_interrupt_handler(struct rte_bbdev *dev) +{ + struct acc200_device *acc200_dev = dev->data->dev_private; + volatile union acc200_info_ring_data *ring_data; + struct acc200_deq_intr_details deq_intr_det; + + ring_data = acc200_dev->info_ring + (acc200_dev->info_ring_head & + ACC200_INFO_RING_MASK); + + while (ring_data->valid) { + + rte_bbdev_log_debug( + "ACC200 PF Interrupt received, Info Ring data: 0x%x -> %d", + ring_data->val, ring_data->int_nb); + + switch (ring_data->int_nb) { + case ACC200_PF_INT_DMA_DL_DESC_IRQ: + case 
ACC200_PF_INT_DMA_UL_DESC_IRQ: + case ACC200_PF_INT_DMA_FFT_DESC_IRQ: + case ACC200_PF_INT_DMA_UL5G_DESC_IRQ: + case ACC200_PF_INT_DMA_DL5G_DESC_IRQ: + deq_intr_det.queue_id = get_queue_id_from_ring_info( + dev->data, *ring_data); + if (deq_intr_det.queue_id == UINT16_MAX) { + rte_bbdev_log(ERR, + "Couldn't find queue: aq_id: %u, qg_id: %u, vf_id: %u", + ring_data->aq_id, + ring_data->qg_id, + ring_data->vf_id); + return; + } + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_DEQUEUE, &deq_intr_det); + break; + default: + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_ERROR, NULL); + break; + } + + /* Initialize Info Ring entry and move forward */ + ring_data->val = 0; + ++acc200_dev->info_ring_head; + ring_data = acc200_dev->info_ring + + (acc200_dev->info_ring_head & + ACC200_INFO_RING_MASK); + } +} + +/* Checks VF Info Ring to find the interrupt cause and handles it accordingly */ +static inline void +acc200_vf_interrupt_handler(struct rte_bbdev *dev) +{ + struct acc200_device *acc200_dev = dev->data->dev_private; + volatile union acc200_info_ring_data *ring_data; + struct acc200_deq_intr_details deq_intr_det; + + ring_data = acc200_dev->info_ring + (acc200_dev->info_ring_head & + ACC200_INFO_RING_MASK); + + while (ring_data->valid) { + + rte_bbdev_log_debug( + "ACC200 VF Interrupt received, Info Ring data: 0x%x\n", + ring_data->val); + + switch (ring_data->int_nb) { + case ACC200_VF_INT_DMA_DL_DESC_IRQ: + case ACC200_VF_INT_DMA_UL_DESC_IRQ: + case ACC200_VF_INT_DMA_FFT_DESC_IRQ: + case ACC200_VF_INT_DMA_UL5G_DESC_IRQ: + case ACC200_VF_INT_DMA_DL5G_DESC_IRQ: + /* VFs are not aware of their vf_id - it's set to 0 in + * queue structures. 
+ */ + ring_data->vf_id = 0; + deq_intr_det.queue_id = get_queue_id_from_ring_info( + dev->data, *ring_data); + if (deq_intr_det.queue_id == UINT16_MAX) { + rte_bbdev_log(ERR, + "Couldn't find queue: aq_id: %u, qg_id: %u", + ring_data->aq_id, + ring_data->qg_id); + return; + } + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_DEQUEUE, &deq_intr_det); + break; + default: + rte_bbdev_pmd_callback_process(dev, + RTE_BBDEV_EVENT_ERROR, NULL); + break; + } + + /* Initialize Info Ring entry and move forward */ + ring_data->valid = 0; + ++acc200_dev->info_ring_head; + ring_data = acc200_dev->info_ring + (acc200_dev->info_ring_head + & ACC200_INFO_RING_MASK); + } +} + +/* Interrupt handler triggered by ACC200 dev for handling specific interrupt */ +static void +acc200_dev_interrupt_handler(void *cb_arg) +{ + struct rte_bbdev *dev = cb_arg; + struct acc200_device *acc200_dev = dev->data->dev_private; + + /* Read info ring */ + if (acc200_dev->pf_device) + acc200_pf_interrupt_handler(dev); + else + acc200_vf_interrupt_handler(dev); +} + +/* Allocate and setup inforing */ +static int +allocate_info_ring(struct rte_bbdev *dev) +{ + struct acc200_device *d = dev->data->dev_private; + const struct acc200_registry_addr *reg_addr; + rte_iova_t info_ring_iova; + uint32_t phys_low, phys_high; + + if (d->info_ring != NULL) + return 0; /* Already configured */ + + /* Choose correct registry addresses for the device type */ + if (d->pf_device) + reg_addr = &pf_reg_addr; + else + reg_addr = &vf_reg_addr; + /* Allocate InfoRing */ + if (d->info_ring == NULL) + d->info_ring = rte_zmalloc_socket("Info Ring", + ACC200_INFO_RING_NUM_ENTRIES * + sizeof(*d->info_ring), RTE_CACHE_LINE_SIZE, + dev->data->socket_id); + if (d->info_ring == NULL) { + rte_bbdev_log(ERR, + "Failed to allocate Info Ring for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + return -ENOMEM; + } + info_ring_iova = rte_malloc_virt2iova(d->info_ring); + + /* Setup Info Ring */ + phys_high = 
(uint32_t)(info_ring_iova >> 32); + phys_low = (uint32_t)(info_ring_iova); + acc200_reg_write(d, reg_addr->info_ring_hi, phys_high); + acc200_reg_write(d, reg_addr->info_ring_lo, phys_low); + acc200_reg_write(d, reg_addr->info_ring_en, ACC200_REG_IRQ_EN_ALL); + d->info_ring_head = (acc200_reg_read(d, reg_addr->info_ring_ptr) & + 0xFFF) / sizeof(union acc200_info_ring_data); + return 0; +} + + /* Allocate 64MB memory used for all software rings */ static int acc200_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id) @@ -370,6 +581,7 @@ uint32_t phys_low, phys_high, value; struct acc200_device *d = dev->data->dev_private; const struct acc200_registry_addr *reg_addr; + int ret; if (d->pf_device && !d->acc200_conf.pf_mode_en) { rte_bbdev_log(NOTICE, @@ -470,6 +682,14 @@ acc200_reg_write(d, reg_addr->tail_ptrs_fft_hi, phys_high); acc200_reg_write(d, reg_addr->tail_ptrs_fft_lo, phys_low); + ret = allocate_info_ring(dev); + if (ret < 0) { + rte_bbdev_log(ERR, "Failed to allocate info_ring for %s:%u", + dev->device->driver->name, + dev->data->dev_id); + /* Continue */ + } + if (d->harq_layout == NULL) d->harq_layout = rte_zmalloc_socket("HARQ Layout", ACC200_HARQ_LAYOUT * sizeof(*d->harq_layout), @@ -492,17 +712,121 @@ return 0; } +static int +acc200_intr_enable(struct rte_bbdev *dev) +{ + int ret; + struct acc200_device *d = dev->data->dev_private; + /* + * MSI/MSI-X are supported + * Option controlled by vfio-intr through EAL parameter + */ + if (rte_intr_type_get(dev->intr_handle) == RTE_INTR_HANDLE_VFIO_MSI) { + + ret = allocate_info_ring(dev); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't allocate info ring for device: %s", + dev->data->name); + return ret; + } + ret = rte_intr_enable(dev->intr_handle); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't enable interrupts for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + ret = rte_intr_callback_register(dev->intr_handle, + acc200_dev_interrupt_handler, dev); + if 
(ret < 0) { + rte_bbdev_log(ERR, + "Couldn't register interrupt callback for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + + return 0; + } else if (rte_intr_type_get(dev->intr_handle) == RTE_INTR_HANDLE_VFIO_MSIX) { + + ret = allocate_info_ring(dev); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't allocate info ring for device: %s", + dev->data->name); + return ret; + } + + int i, max_queues; + struct acc200_device *acc200_dev = dev->data->dev_private; + + if (acc200_dev->pf_device) + max_queues = ACC200_MAX_PF_MSIX; + else + max_queues = ACC200_MAX_VF_MSIX; + + if (rte_intr_efd_enable(dev->intr_handle, max_queues)) { + rte_bbdev_log(ERR, "Failed to create fds for %u queues", + dev->data->num_queues); + return -1; + } + + for (i = 0; i < max_queues; ++i) { + if (rte_intr_efds_index_set(dev->intr_handle, i, + rte_intr_fd_get(dev->intr_handle))) + return -rte_errno; + } + + if (rte_intr_vec_list_alloc(dev->intr_handle, "intr_vec", + dev->data->num_queues)) { + rte_bbdev_log(ERR, "Failed to allocate %u vectors", + dev->data->num_queues); + return -ENOMEM; + } + + ret = rte_intr_enable(dev->intr_handle); + + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't enable interrupts for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + ret = rte_intr_callback_register(dev->intr_handle, + acc200_dev_interrupt_handler, dev); + if (ret < 0) { + rte_bbdev_log(ERR, + "Couldn't register interrupt callback for device: %s", + dev->data->name); + rte_free(d->info_ring); + return ret; + } + + return 0; + } + + rte_bbdev_log(ERR, "ACC200 (%s) supports only VFIO MSI/MSI-X interrupts\n", + dev->data->name); + return -ENOTSUP; +} + /* Free memory used for software rings */ static int acc200_dev_close(struct rte_bbdev *dev) { struct acc200_device *d = dev->data->dev_private; + acc200_check_ir(d); if (d->sw_rings_base != NULL) { rte_free(d->tail_ptrs); + rte_free(d->info_ring); rte_free(d->sw_rings_base); rte_free(d->harq_layout); 
d->sw_rings_base = NULL; d->tail_ptrs = NULL; + d->info_ring = NULL; d->harq_layout = NULL; } /* Ensure all in flight HW transactions are completed */ @@ -795,6 +1119,7 @@ RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH | RTE_BBDEV_TURBO_SOFT_OUTPUT | RTE_BBDEV_TURBO_EARLY_TERMINATION | + RTE_BBDEV_TURBO_DEC_INTERRUPTS | RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN | RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT | RTE_BBDEV_TURBO_MAP_DEC | @@ -816,6 +1141,7 @@ RTE_BBDEV_TURBO_CRC_24B_ATTACH | RTE_BBDEV_TURBO_RV_INDEX_BYPASS | RTE_BBDEV_TURBO_RATE_MATCH | + RTE_BBDEV_TURBO_ENC_INTERRUPTS | RTE_BBDEV_TURBO_ENC_SCATTER_GATHER, .num_buffers_src = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, @@ -829,7 +1155,8 @@ .capability_flags = RTE_BBDEV_LDPC_RATE_MATCH | RTE_BBDEV_LDPC_CRC_24B_ATTACH | - RTE_BBDEV_LDPC_INTERLEAVER_BYPASS, + RTE_BBDEV_LDPC_INTERLEAVER_BYPASS | + RTE_BBDEV_LDPC_ENC_INTERRUPTS, .num_buffers_src = RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, .num_buffers_dst = @@ -850,7 +1177,8 @@ RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS | RTE_BBDEV_LDPC_DEC_SCATTER_GATHER | RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION | - RTE_BBDEV_LDPC_LLR_COMPRESSION, + RTE_BBDEV_LDPC_LLR_COMPRESSION | + RTE_BBDEV_LDPC_DEC_INTERRUPTS, .llr_size = 8, .llr_decimals = 1, .num_buffers_src = @@ -918,15 +1246,46 @@ dev_info->min_alignment = 1; dev_info->capabilities = bbdev_capabilities; dev_info->harq_buffer_size = 0; + + acc200_check_ir(d); +} + +static int +acc200_queue_intr_enable(struct rte_bbdev *dev, uint16_t queue_id) +{ + struct acc200_queue *q = dev->data->queues[queue_id].queue_private; + + if (rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSI && + rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSIX) + return -ENOTSUP; + + q->irq_enable = 1; + return 0; +} + +static int +acc200_queue_intr_disable(struct rte_bbdev *dev, uint16_t queue_id) +{ + struct acc200_queue *q = dev->data->queues[queue_id].queue_private; + + if (rte_intr_type_get(dev->intr_handle) != RTE_INTR_HANDLE_VFIO_MSI && + rte_intr_type_get(dev->intr_handle) 
!= RTE_INTR_HANDLE_VFIO_MSIX) + return -ENOTSUP; + + q->irq_enable = 0; + return 0; } static const struct rte_bbdev_ops acc200_bbdev_ops = { .setup_queues = acc200_setup_queues, + .intr_enable = acc200_intr_enable, .close = acc200_dev_close, .info_get = acc200_dev_info_get, .queue_setup = acc200_queue_setup, .queue_release = acc200_queue_release, .queue_stop = acc200_queue_stop, + .queue_intr_enable = acc200_queue_intr_enable, + .queue_intr_disable = acc200_queue_intr_disable }; /* ACC200 PCI PF address map */ @@ -3821,6 +4180,7 @@ static inline uint32_t hq_index(uint32_t offset) if (op->status != 0) { /* These errors are not expected */ q_data->queue_stats.dequeue_err_count++; + acc200_check_ir(q->d); } /* CRC invalid if error exists */ @@ -3890,6 +4250,9 @@ static inline uint32_t hq_index(uint32_t offset) op->ldpc_dec.iter_count = (uint8_t) rsp.iter_cnt; + if (op->status & (1 << RTE_BBDEV_DRV_ERROR)) + acc200_check_ir(q->d); + /* Check if this is the last desc in batch (Atomic Queue) */ if (desc->req.last_desc_in_batch) { (*aq_dequeued)++; @@ -4365,6 +4728,9 @@ static inline uint32_t hq_index(uint32_t offset) if (op->status != 0) q_data->queue_stats.dequeue_err_count++; + if (op->status & (1 << RTE_BBDEV_DRV_ERROR)) + acc200_check_ir(q->d); + /* Check if this is the last desc in batch (Atomic Queue) */ if (desc->req.last_desc_in_batch) { (*aq_dequeued)++; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
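Editor's note on the Info Ring handlers: both the PF and VF interrupt handlers walk the ring from `info_ring_head`, process each entry whose valid bit is set, clear it, and advance the head under a mask. A minimal sketch of that consume loop follows, assuming a power-of-two ring depth. The ring depth, the position of the valid bit and all names (`struct info_ring`, `info_ring_drain()`) are hypothetical and do not reflect the real `union acc200_info_ring_data` layout.

```c
#include <stdint.h>

#define INFO_RING_ENTRIES 8u                      /* hypothetical depth */
#define INFO_RING_MASK    (INFO_RING_ENTRIES - 1u)
#define INFO_VALID        (1u << 31)              /* hypothetical valid bit */

/* Hypothetical model of the ring: entries are written by the device,
 * hence volatile; head is owned by software.
 */
struct info_ring {
	volatile uint32_t entries[INFO_RING_ENTRIES];
	uint16_t head;
};

/* Consume every valid entry starting at head. Each entry is cleared
 * before moving on, so the loop stops at the first slot the device
 * has not written yet. Returns the number of entries consumed.
 */
static inline unsigned int
info_ring_drain(struct info_ring *r)
{
	unsigned int handled = 0;
	volatile uint32_t *e = &r->entries[r->head & INFO_RING_MASK];

	while (*e & INFO_VALID) {
		/* A real handler would dispatch on the interrupt number here. */
		*e = 0;
		r->head++;
		handled++;
		e = &r->entries[r->head & INFO_RING_MASK];
	}
	return handled;
}
```

This follows the pattern of acc200_pf_interrupt_handler(); acc200_check_ir() in the patch differs in that it advances a local copy of the head and only clears entries outside the descriptor IRQ range.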
* [PATCH v1 09/10] baseband/acc200: add device status and vf2pf comms 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru ` (7 preceding siblings ...) 2022-07-08 0:01 ` [PATCH v1 08/10] baseband/acc200: support interrupt Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-08 0:01 ` [PATCH v1 10/10] baseband/acc200: add PF configure companion function Nicolas Chautru 2022-07-12 13:48 ` [PATCH v1 00/10] baseband/acc200 Maxime Coquelin 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Add support to expose the device status seen from the host through vf2pf mailbox communication. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- drivers/baseband/acc200/rte_acc200_pmd.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index ecfbc7a..856ea1c 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -262,6 +262,31 @@ acc200_conf->q_fft.aq_depth_log2); } +static inline void +acc200_vf2pf(struct acc200_device *d, unsigned int payload) +{ + acc200_reg_write(d, HWVfHiVfToPfDbellVf, payload); +} + +/* Request device status information */ +static inline uint32_t +acc200_device_status(struct rte_bbdev *dev) +{ + struct acc200_device *d = dev->data->dev_private; + uint32_t reg, time_out = 0; + if (d->pf_device) + return RTE_BBDEV_DEV_NOT_SUPPORTED; + acc200_vf2pf(d, ACC200_VF2PF_STATUS_REQUEST); + reg = acc200_reg_read(d, HWVfHiPfToVfDbellVf); + while ((time_out < ACC200_STATUS_TO) && (reg == RTE_BBDEV_DEV_NOSTATUS)) { + usleep(ACC200_STATUS_WAIT); /* Wait for VF->PF->VF comms */ + reg = acc200_reg_read(d, HWVfHiPfToVfDbellVf); + time_out++; + } + /* printf("DevStatus %x %s %d\n", reg,
rte_bbdev_device_status_str(reg), time_out); */ + return reg; +} + static void free_base_addresses(void **base_addrs, int size) { @@ -704,6 +729,7 @@ /* Mark as configured properly */ d->configured = true; + acc200_vf2pf(d, ACC200_VF2PF_USING_VF); rte_bbdev_log_debug( "ACC200 (%s) configured sw_rings = %p, sw_rings_iova = %#" @@ -1214,6 +1240,8 @@ /* Read and save the populated config from ACC200 registers */ fetch_acc200_config(dev); + /* Check the status of device */ + dev_info->device_status = acc200_device_status(dev); /* Exposed number of queues */ dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
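Editor's note on the status request: acc200_device_status() rings the VF to PF doorbell and then polls the PF to VF doorbell a bounded number of times until a status other than "no status" shows up. The sketch below isolates that bounded-poll pattern, with the register read abstracted behind a callback so it can run without hardware. `STATUS_NONE`, `poll_status()`, `struct fake_reg` and `fake_reg_read()` are hypothetical names for illustration, and the sleep between polls is omitted.

```c
#include <stdint.h>
#include <stddef.h>

#define STATUS_NONE 0u /* hypothetical "no status yet" value */

/* Callback standing in for a register read. */
typedef uint32_t (*status_read_fn)(void *ctx);

/* Read once, then re-read up to max_polls times while the register still
 * reports STATUS_NONE. Returns the last value read; on timeout this is
 * STATUS_NONE. The number of re-reads is stored in *polls_used if non-NULL.
 */
static inline uint32_t
poll_status(status_read_fn read_reg, void *ctx, uint32_t max_polls,
		uint32_t *polls_used)
{
	uint32_t reg = read_reg(ctx);
	uint32_t polls = 0;

	while (polls < max_polls && reg == STATUS_NONE) {
		/* The driver sleeps here between polls. */
		reg = read_reg(ctx);
		polls++;
	}
	if (polls_used != NULL)
		*polls_used = polls;
	return reg;
}

/* Test double: reports STATUS_NONE for the first ready_after reads. */
struct fake_reg {
	uint32_t reads;
	uint32_t ready_after;
	uint32_t value;
};

static uint32_t
fake_reg_read(void *ctx)
{
	struct fake_reg *f = ctx;

	f->reads++;
	return (f->reads > f->ready_after) ? f->value : STATUS_NONE;
}
```

As in the patch, a timeout is not reported separately: the caller sees the "no status" value and has to treat it as the PF not having answered in time.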
* [PATCH v1 10/10] baseband/acc200: add PF configure companion function 2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru ` (8 preceding siblings ...) 2022-07-08 0:01 ` [PATCH v1 09/10] baseband/acc200: add device status and vf2pf comms Nicolas Chautru @ 2022-07-08 0:01 ` Nicolas Chautru 2022-07-12 13:48 ` [PATCH v1 00/10] baseband/acc200 Maxime Coquelin 10 siblings, 0 replies; 50+ messages in thread From: Nicolas Chautru @ 2022-07-08 0:01 UTC (permalink / raw) To: dev, thomas, gakhil, hemant.agrawal, trix Cc: maxime.coquelin, mdr, bruce.richardson, david.marchand, stephen, Nicolas Chautru Add configure function notably to configure the device from the PF within DPDK and bbdev-test (without external dependency). Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- app/test-bbdev/meson.build | 3 + app/test-bbdev/test_bbdev_perf.c | 76 +++++ drivers/baseband/acc200/meson.build | 2 + drivers/baseband/acc200/rte_acc200_cfg.h | 21 ++ drivers/baseband/acc200/rte_acc200_pmd.c | 466 +++++++++++++++++++++++++++++++ drivers/baseband/acc200/version.map | 7 + 6 files changed, 575 insertions(+) diff --git a/app/test-bbdev/meson.build b/app/test-bbdev/meson.build index 76d4c26..1ffaa54 100644 --- a/app/test-bbdev/meson.build +++ b/app/test-bbdev/meson.build @@ -23,6 +23,9 @@ endif if dpdk_conf.has('RTE_BASEBAND_ACC100') deps += ['baseband_acc100'] endif +if dpdk_conf.has('RTE_BASEBAND_ACC200') + deps += ['baseband_acc200'] +endif if dpdk_conf.has('RTE_LIBRTE_PMD_BBDEV_LA12XX') deps += ['baseband_la12xx'] endif diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c index 653b21f..69a505d 100644 --- a/app/test-bbdev/test_bbdev_perf.c +++ b/app/test-bbdev/test_bbdev_perf.c @@ -64,6 +64,18 @@ #define ACC100_QOS_GBR 0 #endif +#ifdef RTE_BASEBAND_ACC200 +#include <rte_acc200_cfg.h> +#define ACC200PF_DRIVER_NAME ("intel_acc200_pf") +#define ACC200VF_DRIVER_NAME ("intel_acc200_vf") +#define ACC200_QMGR_NUM_AQS 16 +#define 
ACC200_QMGR_NUM_QGS 2 +#define ACC200_QMGR_AQ_DEPTH 5 +#define ACC200_QMGR_INVALID_IDX -1 +#define ACC200_QMGR_RR 1 +#define ACC200_QOS_GBR 0 +#endif + #define OPS_CACHE_SIZE 256U #define OPS_POOL_SIZE_MIN 511U /* 0.5K per queue */ @@ -762,6 +774,70 @@ typedef int (test_case_function)(struct active_device *ad, info->dev_name); } #endif +#ifdef RTE_BASEBAND_ACC200 + if ((get_init_device() == true) && + (!strcmp(info->drv.driver_name, ACC200PF_DRIVER_NAME))) { + struct rte_acc200_conf conf; + unsigned int i; + + printf("Configure ACC200 FEC Driver %s with default values\n", + info->drv.driver_name); + + /* clear default configuration before initialization */ + memset(&conf, 0, sizeof(struct rte_acc200_conf)); + + /* Always set in PF mode for built-in configuration */ + conf.pf_mode_en = true; + for (i = 0; i < RTE_ACC200_NUM_VFS; ++i) { + conf.arb_dl_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_4g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_ul_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_4g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_4g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_dl_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_dl_5g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_ul_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_5g[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_ul_5g[i].round_robin_weight = ACC200_QMGR_RR; + conf.arb_fft[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_fft[i].gbr_threshold1 = ACC200_QOS_GBR; + conf.arb_fft[i].round_robin_weight = ACC200_QMGR_RR; + } + + conf.input_pos_llr_1_bit = true; + conf.output_pos_llr_1_bit = true; + conf.num_vf_bundles = 1; /**< Number of VF bundles to setup */ + + conf.q_ul_4g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_ul_4g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_ul_4g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_ul_4g.aq_depth_log2 = 
ACC200_QMGR_AQ_DEPTH; + conf.q_dl_4g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_dl_4g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_dl_4g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_dl_4g.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + conf.q_ul_5g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_ul_5g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_ul_5g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_ul_5g.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + conf.q_dl_5g.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_dl_5g.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_dl_5g.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_dl_5g.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + conf.q_fft.num_qgroups = ACC200_QMGR_NUM_QGS; + conf.q_fft.first_qgroup_index = ACC200_QMGR_INVALID_IDX; + conf.q_fft.num_aqs_per_groups = ACC200_QMGR_NUM_AQS; + conf.q_fft.aq_depth_log2 = ACC200_QMGR_AQ_DEPTH; + + /* setup PF with configuration information */ + ret = rte_acc200_configure(info->dev_name, &conf); + TEST_ASSERT_SUCCESS(ret, + "Failed to configure ACC200 PF for bbdev %s", + info->dev_name); + } +#endif /* Let's refresh this now this is configured */ rte_bbdev_info_get(dev_id, info); nb_queues = RTE_MIN(rte_lcore_count(), info->drv.max_num_queues); diff --git a/drivers/baseband/acc200/meson.build b/drivers/baseband/acc200/meson.build index 7b47bc6..33b3e5e 100644 --- a/drivers/baseband/acc200/meson.build +++ b/drivers/baseband/acc200/meson.build @@ -4,3 +4,5 @@ deps += ['bbdev', 'bus_vdev', 'ring', 'pci', 'bus_pci'] sources = files('rte_acc200_pmd.c') + +headers = files('rte_acc200_cfg.h') diff --git a/drivers/baseband/acc200/rte_acc200_cfg.h b/drivers/baseband/acc200/rte_acc200_cfg.h index fcccfbf..33ea819 100644 --- a/drivers/baseband/acc200/rte_acc200_cfg.h +++ b/drivers/baseband/acc200/rte_acc200_cfg.h @@ -91,4 +91,25 @@ struct rte_acc200_conf { struct rte_acc200_arbitration arb_fft[RTE_ACC200_NUM_VFS]; }; +/** + * Configure a ACC200 device + * + * @param dev_name + * The 
name of the device. This is the short form of PCI BDF, e.g. 00:01.0. + * It can also be retrieved for a bbdev device from the dev_name field in the + * rte_bbdev_info structure returned by rte_bbdev_info_get(). + * @param conf + * Configuration to apply to ACC200 HW. + * + * @return + * Zero on success, negative value on failure. + */ +__rte_experimental +int +rte_acc200_configure(const char *dev_name, struct rte_acc200_conf *conf); + +#ifdef __cplusplus +} +#endif + #endif /* _RTE_ACC200_CFG_H_ */ diff --git a/drivers/baseband/acc200/rte_acc200_pmd.c b/drivers/baseband/acc200/rte_acc200_pmd.c index 856ea1c..c44d729 100644 --- a/drivers/baseband/acc200/rte_acc200_pmd.c +++ b/drivers/baseband/acc200/rte_acc200_pmd.c @@ -85,6 +85,27 @@ enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, NUM_ACC}; +/* Return the accelerator enum for a Queue Group Index */ +static inline int +accFromQgid(int qg_idx, const struct rte_acc200_conf *acc200_conf) +{ + int accQg[ACC200_NUM_QGRPS]; + int NumQGroupsPerFn[NUM_ACC]; + int acc, qgIdx, qgIndex = 0; + for (qgIdx = 0; qgIdx < ACC200_NUM_QGRPS; qgIdx++) + accQg[qgIdx] = 0; + NumQGroupsPerFn[UL_4G] = acc200_conf->q_ul_4g.num_qgroups; + NumQGroupsPerFn[UL_5G] = acc200_conf->q_ul_5g.num_qgroups; + NumQGroupsPerFn[DL_4G] = acc200_conf->q_dl_4g.num_qgroups; + NumQGroupsPerFn[DL_5G] = acc200_conf->q_dl_5g.num_qgroups; + NumQGroupsPerFn[FFT] = acc200_conf->q_fft.num_qgroups; + for (acc = UL_4G; acc < NUM_ACC; acc++) + for (qgIdx = 0; qgIdx < NumQGroupsPerFn[acc]; qgIdx++) + accQg[qgIndex++] = acc; + acc = accQg[qg_idx]; + return acc; +} + /* Return the queue topology for a Queue Group Index */ static inline void qtopFromAcc(struct rte_acc200_queue_topology **qtop, int acc_enum, @@ -117,6 +138,30 @@ *qtop = p_qtop; } +/* Return the AQ depth for a Queue Group Index */ +static inline int +aqDepth(int qg_idx, struct rte_acc200_conf *acc200_conf) +{ + struct rte_acc200_queue_topology *q_top = NULL; + int acc_enum = accFromQgid(qg_idx, acc200_conf); + 
qtopFromAcc(&q_top, acc_enum, acc200_conf); + if (unlikely(q_top == NULL)) + return 0; + return q_top->aq_depth_log2; +} + +/* Return the number of AQs per group for a Queue Group Index */ +static inline int +aqNum(int qg_idx, struct rte_acc200_conf *acc200_conf) +{ + struct rte_acc200_queue_topology *q_top = NULL; + int acc_enum = accFromQgid(qg_idx, acc200_conf); + qtopFromAcc(&q_top, acc_enum, acc200_conf); + if (unlikely(q_top == NULL)) + return 0; + return q_top->num_aqs_per_groups; +} + static void initQTop(struct rte_acc200_conf *acc200_conf) { @@ -4935,3 +4980,424 @@ static int acc200_pci_remove(struct rte_pci_device *pci_dev) RTE_PMD_REGISTER_PCI_TABLE(ACC200PF_DRIVER_NAME, pci_id_acc200_pf_map); RTE_PMD_REGISTER_PCI(ACC200VF_DRIVER_NAME, acc200_pci_vf_driver); RTE_PMD_REGISTER_PCI_TABLE(ACC200VF_DRIVER_NAME, pci_id_acc200_vf_map); + +/* Initial configuration of an ACC200 device prior to running configure() */ +int +rte_acc200_configure(const char *dev_name, struct rte_acc200_conf *conf) +{ + rte_bbdev_log(INFO, "rte_acc200_configure"); + uint32_t value, address, status; + int qg_idx, template_idx, vf_idx, acc, i; + struct rte_bbdev *bbdev = rte_bbdev_get_named_dev(dev_name); + + /* Compile time checks */ + RTE_BUILD_BUG_ON(sizeof(struct acc200_dma_req_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(union acc200_dma_desc) != 256); + RTE_BUILD_BUG_ON(sizeof(struct acc200_fcw_td) != 24); + RTE_BUILD_BUG_ON(sizeof(struct acc200_fcw_te) != 32); + + if (bbdev == NULL) { + rte_bbdev_log(ERR, + "Invalid dev_name (%s), or device is not yet initialised", + dev_name); + return -ENODEV; + } + struct acc200_device *d = bbdev->data->dev_private; + + /* Store configuration */ + rte_memcpy(&d->acc200_conf, conf, sizeof(d->acc200_conf)); + + + /* Check we are already out of PG */ + status = acc200_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status > 0) { + if (status != ACC200_PG_MASK_0) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_0); + return -ENODEV; + } + 
/* Clock gate sections that will be un-PG */ + acc200_reg_write(d, HWPfHiClkGateHystReg, ACC200_CLK_DIS); + /* Un-PG required sections */ + acc200_reg_write(d, HWPfHiSectionPowerGatingReq, + ACC200_PG_MASK_1); + status = acc200_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status != ACC200_PG_MASK_1) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_1); + return -ENODEV; + } + acc200_reg_write(d, HWPfHiSectionPowerGatingReq, + ACC200_PG_MASK_2); + status = acc200_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status != ACC200_PG_MASK_2) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_2); + return -ENODEV; + } + acc200_reg_write(d, HWPfHiSectionPowerGatingReq, + ACC200_PG_MASK_3); + status = acc200_reg_read(d, HWPfHiSectionPowerGatingAck); + if (status != ACC200_PG_MASK_3) { + rte_bbdev_log(ERR, "Unexpected status %x %x", + status, ACC200_PG_MASK_3); + return -ENODEV; + } + /* Enable clocks for all sections */ + acc200_reg_write(d, HWPfHiClkGateHystReg, ACC200_CLK_EN); + } + + /* Explicitly releasing AXI as this may be stopped after PF FLR/BME */ + address = HWPfDmaAxiControl; + value = 1; + acc200_reg_write(d, address, value); + + /* Set the fabric mode */ + address = HWPfFabricM2iBufferReg; + value = ACC200_FABRIC_MODE; + acc200_reg_write(d, address, value); + + /* Set default descriptor signature */ + address = HWPfDmaDescriptorSignatuture; + value = 0; + acc200_reg_write(d, address, value); + + /* Enable the Error Detection in DMA */ + value = ACC200_CFG_DMA_ERROR; + address = HWPfDmaErrorDetectionEn; + acc200_reg_write(d, address, value); + + /* AXI Cache configuration */ + value = ACC200_CFG_AXI_CACHE; + address = HWPfDmaAxcacheReg; + acc200_reg_write(d, address, value); + + /* Default DMA Configuration (Qmgr Enabled) */ + address = HWPfDmaConfig0Reg; + value = 0; + acc200_reg_write(d, address, value); + address = HWPfDmaQmanen; + value = 0; + acc200_reg_write(d, address, value); + + /* Default RLIM/ALEN 
configuration */ + int rlim = 0; + int alen = 1; + int timestamp = 0; + address = HWPfDmaConfig1Reg; + value = (1 << 31) + (rlim << 8) + (timestamp << 6) + alen; + acc200_reg_write(d, address, value); + + /* Default FFT configuration */ + address = HWPfFftConfig0; + value = ACC200_FFT_CFG_0; + acc200_reg_write(d, address, value); + + /* Configure DMA Qmanager addresses */ + address = HWPfDmaQmgrAddrReg; + value = HWPfQmgrEgressQueuesTemplate; + acc200_reg_write(d, address, value); + + /* ===== Qmgr Configuration ===== */ + /* Configuration of the AQueue Depth QMGR_GRP_0_DEPTH_LOG2 for UL */ + int totalQgs = conf->q_ul_4g.num_qgroups + + conf->q_ul_5g.num_qgroups + + conf->q_dl_4g.num_qgroups + + conf->q_dl_5g.num_qgroups + + conf->q_fft.num_qgroups; + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS; qg_idx++) { + address = HWPfQmgrDepthLog2Grp + + ACC200_BYTES_IN_WORD * qg_idx; + value = aqDepth(qg_idx, conf); + acc200_reg_write(d, address, value); + address = HWPfQmgrTholdGrp + + ACC200_BYTES_IN_WORD * qg_idx; + value = (1 << 16) + (1 << (aqDepth(qg_idx, conf) - 1)); + acc200_reg_write(d, address, value); + } + + /* Template Priority in incremental order */ + for (template_idx = 0; template_idx < ACC200_NUM_TMPL; + template_idx++) { + address = HWPfQmgrGrpTmplateReg0Indx + ACC200_BYTES_IN_WORD * template_idx; + value = ACC200_TMPL_PRI_0; + acc200_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg1Indx + ACC200_BYTES_IN_WORD * template_idx; + value = ACC200_TMPL_PRI_1; + acc200_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg2indx + ACC200_BYTES_IN_WORD * template_idx; + value = ACC200_TMPL_PRI_2; + acc200_reg_write(d, address, value); + address = HWPfQmgrGrpTmplateReg3Indx + ACC200_BYTES_IN_WORD * template_idx; + value = ACC200_TMPL_PRI_3; + acc200_reg_write(d, address, value); + } + + address = HWPfQmgrGrpPriority; + value = ACC200_CFG_QMGR_HI_P; + acc200_reg_write(d, address, value); + + /* Template Configuration */ + for (template_idx = 
0; template_idx < ACC200_NUM_TMPL; + template_idx++) { + value = 0; + address = HWPfQmgrGrpTmplateReg4Indx + + ACC200_BYTES_IN_WORD * template_idx; + acc200_reg_write(d, address, value); + } + /* 4GUL */ + int numQgs = conf->q_ul_4g.num_qgroups; + int numQqsAcc = 0; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_UL_4G; + template_idx <= ACC200_SIG_UL_4G_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC200_BYTES_IN_WORD * template_idx; + acc200_reg_write(d, address, value); + } + /* 5GUL */ + numQqsAcc += numQgs; + numQgs = conf->q_ul_5g.num_qgroups; + value = 0; + int numEngines = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_UL_5G; + template_idx <= ACC200_SIG_UL_5G_LAST; + template_idx++) { + /* Check engine power-on status */ + address = HwPfFecUl5gIbDebugReg + + ACC200_ENGINE_OFFSET * template_idx; + status = (acc200_reg_read(d, address) >> 4) & 0x7; + address = HWPfQmgrGrpTmplateReg4Indx + + ACC200_BYTES_IN_WORD * template_idx; + if (status == 1) { + acc200_reg_write(d, address, value); + numEngines++; + } else + acc200_reg_write(d, address, 0); +#if RTE_ACC200_SINGLE_FEC == 1 + value = 0; +#endif + } + printf("Number of 5GUL engines %d\n", numEngines); + /* 4GDL */ + numQqsAcc += numQgs; + numQgs = conf->q_dl_4g.num_qgroups; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_DL_4G; + template_idx <= ACC200_SIG_DL_4G_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC200_BYTES_IN_WORD * template_idx; + acc200_reg_write(d, address, value); +#if RTE_ACC200_SINGLE_FEC == 1 + value = 0; +#endif + } + /* 5GDL */ + numQqsAcc += numQgs; + numQgs = conf->q_dl_5g.num_qgroups; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value 
|= (1 << qg_idx); + for (template_idx = ACC200_SIG_DL_5G; + template_idx <= ACC200_SIG_DL_5G_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC200_BYTES_IN_WORD * template_idx; + acc200_reg_write(d, address, value); +#if RTE_ACC200_SINGLE_FEC == 1 + value = 0; +#endif + } + /* FFT */ + numQqsAcc += numQgs; + numQgs = conf->q_fft.num_qgroups; + value = 0; + for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++) + value |= (1 << qg_idx); + for (template_idx = ACC200_SIG_FFT; + template_idx <= ACC200_SIG_FFT_LAST; + template_idx++) { + address = HWPfQmgrGrpTmplateReg4Indx + + ACC200_BYTES_IN_WORD * template_idx; + acc200_reg_write(d, address, value); +#if RTE_ACC200_SINGLE_FEC == 1 + value = 0; +#endif + } + + /* Queue Group Function mapping */ + int qman_func_id[8] = {0, 2, 1, 3, 4, 0, 0, 0}; + value = 0; + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS_PER_WORD; qg_idx++) { + acc = accFromQgid(qg_idx, conf); + value |= qman_func_id[acc] << (qg_idx * 4); + } + acc200_reg_write(d, HWPfQmgrGrpFunction0, value); + value = 0; + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS_PER_WORD; qg_idx++) { + acc = accFromQgid(qg_idx + ACC200_NUM_QGRPS_PER_WORD, conf); + value |= qman_func_id[acc] << (qg_idx * 4); + } + acc200_reg_write(d, HWPfQmgrGrpFunction1, value); + + /* Configuration of the Arbitration QGroup depth to 1 */ + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS; qg_idx++) { + address = HWPfQmgrArbQDepthGrp + + ACC200_BYTES_IN_WORD * qg_idx; + value = 0; + acc200_reg_write(d, address, value); + } + + /* This pointer to ARAM (256kB) is shifted by 2 (4B per register) */ + uint32_t aram_address = 0; + for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) { + for (vf_idx = 0; vf_idx < conf->num_vf_bundles; vf_idx++) { + address = HWPfQmgrVfBaseAddr + vf_idx + * ACC200_BYTES_IN_WORD + qg_idx + * ACC200_BYTES_IN_WORD * 64; + value = aram_address; + acc200_reg_write(d, address, value); + /* Offset ARAM Address for next memory bank + * - increment of 4B + */ + 
aram_address += aqNum(qg_idx, conf) * + (1 << aqDepth(qg_idx, conf)); + } + } + + if (aram_address > ACC200_WORDS_IN_ARAM_SIZE) { + rte_bbdev_log(ERR, "ARAM Configuration not fitting %d %d\n", + aram_address, ACC200_WORDS_IN_ARAM_SIZE); + return -EINVAL; + } + + /* Performance tuning */ + acc200_reg_write(d, HWPfFabricI2Mdma_weight, 0x0FFF); + acc200_reg_write(d, HWPfDma4gdlIbThld, 0x1f10); + + /* ==== HI Configuration ==== */ + + /* No Info Ring/MSI by default */ + address = HWPfHiInfoRingIntWrEnRegPf; + value = 0; + acc200_reg_write(d, address, value); + address = HWPfHiCfgMsiIntWrEnRegPf; + value = 0xFFFFFFFF; + acc200_reg_write(d, address, value); + /* Prevent Block on Transmit Error */ + address = HWPfHiBlockTransmitOnErrorEn; + value = 0; + acc200_reg_write(d, address, value); + /* Prevents to drop MSI */ + address = HWPfHiMsiDropEnableReg; + value = 0; + acc200_reg_write(d, address, value); + /* Set the PF Mode register */ + address = HWPfHiPfMode; + value = (conf->pf_mode_en) ? ACC200_PF_VAL : 0; + acc200_reg_write(d, address, value); + + /* QoS overflow init */ + value = 1; + address = HWPfQosmonAEvalOverflow0; + acc200_reg_write(d, address, value); + address = HWPfQosmonBEvalOverflow0; + acc200_reg_write(d, address, value); + + /* Configure the FFT RAM LUT */ + uint32_t fft_lut[ACC200_FFT_RAM_SIZE] = { + 0x1FFFF, 0x1FFFF, 0x1FFFE, 0x1FFFA, 0x1FFF6, 0x1FFF1, 0x1FFEA, 0x1FFE2, + 0x1FFD9, 0x1FFCE, 0x1FFC2, 0x1FFB5, 0x1FFA7, 0x1FF98, 0x1FF87, 0x1FF75, + 0x1FF62, 0x1FF4E, 0x1FF38, 0x1FF21, 0x1FF09, 0x1FEF0, 0x1FED6, 0x1FEBA, + 0x1FE9D, 0x1FE7F, 0x1FE5F, 0x1FE3F, 0x1FE1D, 0x1FDFA, 0x1FDD5, 0x1FDB0, + 0x1FD89, 0x1FD61, 0x1FD38, 0x1FD0D, 0x1FCE1, 0x1FCB4, 0x1FC86, 0x1FC57, + 0x1FC26, 0x1FBF4, 0x1FBC1, 0x1FB8D, 0x1FB58, 0x1FB21, 0x1FAE9, 0x1FAB0, + 0x1FA75, 0x1FA3A, 0x1F9FD, 0x1F9BF, 0x1F980, 0x1F93F, 0x1F8FD, 0x1F8BA, + 0x1F876, 0x1F831, 0x1F7EA, 0x1F7A3, 0x1F75A, 0x1F70F, 0x1F6C4, 0x1F677, + 0x1F629, 0x1F5DA, 0x1F58A, 0x1F539, 0x1F4E6, 0x1F492, 0x1F43D, 0x1F3E7, 
+ 0x1F38F, 0x1F337, 0x1F2DD, 0x1F281, 0x1F225, 0x1F1C8, 0x1F169, 0x1F109, + 0x1F0A8, 0x1F046, 0x1EFE2, 0x1EF7D, 0x1EF18, 0x1EEB0, 0x1EE48, 0x1EDDF, + 0x1ED74, 0x1ED08, 0x1EC9B, 0x1EC2D, 0x1EBBE, 0x1EB4D, 0x1EADB, 0x1EA68, + 0x1E9F4, 0x1E97F, 0x1E908, 0x1E891, 0x1E818, 0x1E79E, 0x1E722, 0x1E6A6, + 0x1E629, 0x1E5AA, 0x1E52A, 0x1E4A9, 0x1E427, 0x1E3A3, 0x1E31F, 0x1E299, + 0x1E212, 0x1E18A, 0x1E101, 0x1E076, 0x1DFEB, 0x1DF5E, 0x1DED0, 0x1DE41, + 0x1DDB1, 0x1DD20, 0x1DC8D, 0x1DBFA, 0x1DB65, 0x1DACF, 0x1DA38, 0x1D9A0, + 0x1D907, 0x1D86C, 0x1D7D1, 0x1D734, 0x1D696, 0x1D5F7, 0x1D557, 0x1D4B6, + 0x1D413, 0x1D370, 0x1D2CB, 0x1D225, 0x1D17E, 0x1D0D6, 0x1D02D, 0x1CF83, + 0x1CED8, 0x1CE2B, 0x1CD7E, 0x1CCCF, 0x1CC1F, 0x1CB6E, 0x1CABC, 0x1CA09, + 0x1C955, 0x1C89F, 0x1C7E9, 0x1C731, 0x1C679, 0x1C5BF, 0x1C504, 0x1C448, + 0x1C38B, 0x1C2CD, 0x1C20E, 0x1C14E, 0x1C08C, 0x1BFCA, 0x1BF06, 0x1BE42, + 0x1BD7C, 0x1BCB5, 0x1BBED, 0x1BB25, 0x1BA5B, 0x1B990, 0x1B8C4, 0x1B7F6, + 0x1B728, 0x1B659, 0x1B589, 0x1B4B7, 0x1B3E5, 0x1B311, 0x1B23D, 0x1B167, + 0x1B091, 0x1AFB9, 0x1AEE0, 0x1AE07, 0x1AD2C, 0x1AC50, 0x1AB73, 0x1AA95, + 0x1A9B6, 0x1A8D6, 0x1A7F6, 0x1A714, 0x1A631, 0x1A54D, 0x1A468, 0x1A382, + 0x1A29A, 0x1A1B2, 0x1A0C9, 0x19FDF, 0x19EF4, 0x19E08, 0x19D1B, 0x19C2D, + 0x19B3E, 0x19A4E, 0x1995D, 0x1986B, 0x19778, 0x19684, 0x1958F, 0x19499, + 0x193A2, 0x192AA, 0x191B1, 0x190B8, 0x18FBD, 0x18EC1, 0x18DC4, 0x18CC7, + 0x18BC8, 0x18AC8, 0x189C8, 0x188C6, 0x187C4, 0x186C1, 0x185BC, 0x184B7, + 0x183B1, 0x182AA, 0x181A2, 0x18099, 0x17F8F, 0x17E84, 0x17D78, 0x17C6C, + 0x17B5E, 0x17A4F, 0x17940, 0x17830, 0x1771E, 0x1760C, 0x174F9, 0x173E5, + 0x172D1, 0x171BB, 0x170A4, 0x16F8D, 0x16E74, 0x16D5B, 0x16C41, 0x16B26, + 0x16A0A, 0x168ED, 0x167CF, 0x166B1, 0x16592, 0x16471, 0x16350, 0x1622E, + 0x1610B, 0x15FE8, 0x15EC3, 0x15D9E, 0x15C78, 0x15B51, 0x15A29, 0x15900, + 0x157D7, 0x156AC, 0x15581, 0x15455, 0x15328, 0x151FB, 0x150CC, 0x14F9D, + 0x14E6D, 0x14D3C, 0x14C0A, 0x14AD8, 0x149A4, 0x14870, 0x1473B, 0x14606, + 
0x144CF, 0x14398, 0x14260, 0x14127, 0x13FEE, 0x13EB3, 0x13D78, 0x13C3C, + 0x13B00, 0x139C2, 0x13884, 0x13745, 0x13606, 0x134C5, 0x13384, 0x13242, + 0x130FF, 0x12FBC, 0x12E78, 0x12D33, 0x12BEE, 0x12AA7, 0x12960, 0x12819, + 0x126D0, 0x12587, 0x1243D, 0x122F3, 0x121A8, 0x1205C, 0x11F0F, 0x11DC2, + 0x11C74, 0x11B25, 0x119D6, 0x11886, 0x11735, 0x115E3, 0x11491, 0x1133F, + 0x111EB, 0x11097, 0x10F42, 0x10DED, 0x10C97, 0x10B40, 0x109E9, 0x10891, + 0x10738, 0x105DF, 0x10485, 0x1032B, 0x101D0, 0x10074, 0x0FF18, 0x0FDBB, + 0x0FC5D, 0x0FAFF, 0x0F9A0, 0x0F841, 0x0F6E1, 0x0F580, 0x0F41F, 0x0F2BD, + 0x0F15B, 0x0EFF8, 0x0EE94, 0x0ED30, 0x0EBCC, 0x0EA67, 0x0E901, 0x0E79A, + 0x0E633, 0x0E4CC, 0x0E364, 0x0E1FB, 0x0E092, 0x0DF29, 0x0DDBE, 0x0DC54, + 0x0DAE9, 0x0D97D, 0x0D810, 0x0D6A4, 0x0D536, 0x0D3C8, 0x0D25A, 0x0D0EB, + 0x0CF7C, 0x0CE0C, 0x0CC9C, 0x0CB2B, 0x0C9B9, 0x0C847, 0x0C6D5, 0x0C562, + 0x0C3EF, 0x0C27B, 0x0C107, 0x0BF92, 0x0BE1D, 0x0BCA8, 0x0BB32, 0x0B9BB, + 0x0B844, 0x0B6CD, 0x0B555, 0x0B3DD, 0x0B264, 0x0B0EB, 0x0AF71, 0x0ADF7, + 0x0AC7D, 0x0AB02, 0x0A987, 0x0A80B, 0x0A68F, 0x0A513, 0x0A396, 0x0A219, + 0x0A09B, 0x09F1D, 0x09D9E, 0x09C20, 0x09AA1, 0x09921, 0x097A1, 0x09621, + 0x094A0, 0x0931F, 0x0919E, 0x0901C, 0x08E9A, 0x08D18, 0x08B95, 0x08A12, + 0x0888F, 0x0870B, 0x08587, 0x08402, 0x0827E, 0x080F9, 0x07F73, 0x07DEE, + 0x07C68, 0x07AE2, 0x0795B, 0x077D4, 0x0764D, 0x074C6, 0x0733E, 0x071B6, + 0x0702E, 0x06EA6, 0x06D1D, 0x06B94, 0x06A0B, 0x06881, 0x066F7, 0x0656D, + 0x063E3, 0x06258, 0x060CE, 0x05F43, 0x05DB7, 0x05C2C, 0x05AA0, 0x05914, + 0x05788, 0x055FC, 0x0546F, 0x052E3, 0x05156, 0x04FC9, 0x04E3B, 0x04CAE, + 0x04B20, 0x04992, 0x04804, 0x04676, 0x044E8, 0x04359, 0x041CB, 0x0403C, + 0x03EAD, 0x03D1D, 0x03B8E, 0x039FF, 0x0386F, 0x036DF, 0x0354F, 0x033BF, + 0x0322F, 0x0309F, 0x02F0F, 0x02D7E, 0x02BEE, 0x02A5D, 0x028CC, 0x0273B, + 0x025AA, 0x02419, 0x02288, 0x020F7, 0x01F65, 0x01DD4, 0x01C43, 0x01AB1, + 0x0191F, 0x0178E, 0x015FC, 0x0146A, 0x012D8, 0x01147, 0x00FB5, 0x00E23, + 
0x00C91, 0x00AFF, 0x0096D, 0x007DB, 0x00648, 0x004B6, 0x00324, 0x00192}; + + acc200_reg_write(d, HWPfFftRamPageAccess, ACC200_FFT_RAM_EN + 64); + for (i = 0; i < ACC200_FFT_RAM_SIZE; i++) + acc200_reg_write(d, HWPfFftRamOff + i * 4, fft_lut[i]); + acc200_reg_write(d, HWPfFftRamPageAccess, ACC200_FFT_RAM_DIS); + + /* Enabling AQueues through the Queue hierarchy*/ + for (vf_idx = 0; vf_idx < ACC200_NUM_VFS; vf_idx++) { + for (qg_idx = 0; qg_idx < ACC200_NUM_QGRPS; qg_idx++) { + value = 0; + if (vf_idx < conf->num_vf_bundles && + qg_idx < totalQgs) + value = (1 << aqNum(qg_idx, conf)) - 1; + address = HWPfQmgrAqEnableVf + + vf_idx * ACC200_BYTES_IN_WORD; + value += (qg_idx << 16); + acc200_reg_write(d, address, value); + } + } + + rte_bbdev_log_debug("PF Tip configuration complete for %s", dev_name); + return 0; +} diff --git a/drivers/baseband/acc200/version.map b/drivers/baseband/acc200/version.map index c2e0723..9542f2b 100644 --- a/drivers/baseband/acc200/version.map +++ b/drivers/baseband/acc200/version.map @@ -1,3 +1,10 @@ DPDK_22 { local: *; }; + +EXPERIMENTAL { + global: + + rte_acc200_configure; + +}; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 50+ messages in thread
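The queue-group bookkeeping in patch 10 (accFromQgid(), the per-accelerator template bitmasks, and the ARAM sizing loop) can be hard to follow in diff form, so here is a small standalone sketch of the same arithmetic. It is a simplification under assumptions, not the driver code: struct sketch_qtop and all sketch_ functions are hypothetical, the register writes are left out, and the mapping walks the groups directly instead of filling an array as accFromQgid() does.

```c
#include <assert.h>
#include <stdint.h>

/* Same accelerator order as the enum in rte_acc200_pmd.c. */
enum { UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, NUM_ACC };

/* Hypothetical, trimmed-down view of the per-accelerator queue topology. */
struct sketch_qtop {
	int num_qgroups;        /* queue groups assigned to this accelerator */
	int num_aqs_per_groups; /* atomic queues per queue group */
	int aq_depth_log2;      /* log2 of the depth of each atomic queue */
};

/*
 * Queue groups are handed out in enum order: the first num_qgroups of
 * UL_4G, then UL_5G, and so on. Groups beyond the configured total map to
 * 0 (UL_4G), which mirrors the zero-initialised array in accFromQgid().
 */
static int
sketch_acc_from_qgid(int qg_idx, const struct sketch_qtop qtop[NUM_ACC])
{
	int acc, i, next = 0;

	for (acc = UL_4G; acc < NUM_ACC; acc++)
		for (i = 0; i < qtop[acc].num_qgroups; i++, next++)
			if (next == qg_idx)
				return acc;
	return UL_4G;
}

/* Bitmask of queue groups [first, first + num), as written to the templates. */
static uint32_t
sketch_qgroup_mask(int first, int num)
{
	uint32_t value = 0;
	int qg_idx;

	for (qg_idx = first; qg_idx < first + num; qg_idx++)
		value |= (1u << qg_idx);
	return value;
}

/*
 * Total ARAM words consumed: one bank of num_aqs * 2^depth words per
 * (queue group, VF bundle) pair, summed over all accelerators.
 */
static uint32_t
sketch_aram_words(const struct sketch_qtop qtop[NUM_ACC], int num_vf_bundles)
{
	uint32_t words = 0;
	int acc;

	assert(num_vf_bundles >= 0);
	for (acc = UL_4G; acc < NUM_ACC; acc++)
		words += (uint32_t)qtop[acc].num_qgroups *
			(uint32_t)num_vf_bundles *
			(uint32_t)qtop[acc].num_aqs_per_groups *
			(1u << qtop[acc].aq_depth_log2);
	return words;
}
```

With the defaults used in test_bbdev_perf.c (2 queue groups per accelerator, 16 AQs per group, depth log2 of 5, one VF bundle), this gives 10 queue groups and 10 * 16 * 32 = 5120 ARAM words, which is the figure the patch compares against ACC200_WORDS_IN_ARAM_SIZE.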
* Re: [PATCH v1 00/10] baseband/acc200
  2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru
                  ` (9 preceding siblings ...)
  2022-07-08 0:01 ` [PATCH v1 10/10] baseband/acc200: add PF configure companion function Nicolas Chautru
@ 2022-07-12 13:48 ` Maxime Coquelin
  2022-07-14 18:49   ` Vargas, Hernan
  2022-08-30 7:44   ` Maxime Coquelin
  10 siblings, 2 replies; 50+ messages in thread
From: Maxime Coquelin @ 2022-07-12 13:48 UTC (permalink / raw)
  To: Nicolas Chautru, dev, thomas, gakhil, hemant.agrawal, trix,
	Vargas, Hernan
  Cc: mdr, bruce.richardson, david.marchand, stephen

Hi Nicolas, Hernan,

(Adding Hernan in the recipients list)

On 7/8/22 02:01, Nicolas Chautru wrote:
> This is targeting 22.11 and includes the PMD for the
> integrated accelerator on Intel Xeon SPR-EEC.
> There is a dependency on that parallel serie still in-flight
> which extends the bbdev api https://patches.dpdk.org/project/dpdk/list/?series=23894
>
> I will be offline for a few weeks for the summer break but
> Hernan will cover for me during that time if required.
>
> Thanks
> Nic
>
> Nicolas Chautru (10):
>   baseband/acc200: introduce PMD for ACC200
>   baseband/acc200: add HW register definitions
>   baseband/acc200: add info get function
>   baseband/acc200: add queue configuration
>   baseband/acc200: add LDPC processing functions
>   baseband/acc200: add LTE processing functions
>   baseband/acc200: add support for FFT operations
>   baseband/acc200: support interrupt
>   baseband/acc200: add device status and vf2pf comms
>   baseband/acc200: add PF configure companion function
>
>  MAINTAINERS                              |    3 +
>  app/test-bbdev/meson.build               |    3 +
>  app/test-bbdev/test_bbdev_perf.c         |   76 +
>  doc/guides/bbdevs/acc200.rst             |  244 ++
>  doc/guides/bbdevs/index.rst              |    1 +
>  drivers/baseband/acc200/acc200_pf_enum.h |  468 +++
>  drivers/baseband/acc200/acc200_pmd.h     |  690 ++++
>  drivers/baseband/acc200/acc200_vf_enum.h |   89 +
>  drivers/baseband/acc200/meson.build      |    8 +
>  drivers/baseband/acc200/rte_acc200_cfg.h |  115 +
>  drivers/baseband/acc200/rte_acc200_pmd.c | 5403 ++++++++++++++++++++++++++++++
>  drivers/baseband/acc200/version.map      |   10 +
>  drivers/baseband/meson.build             |    1 +
>  13 files changed, 7111 insertions(+)
>  create mode 100644 doc/guides/bbdevs/acc200.rst
>  create mode 100644 drivers/baseband/acc200/acc200_pf_enum.h
>  create mode 100644 drivers/baseband/acc200/acc200_pmd.h
>  create mode 100644 drivers/baseband/acc200/acc200_vf_enum.h
>  create mode 100644 drivers/baseband/acc200/meson.build
>  create mode 100644 drivers/baseband/acc200/rte_acc200_cfg.h
>  create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c
>  create mode 100644 drivers/baseband/acc200/version.map
>

Comparing ACC200 & ACC100 header files, I understand ACC200 is an
evolution of the ACC10x family. The FEC bits are really close, ACC200
main addition seems to be FFT acceleration which could be handled in
ACC10x driver based on device ID.

I think both drivers have to be merged in order to avoid code
duplication. That's how other families of devices (e.g. i40e) are
handled.

Thanks,
Maxime
^ permalink raw reply	[flat|nested] 50+ messages in thread
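As a rough illustration of the review comment above (one driver serving both devices, with FFT support selected by device ID), a capability table keyed on the device variant might look like the sketch below. Everything here is hypothetical: the variant tags stand in for real PCI device IDs, the sketch_ names do not exist in DPDK, and the claim that only ACC200 has FFT simply follows the reviewer's reading of the headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variant tags; a real driver would match PCI device IDs. */
enum sketch_variant { SKETCH_DEV_ACC100, SKETCH_DEV_ACC200 };

/* Per-variant capabilities consulted by the shared code paths. */
struct sketch_caps {
	enum sketch_variant variant;
	const char *name;
	bool has_fft; /* assumed: FFT is the main ACC200 addition */
};

static const struct sketch_caps sketch_caps_table[] = {
	{ SKETCH_DEV_ACC100, "acc100", false },
	{ SKETCH_DEV_ACC200, "acc200", true },
};

/* Find the capability entry for a variant, or NULL if it is unknown. */
static const struct sketch_caps *
sketch_lookup(enum sketch_variant variant)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_caps_table) / sizeof(sketch_caps_table[0]); i++)
		if (sketch_caps_table[i].variant == variant)
			return &sketch_caps_table[i];
	return NULL;
}

/* Shared entry point: FFT requests are refused on variants without FFT. */
static int
sketch_enqueue_fft(const struct sketch_caps *caps)
{
	assert(caps != NULL);
	if (!caps->has_fft)
		return -1; /* not supported on this variant */
	return 0;
}
```

The FEC paths would then be common code, and only the FFT entry points would consult has_fft. Whether the register layouts of the two devices are close enough for this to work is exactly what the thread goes on to discuss.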
* RE: [PATCH v1 00/10] baseband/acc200
  2022-07-12 13:48 ` [PATCH v1 00/10] baseband/acc200 Maxime Coquelin
@ 2022-07-14 18:49   ` Vargas, Hernan
  2022-07-17 13:08     ` Tom Rix
  2022-08-30 7:44   ` Maxime Coquelin
  1 sibling, 1 reply; 50+ messages in thread
From: Vargas, Hernan @ 2022-07-14 18:49 UTC (permalink / raw)
  To: Maxime Coquelin, Chautru, Nicolas, dev, thomas, gakhil,
	hemant.agrawal, trix
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hi Tom, Maxime,

Could you please review the v5 series that Nic submitted last week?
https://patches.dpdk.org/project/dpdk/list/?series=23912

Thanks,
Hernan


-----Original Message-----
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Sent: Tuesday, July 12, 2022 8:49 AM
To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org; thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com; trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com>
Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>; david.marchand@redhat.com; stephen@networkplumber.org
Subject: Re: [PATCH v1 00/10] baseband/acc200

Hi Nicolas, Hernan,

(Adding Hernan in the recipients list)

On 7/8/22 02:01, Nicolas Chautru wrote:
> This is targeting 22.11 and includes the PMD for the integrated
> accelerator on Intel Xeon SPR-EEC.
> There is a dependency on that parallel serie still in-flight which
> extends the bbdev api
> https://patches.dpdk.org/project/dpdk/list/?series=23894
>
> I will be offline for a few weeks for the summer break but Hernan will
> cover for me during that time if required.
>
> Thanks
> Nic
>
> Nicolas Chautru (10):
>   baseband/acc200: introduce PMD for ACC200
>   baseband/acc200: add HW register definitions
>   baseband/acc200: add info get function
>   baseband/acc200: add queue configuration
>   baseband/acc200: add LDPC processing functions
>   baseband/acc200: add LTE processing functions
>   baseband/acc200: add support for FFT operations
>   baseband/acc200: support interrupt
>   baseband/acc200: add device status and vf2pf comms
>   baseband/acc200: add PF configure companion function
>
>  MAINTAINERS                              |    3 +
>  app/test-bbdev/meson.build               |    3 +
>  app/test-bbdev/test_bbdev_perf.c         |   76 +
>  doc/guides/bbdevs/acc200.rst             |  244 ++
>  doc/guides/bbdevs/index.rst              |    1 +
>  drivers/baseband/acc200/acc200_pf_enum.h |  468 +++
>  drivers/baseband/acc200/acc200_pmd.h     |  690 ++++
>  drivers/baseband/acc200/acc200_vf_enum.h |   89 +
>  drivers/baseband/acc200/meson.build      |    8 +
>  drivers/baseband/acc200/rte_acc200_cfg.h |  115 +
>  drivers/baseband/acc200/rte_acc200_pmd.c | 5403 ++++++++++++++++++++++++++++++
>  drivers/baseband/acc200/version.map      |   10 +
>  drivers/baseband/meson.build             |    1 +
>  13 files changed, 7111 insertions(+)
>  create mode 100644 doc/guides/bbdevs/acc200.rst
>  create mode 100644 drivers/baseband/acc200/acc200_pf_enum.h
>  create mode 100644 drivers/baseband/acc200/acc200_pmd.h
>  create mode 100644 drivers/baseband/acc200/acc200_vf_enum.h
>  create mode 100644 drivers/baseband/acc200/meson.build
>  create mode 100644 drivers/baseband/acc200/rte_acc200_cfg.h
>  create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c
>  create mode 100644 drivers/baseband/acc200/version.map
>

Comparing ACC200 & ACC100 header files, I understand ACC200 is an evolution of the ACC10x family. The FEC bits are really close, ACC200 main addition seems to be FFT acceleration which could be handled in ACC10x driver based on device ID.

I think both drivers have to be merged in order to avoid code duplication. That's how other families of devices (e.g. i40e) are handled.

Thanks,
Maxime
^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PATCH v1 00/10] baseband/acc200
  2022-07-14 18:49 ` Vargas, Hernan
@ 2022-07-17 13:08 ` Tom Rix
  2022-07-22 18:29 ` Vargas, Hernan
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Rix @ 2022-07-17 13:08 UTC (permalink / raw)
  To: Vargas, Hernan, Maxime Coquelin, Chautru, Nicolas, dev, thomas, gakhil, hemant.agrawal
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

On 7/14/22 11:49 AM, Vargas, Hernan wrote:
> Hi Tom, Maxime,
>
> Could you please review the v5 series that Nic submitted last week?
> https://patches.dpdk.org/project/dpdk/list/?series=23912
>
> Thanks,
> Hernan

Hernan,

For this patch series for the acc200, will you be able to refactor it so acc has a common base?

Or will this be on hold until Nic is back?

Tom

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, July 12, 2022 8:49 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org; thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com; trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com>
> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>; david.marchand@redhat.com; stephen@networkplumber.org
> Subject: Re: [PATCH v1 00/10] baseband/acc200
>
> Hi Nicolas, Hernan,
>
> (Adding Hernan in the recipients list)
>
> Comparing ACC200 & ACC100 header files, I understand ACC200 is an evolution of the ACC10x family. The FEC bits are really close, ACC200 main addition seems to be FFT acceleration which could be handled in ACC10x driver based on device ID.
>
> I think both drivers have to be merged in order to avoid code duplication. That's how other families of devices (e.g. i40e) are handled.
>
> Thanks,
> Maxime
* RE: [PATCH v1 00/10] baseband/acc200
  2022-07-17 13:08 ` Tom Rix
@ 2022-07-22 18:29 ` Vargas, Hernan
  2022-07-22 20:19 ` Tom Rix
  0 siblings, 1 reply; 50+ messages in thread
From: Vargas, Hernan @ 2022-07-22 18:29 UTC (permalink / raw)
  To: Tom Rix, Maxime Coquelin, Chautru, Nicolas, dev, thomas, gakhil, hemant.agrawal
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hi Tom,

The patch series for the ACC200 can wait until Nic's back.
Our priority is the set of changes for the bbdev API here: https://patches.dpdk.org/project/dpdk/list/?series=23912

Thanks,
Hernan

-----Original Message-----
From: Tom Rix <trix@redhat.com>
Sent: Sunday, July 17, 2022 8:08 AM
To: Vargas, Hernan <hernan.vargas@intel.com>; Maxime Coquelin <maxime.coquelin@redhat.com>; Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org; thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com
Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>; david.marchand@redhat.com; stephen@networkplumber.org
Subject: Re: [PATCH v1 00/10] baseband/acc200

On 7/14/22 11:49 AM, Vargas, Hernan wrote:
> Hi Tom, Maxime,
>
> Could you please review the v5 series that Nic submitted last week?
> https://patches.dpdk.org/project/dpdk/list/?series=23912
>
> Thanks,
> Hernan

Hernan,

For this patch series for the acc200, will you be able to refactor it so acc has a common base?

Or will this be on hold until Nic is back?

Tom
* Re: [PATCH v1 00/10] baseband/acc200
  2022-07-22 18:29 ` Vargas, Hernan
@ 2022-07-22 20:19 ` Tom Rix
  2022-08-15 17:52 ` Chautru, Nicolas
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Rix @ 2022-07-22 20:19 UTC (permalink / raw)
  To: Vargas, Hernan, Maxime Coquelin, Chautru, Nicolas, dev, thomas, gakhil, hemant.agrawal
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hernan,

The changes I requested in v4 were not addressed in v5.

Can you make these changes for v6?

Tom

On 7/22/22 11:29 AM, Vargas, Hernan wrote:
> Hi Tom,
>
> The patch series for the ACC200 can wait until Nic's back.
> Our priority is the set of changes for the bbdev API here: https://patches.dpdk.org/project/dpdk/list/?series=23912
>
> Thanks,
> Hernan
* RE: [PATCH v1 00/10] baseband/acc200
  2022-07-22 20:19 ` Tom Rix
@ 2022-08-15 17:52 ` Chautru, Nicolas
  0 siblings, 0 replies; 50+ messages in thread
From: Chautru, Nicolas @ 2022-08-15 17:52 UTC (permalink / raw)
  To: Tom Rix, Vargas, Hernan, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hi Tom,

I had answered all of your comments on v4 before I went on time off.
Let me know if you have any concern acking that v5, thanks.

Nic

> -----Original Message-----
> From: Tom Rix <trix@redhat.com>
> Sent: Friday, July 22, 2022 1:20 PM
> To: Vargas, Hernan <hernan.vargas@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Chautru, Nicolas
> <nicolas.chautru@intel.com>; dev@dpdk.org; thomas@monjalon.net;
> gakhil@marvell.com; hemant.agrawal@nxp.com
> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
> david.marchand@redhat.com; stephen@networkplumber.org
> Subject: Re: [PATCH v1 00/10] baseband/acc200
>
> Hernan,
>
> The changes I requested in v4 were not addressed in v5.
>
> Can you make these changes for v6?
>
> Tom
* Re: [PATCH v1 00/10] baseband/acc200
  2022-07-12 13:48 ` [PATCH v1 00/10] baseband/acc200 Maxime Coquelin
  2022-07-14 18:49 ` Vargas, Hernan
@ 2022-08-30 7:44 ` Maxime Coquelin
  2022-08-30 19:45 ` Chautru, Nicolas
  1 sibling, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2022-08-30 7:44 UTC (permalink / raw)
  To: Nicolas Chautru, dev, thomas, gakhil, hemant.agrawal, trix, Vargas, Hernan
  Cc: mdr, bruce.richardson, david.marchand, stephen

Hi Nicolas,

On 7/12/22 15:48, Maxime Coquelin wrote:
> Hi Nicolas, Hernan,
>
> (Adding Hernan in the recipients list)
>
> Comparing ACC200 & ACC100 header files, I understand ACC200 is an
> evolution of the ACC10x family. The FEC bits are really close, ACC200
> main addition seems to be FFT acceleration which could be handled in
> ACC10x driver based on device ID.
>
> I think both drivers have to be merged in order to avoid code
> duplication. That's how other families of devices (e.g. i40e) are
> handled.

I haven't seen your reply on this point.
Do you confirm you are working on a single driver for the ACC family in order to
avoid code duplication?

Maxime

> Thanks,
> Maxime
* RE: [PATCH v1 00/10] baseband/acc200
  2022-08-30 7:44 ` Maxime Coquelin
@ 2022-08-30 19:45 ` Chautru, Nicolas
  2022-08-31 16:43 ` Maxime Coquelin
  2022-08-31 19:26 ` Tom Rix
  0 siblings, 2 replies; 50+ messages in thread
From: Chautru, Nicolas @ 2022-08-30 19:45 UTC (permalink / raw)
  To: Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, trix, Vargas, Hernan
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, August 30, 2022 12:45 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org;
> thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com;
> trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com>
> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
> david.marchand@redhat.com; stephen@networkplumber.org
> Subject: Re: [PATCH v1 00/10] baseband/acc200
>
> Hi Nicolas,
>
> On 7/12/22 15:48, Maxime Coquelin wrote:
> > Comparing ACC200 & ACC100 header files, I understand ACC200 is an
> > evolution of the ACC10x family. The FEC bits are really close, ACC200
> > main addition seems to be FFT acceleration which could be handled in
> > ACC10x driver based on device ID.
> >
> > I think both drivers have to be merged in order to avoid code
> > duplication. That's how other families of devices (e.g. i40e) are
> > handled.
>
> I haven't seen your reply on this point.
> Do you confirm you are working on a single driver for the ACC family in order to
> avoid code duplication?
>

The implementation is based on distinct ACC100 and ACC200 drivers. The 2 devices are fundamentally different generations, processes and IP.
MountBryce is an eASIC device over PCIe while ACC200 is an integrated accelerator on Xeon CPU.
The actual implementations are not the same; the underlying IPs are all distinct even if many of the descriptor formats have similarities.
The actual capabilities of the acceleration are different and/or new.
The workarounds and silicon errata are also different, causing different limitations and implementations in the driver (see the series with ongoing changes for ACC100 in parallel).
This is fundamentally distinct from ACC101, which was a derivative product of ACC100 and where it made sense to share the implementation between ACC100 and ACC101.
So in a nutshell, these 2 devices and drivers are 2 different beasts and the intention is to keep them separate, as in the series.
Let me know if unclear, thanks!

Thanks
Nic

> Maxime
* Re: [PATCH v1 00/10] baseband/acc200
  2022-08-30 19:45 ` Chautru, Nicolas
@ 2022-08-31 16:43 ` Maxime Coquelin
  2022-08-31 19:20 ` Thomas Monjalon
  2022-08-31 19:26 ` Tom Rix
  1 sibling, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2022-08-31 16:43 UTC (permalink / raw)
  To: Chautru, Nicolas, dev, thomas, gakhil, hemant.agrawal, trix, Vargas, Hernan
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hello Nicolas,

On 8/30/22 21:45, Chautru, Nicolas wrote:
> Hi Maxime,
>
>> I haven't seen your reply on this point.
>> Do you confirm you are working on a single driver for the ACC family in order to
>> avoid code duplication?
>>
>
> The implementation is based on distinct ACC100 and ACC200 drivers. The 2 devices are fundamentally different generations, processes and IP.
> MountBryce is an eASIC device over PCIe while ACC200 is an integrated accelerator on Xeon CPU.

The underlying technology does not matter much. For example, we use the same
Virtio driver for SW emulated devices and fully HW offloaded ones.

I have spent some time today comparing the drivers and what I can see is
that the ACC200 driver is a copy-paste of the ACC100 one, modulo the FFT
addition and other small changes that I think could be handled dynamically
based on capabilities flags and device ID.

> The actual implementations are not the same; the underlying IPs are all distinct even if many of the descriptor formats have similarities.
> The actual capabilities of the acceleration are different and/or new.

New capabilities should be backed by device capabilities flags.

> The workarounds and silicon errata are also different, causing different limitations and implementations in the driver (see the series with ongoing changes for ACC100 in parallel).
> This is fundamentally distinct from ACC101, which was a derivative product of ACC100 and where it made sense to share the implementation between ACC100 and ACC101.
> So in a nutshell, these 2 devices and drivers are 2 different beasts and the intention is to keep them separate, as in the series.
> Let me know if unclear, thanks!

Thanks for the information.
I still think it should be a single driver; I would appreciate a second
opinion.

Thomas, Bruce, Stephen, do you have time to have a look?

Thanks,
Maxime

> Thanks
> Nic
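For readers following the single-driver argument above, here is a minimal sketch of what handling both devices from one driver, keyed on device ID and capability flags, could look like. It is an illustration only: the `ex_` names, the placeholder device IDs and the capability bits are assumptions made for this example and are taken neither from the ACC100/ACC200 PMDs nor from the bbdev API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative capability bits (hypothetical, not the bbdev flags). */
enum ex_cap {
	EX_CAP_LDPC = 1 << 0,
	EX_CAP_LTE  = 1 << 1,
	EX_CAP_FFT  = 1 << 2,
};

/* Placeholder device IDs chosen for the example, not real PCI IDs. */
#define EX_DEV_ID_ACC100 0x0100u
#define EX_DEV_ID_ACC200 0x0200u

/* One row per supported device variant. */
struct ex_variant {
	uint16_t device_id;
	const char *name;
	uint32_t caps;
};

static const struct ex_variant ex_variants[] = {
	{ EX_DEV_ID_ACC100, "acc100", EX_CAP_LDPC | EX_CAP_LTE },
	{ EX_DEV_ID_ACC200, "acc200", EX_CAP_LDPC | EX_CAP_LTE | EX_CAP_FFT },
};

/* Probe step: map a device ID to its variant, or NULL if unsupported. */
const struct ex_variant *ex_variant_lookup(uint16_t device_id)
{
	for (size_t i = 0; i < sizeof(ex_variants) / sizeof(ex_variants[0]); i++) {
		if (ex_variants[i].device_id == device_id)
			return &ex_variants[i];
	}
	return NULL;
}

/*
 * Common enqueue path: the shared code runs for every variant and an
 * operation is rejected when the variant lacks the matching capability.
 * Returns 0 when accepted, -1 when the operation is not supported.
 */
int ex_enqueue(const struct ex_variant *v, enum ex_cap op)
{
	assert(v != NULL);
	if ((v->caps & (uint32_t)op) == 0)
		return -1;
	return 0;
}
```

With this layout, supporting another device generation amounts to adding one table row. Per-device errata workarounds, which Nicolas raises as a reason to keep the drivers apart, would still need their own hooks and are not modeled here.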
* Re: [PATCH v1 00/10] baseband/acc200
  2022-08-31 16:43 ` Maxime Coquelin
@ 2022-08-31 19:20 ` Thomas Monjalon
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Monjalon @ 2022-08-31 19:20 UTC
  To: Chautru, Nicolas, dev, gakhil, hemant.agrawal, trix, Vargas, Hernan, Maxime Coquelin
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

31/08/2022 18:43, Maxime Coquelin:
> Hello Nicolas,
>
> On 8/30/22 21:45, Chautru, Nicolas wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, August 30, 2022 12:45 AM
> >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org;
> >> thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com;
> >> trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com>
> >> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
> >> david.marchand@redhat.com; stephen@networkplumber.org
> >> Subject: Re: [PATCH v1 00/10] baseband/acc200
> >>
> >> Hi Nicolas,
> >>
> >> On 7/12/22 15:48, Maxime Coquelin wrote:
> >>> Hi Nicolas, Hernan,
> >>>
> >>> (Adding Hernan in the recipients list)
> >>>
> >>> Comparing ACC200 & ACC100 header files, I understand ACC200 is an
> >>> evolution of the ACC10x family. The FEC bits are really close, ACC200
> >>> main addition seems to be FFT acceleration which could be handled in
> >>> ACC10x driver based on device ID.
> >>>
> >>> I think both drivers have to be merged in order to avoid code
> >>> duplication. That's how other families of devices (e.g. i40e) are
> >>> handled.
> >>
> >> I haven't seen your reply on this point.
> >> Do you confirm you are working on a single driver for ACC family in order to
> >> avoid code duplication?
> >>
> >
> > The implementation is based on distinct ACC100 and ACC200 drivers. The 2 devices are fundamentally different generation, processes and IP.
> > MountBryce is an eASIC device over PCIe while ACC200 is an integrated accelerator on Xeon CPU.
>
> The underlying technology does not matter much. For example we use same
> Virtio driver for SW emulated devices and fully HW offloaded ones.
>
> I have spent some time today comparing the drivers and what I can see is
> the ACC200 driver is a copy-paste of the ACC100, modulo FFT addition and
> other small changes that I think could be handled dynamically based on
> capabilities flags and device ID.
>
> > The actual implementation are not the same, underlying IP are all distinct even if many of the descriptor format have similarities.
> > The actual capabilities of the acceleration are different and/or new.
>
> New capabilities should be backed by device capabilities flags.
>
> > The workaround and silicon errata are also different causing different limitation and implementation in the driver (see the serie with ongoing changes for ACC100 in parallel).
> > This is fundamentally distinct from ACC101 which was a derivative product from ACC100 and where it made sense to share implementation between ACC100 and ACC101.
> > So in a nutshell these 2 devices and drivers are 2 different beasts and the intention is to keep them intentionally separate as in the serie.
> > Let me know if unclear, thanks!
>
> Thanks for the information.
> I still think it should be a single driver, I would appreciate a second
> opinion. Thomas, Bruce, Stephen, do you have time to have a look?

If most code is similar, it should be the same driver.
* Re: [PATCH v1 00/10] baseband/acc200
  2022-08-30 19:45 ` Chautru, Nicolas
  2022-08-31 16:43 ` Maxime Coquelin
@ 2022-08-31 19:26 ` Tom Rix
  2022-08-31 22:37 ` Chautru, Nicolas
  1 sibling, 1 reply; 50+ messages in thread
From: Tom Rix @ 2022-08-31 19:26 UTC
  To: Chautru, Nicolas, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, Vargas, Hernan
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

On 8/30/22 12:45 PM, Chautru, Nicolas wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, August 30, 2022 12:45 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org;
>> thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com;
>> trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com>
>> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
>> david.marchand@redhat.com; stephen@networkplumber.org
>> Subject: Re: [PATCH v1 00/10] baseband/acc200
>>
>> Hi Nicolas,
>>
>> On 7/12/22 15:48, Maxime Coquelin wrote:
>>> Hi Nicolas, Hernan,
>>>
>>> (Adding Hernan in the recipients list)
>>>
>>> Comparing ACC200 & ACC100 header files, I understand ACC200 is an
>>> evolution of the ACC10x family. The FEC bits are really close, ACC200
>>> main addition seems to be FFT acceleration which could be handled in
>>> ACC10x driver based on device ID.
>>>
>>> I think both drivers have to be merged in order to avoid code
>>> duplication. That's how other families of devices (e.g. i40e) are
>>> handled.
>> I haven't seen your reply on this point.
>> Do you confirm you are working on a single driver for ACC family in order to
>> avoid code duplication?
>>
> The implementation is based on distinct ACC100 and ACC200 drivers. The 2 devices are fundamentally different generation, processes and IP.
> MountBryce is an eASIC device over PCIe while ACC200 is an integrated accelerator on Xeon CPU.
> The actual implementation are not the same, underlying IP are all distinct even if many of the descriptor format have similarities.
> The actual capabilities of the acceleration are different and/or new.
> The workaround and silicon errata are also different causing different limitation and implementation in the driver (see the serie with ongoing changes for ACC100 in parallel).
> This is fundamentally distinct from ACC101 which was a derivative product from ACC100 and where it made sense to share implementation between ACC100 and ACC101.
> So in a nutshell these 2 devices and drivers are 2 different beasts and the intention is to keep them intentionally separate as in the serie.
> Let me know if unclear, thanks!

Nic,

I used a similarity checker to compare acc100 and acc200:

https://dickgrune.com/Programs/similarity_tester/

l=simum.log
if [ -f $l ]; then
    rm $l
fi

sim_c -s -R -o$l -R -p -P -a .

The results are:

./acc200/acc200_pf_enum.h consists for 100 % of ./acc100/acc100_pf_enum.h material
./acc100/acc100_pf_enum.h consists for 98 % of ./acc200/acc200_pf_enum.h material
./acc100/rte_acc100_pmd.h consists for 98 % of ./acc200/acc200_pmd.h material
./acc200/acc200_vf_enum.h consists for 95 % of ./acc100/acc100_pf_enum.h material
./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h material
./acc200/rte_acc200_cfg.h consists for 92 % of ./acc100/rte_acc100_cfg.h material
./acc100/rte_acc100_pmd.c consists for 87 % of ./acc200/rte_acc200_pmd.c material
./acc100/acc100_vf_enum.h consists for 80 % of ./acc200/acc200_pf_enum.h material
./acc200/rte_acc200_pmd.c consists for 78 % of ./acc100/rte_acc100_pmd.c material
./acc100/rte_acc100_cfg.h consists for 75 % of ./acc200/rte_acc200_cfg.h material

Spot checking the first *pf_enum.h at 100%, these are the devices'
registers, and they are the same.

I raised this similarity issue with 100 vs 101.

Having multiple copies is difficult to support and should be avoided.

The end user should have to use only one driver.

Tom

>
> Thanks
> Nic
>
>
>> Maxime
>>
>>> Thanks,
>>> Maxime
* RE: [PATCH v1 00/10] baseband/acc200
  2022-08-31 19:26 ` Tom Rix
@ 2022-08-31 22:37 ` Chautru, Nicolas
  2022-09-01 0:28 ` Tom Rix
  0 siblings, 1 reply; 50+ messages in thread
From: Chautru, Nicolas @ 2022-08-31 22:37 UTC
  To: Tom Rix, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, Vargas, Hernan
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hi Thomas, Tom,

> -----Original Message-----
> From: Tom Rix <trix@redhat.com>
> Sent: Wednesday, August 31, 2022 12:26 PM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; dev@dpdk.org; thomas@monjalon.net;
> gakhil@marvell.com; hemant.agrawal@nxp.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
> david.marchand@redhat.com; stephen@networkplumber.org
> Subject: Re: [PATCH v1 00/10] baseband/acc200
>
>
> On 8/30/22 12:45 PM, Chautru, Nicolas wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, August 30, 2022 12:45 AM
> >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org;
> >> thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com;
> >> trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com>
> >> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
> >> david.marchand@redhat.com; stephen@networkplumber.org
> >> Subject: Re: [PATCH v1 00/10] baseband/acc200
> >>
> >> Hi Nicolas,
> >>
> >> On 7/12/22 15:48, Maxime Coquelin wrote:
> >>> Hi Nicolas, Hernan,
> >>>
> >>> (Adding Hernan in the recipients list)
> >>>
> >>> Comparing ACC200 & ACC100 header files, I understand ACC200 is an
> >>> evolution of the ACC10x family. The FEC bits are really close,
> >>> ACC200 main addition seems to be FFT acceleration which could be
> >>> handled in ACC10x driver based on device ID.
> >>>
> >>> I think both drivers have to be merged in order to avoid code
> >>> duplication. That's how other families of devices (e.g. i40e) are
> >>> handled.
> >> I haven't seen your reply on this point.
> >> Do you confirm you are working on a single driver for ACC family in
> >> order to avoid code duplication?
> >>
> > The implementation is based on distinct ACC100 and ACC200 drivers. The 2 devices are fundamentally different generation, processes and IP.
> > MountBryce is an eASIC device over PCIe while ACC200 is an integrated accelerator on Xeon CPU.
> > The actual implementation are not the same, underlying IP are all distinct even if many of the descriptor format have similarities.
> > The actual capabilities of the acceleration are different and/or new.
> > The workaround and silicon errata are also different causing different limitation and implementation in the driver (see the serie with ongoing changes for ACC100 in parallel).
> > This is fundamentally distinct from ACC101 which was a derivative product from ACC100 and where it made sense to share implementation between ACC100 and ACC101.
> > So in a nutshell these 2 devices and drivers are 2 different beasts and the intention is to keep them intentionally separate as in the serie.
> > Let me know if unclear, thanks!
>
> Nic,
>
> I used a similarity checker to compare acc100 and acc200
>
> https://dickgrune.com/Programs/similarity_tester/
>
> l=simum.log
> if [ -f $l ]; then
>     rm $l
> fi
>
> sim_c -s -R -o$l -R -p -P -a .
>
> There results are
>
> ./acc200/acc200_pf_enum.h consists for 100 % of ./acc100/acc100_pf_enum.h material
> ./acc100/acc100_pf_enum.h consists for 98 % of ./acc200/acc200_pf_enum.h material
> ./acc100/rte_acc100_pmd.h consists for 98 % of ./acc200/acc200_pmd.h material
> ./acc200/acc200_vf_enum.h consists for 95 % of ./acc100/acc100_pf_enum.h material
> ./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h material
> ./acc200/rte_acc200_cfg.h consists for 92 % of ./acc100/rte_acc100_cfg.h material
> ./acc100/rte_acc100_pmd.c consists for 87 % of ./acc200/rte_acc200_pmd.c material
> ./acc100/acc100_vf_enum.h consists for 80 % of ./acc200/acc200_pf_enum.h material
> ./acc200/rte_acc200_pmd.c consists for 78 % of ./acc100/rte_acc100_pmd.c material
> ./acc100/rte_acc100_cfg.h consists for 75 % of ./acc200/rte_acc200_cfg.h material
>
> Spot checking the first *pf_enum.h at 100%, these are the devices'
> registers, they are the same.
>
> I raised this similarity issue with 100 vs 101.
>
> Having multiple copies is difficult to support and should be avoided.
>
> For the end user, they should have to use only one driver.
>

These are really different IPs that do not have the same interface (PCIe/DDR vs integrated), and a big series of changes specific to ACC100 is coming in parallel. Any workaround or optimization would be different.
I agree that for the coming series of integrated accelerators we will use a unified driver approach, but in this particular case it would be quite messy to artificially put them within the same PMD.
* Re: [PATCH v1 00/10] baseband/acc200
  2022-08-31 22:37 ` Chautru, Nicolas
@ 2022-09-01 0:28 ` Tom Rix
  2022-09-01 1:26 ` Chautru, Nicolas
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Rix @ 2022-09-01 0:28 UTC
  To: Chautru, Nicolas, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, Vargas, Hernan
  Cc: mdr, Richardson, Bruce, david.marchand, stephen

On 8/31/22 3:37 PM, Chautru, Nicolas wrote:
> Hi Thomas, Tom,
>
>> -----Original Message-----
>> From: Tom Rix <trix@redhat.com>
>> Sent: Wednesday, August 31, 2022 12:26 PM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; Maxime Coquelin
>> <maxime.coquelin@redhat.com>; dev@dpdk.org; thomas@monjalon.net;
>> gakhil@marvell.com; hemant.agrawal@nxp.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
>> david.marchand@redhat.com; stephen@networkplumber.org
>> Subject: Re: [PATCH v1 00/10] baseband/acc200
>>
>>
>> On 8/30/22 12:45 PM, Chautru, Nicolas wrote:
>>> Hi Maxime,
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Tuesday, August 30, 2022 12:45 AM
>>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org;
>>>> thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com;
>>>> trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com>
>>>> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
>>>> david.marchand@redhat.com; stephen@networkplumber.org
>>>> Subject: Re: [PATCH v1 00/10] baseband/acc200
>>>>
>>>> Hi Nicolas,
>>>>
>>>> On 7/12/22 15:48, Maxime Coquelin wrote:
>>>>> Hi Nicolas, Hernan,
>>>>>
>>>>> (Adding Hernan in the recipients list)
>>>>>
>>>>> Comparing ACC200 & ACC100 header files, I understand ACC200 is an
>>>>> evolution of the ACC10x family. The FEC bits are really close,
>>>>> ACC200 main addition seems to be FFT acceleration which could be
>>>>> handled in ACC10x driver based on device ID.
>>>>>
>>>>> I think both drivers have to be merged in order to avoid code
>>>>> duplication. That's how other families of devices (e.g. i40e) are
>>>>> handled.
>>>> I haven't seen your reply on this point.
>>>> Do you confirm you are working on a single driver for ACC family in
>>>> order to avoid code duplication?
>>>>
>>> The implementation is based on distinct ACC100 and ACC200 drivers. The 2 devices are fundamentally different generation, processes and IP.
>>> MountBryce is an eASIC device over PCIe while ACC200 is an integrated accelerator on Xeon CPU.
>>> The actual implementation are not the same, underlying IP are all distinct even if many of the descriptor format have similarities.
>>> The actual capabilities of the acceleration are different and/or new.
>>> The workaround and silicon errata are also different causing different limitation and implementation in the driver (see the serie with ongoing changes for ACC100 in parallel).
>>> This is fundamentally distinct from ACC101 which was a derivative product from ACC100 and where it made sense to share implementation between ACC100 and ACC101.
>>> So in a nutshell these 2 devices and drivers are 2 different beasts and the intention is to keep them intentionally separate as in the serie.
>>> Let me know if unclear, thanks!
>> Nic,
>>
>> I used a similarity checker to compare acc100 and acc200
>>
>> https://dickgrune.com/Programs/similarity_tester/
>>
>> l=simum.log
>> if [ -f $l ]; then
>>     rm $l
>> fi
>>
>> sim_c -s -R -o$l -R -p -P -a .
>>
>> There results are
>>
>> ./acc200/acc200_pf_enum.h consists for 100 % of ./acc100/acc100_pf_enum.h material
>> ./acc100/acc100_pf_enum.h consists for 98 % of ./acc200/acc200_pf_enum.h material
>> ./acc100/rte_acc100_pmd.h consists for 98 % of ./acc200/acc200_pmd.h material
>> ./acc200/acc200_vf_enum.h consists for 95 % of ./acc100/acc100_pf_enum.h material
>> ./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h material
>> ./acc200/rte_acc200_cfg.h consists for 92 % of ./acc100/rte_acc100_cfg.h material
>> ./acc100/rte_acc100_pmd.c consists for 87 % of ./acc200/rte_acc200_pmd.c material
>> ./acc100/acc100_vf_enum.h consists for 80 % of ./acc200/acc200_pf_enum.h material
>> ./acc200/rte_acc200_pmd.c consists for 78 % of ./acc100/rte_acc100_pmd.c material
>> ./acc100/rte_acc100_cfg.h consists for 75 % of ./acc200/rte_acc200_cfg.h material
>>
>> Spot checking the first *pf_enum.h at 100%, these are the devices'
>> registers, they are the same.
>>
>> I raised this similarity issue with 100 vs 101.
>>
>> Having multiple copies is difficult to support and should be avoided.
>>
>> For the end user, they should have to use only one driver.
>>
> There are really different IP and do not have the same interface (PCIe/DDR vs integrated) and there is big serie of changes which are specific to ACC100 coming in parallel. Any workaround, optimization would be different.
> I agree that for the coming serie of integrated accelerator we will use a unified driver approach but for that very case that would be quite messy to artificially put them within the same PMD.

How is the IP different when 100% of the registers are the same?

Tom

>
>
* RE: [PATCH v1 00/10] baseband/acc200 2022-09-01 0:28 ` Tom Rix @ 2022-09-01 1:26 ` Chautru, Nicolas 2022-09-01 13:49 ` Tom Rix 0 siblings, 1 reply; 50+ messages in thread From: Chautru, Nicolas @ 2022-09-01 1:26 UTC (permalink / raw) To: Tom Rix, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, Vargas, Hernan Cc: mdr, Richardson, Bruce, david.marchand, stephen Hi Tom, > -----Original Message----- > From: Tom Rix <trix@redhat.com> > Sent: Wednesday, August 31, 2022 5:28 PM > To: Chautru, Nicolas <nicolas.chautru@intel.com>; Maxime Coquelin > <maxime.coquelin@redhat.com>; dev@dpdk.org; thomas@monjalon.net; > gakhil@marvell.com; hemant.agrawal@nxp.com; Vargas, Hernan > <hernan.vargas@intel.com> > Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>; > david.marchand@redhat.com; stephen@networkplumber.org > Subject: Re: [PATCH v1 00/10] baseband/acc200 > > > On 8/31/22 3:37 PM, Chautru, Nicolas wrote: > > Hi Thomas, Tom, > > > >> -----Original Message----- > >> From: Tom Rix <trix@redhat.com> > >> Sent: Wednesday, August 31, 2022 12:26 PM > >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; Maxime Coquelin > >> <maxime.coquelin@redhat.com>; dev@dpdk.org; thomas@monjalon.net; > >> gakhil@marvell.com; hemant.agrawal@nxp.com; Vargas, Hernan > >> <hernan.vargas@intel.com> > >> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>; > >> david.marchand@redhat.com; stephen@networkplumber.org > >> Subject: Re: [PATCH v1 00/10] baseband/acc200 > >> > >> > >> On 8/30/22 12:45 PM, Chautru, Nicolas wrote: > >>> Hi Maxime, > >>> > >>>> -----Original Message----- > >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com> > >>>> Sent: Tuesday, August 30, 2022 12:45 AM > >>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org; > >>>> thomas@monjalon.net; gakhil@marvell.com; hemant.agrawal@nxp.com; > >>>> trix@redhat.com; Vargas, Hernan <hernan.vargas@intel.com> > >>>> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>; > >>>> 
david.marchand@redhat.com; stephen@networkplumber.org > >>>> Subject: Re: [PATCH v1 00/10] baseband/acc200 > >>>> > >>>> Hi Nicolas, > >>>> > >>>> On 7/12/22 15:48, Maxime Coquelin wrote: > >>>>> Hi Nicolas, Hernan, > >>>>> > >>>>> (Adding Hernan in the recipients list) > >>>>> > >>>>> On 7/8/22 02:01, Nicolas Chautru wrote: > >>>>>> This is targeting 22.11 and includes the PMD for the integrated > >>>>>> accelerator on Intel Xeon SPR-EEC. > >>>>>> There is a dependency on that parallel serie still in-flight > >>>>>> which extends the bbdev api > >>>>>> https://patches.dpdk.org/project/dpdk/list/?series=23894 > >>>>>> > >>>>>> I will be offline for a few weeks for the summer break but Hernan > >>>>>> will cover for me during that time if required. > >>>>>> > >>>>>> Thanks > >>>>>> Nic > >>>>>> > >>>>>> Nicolas Chautru (10): > >>>>>> baseband/acc200: introduce PMD for ACC200 > >>>>>> baseband/acc200: add HW register definitions > >>>>>> baseband/acc200: add info get function > >>>>>> baseband/acc200: add queue configuration > >>>>>> baseband/acc200: add LDPC processing functions > >>>>>> baseband/acc200: add LTE processing functions > >>>>>> baseband/acc200: add support for FFT operations > >>>>>> baseband/acc200: support interrupt > >>>>>> baseband/acc200: add device status and vf2pf comms > >>>>>> baseband/acc200: add PF configure companion function > >>>>>> > >>>>>> MAINTAINERS | 3 + > >>>>>> app/test-bbdev/meson.build | 3 + > >>>>>> app/test-bbdev/test_bbdev_perf.c | 76 + > >>>>>> doc/guides/bbdevs/acc200.rst | 244 ++ > >>>>>> doc/guides/bbdevs/index.rst | 1 + > >>>>>> drivers/baseband/acc200/acc200_pf_enum.h | 468 +++ > >>>>>> drivers/baseband/acc200/acc200_pmd.h | 690 ++++ > >>>>>> drivers/baseband/acc200/acc200_vf_enum.h | 89 + > >>>>>> drivers/baseband/acc200/meson.build | 8 + > >>>>>> drivers/baseband/acc200/rte_acc200_cfg.h | 115 + > >>>>>> drivers/baseband/acc200/rte_acc200_pmd.c | 5403 > >>>>>> ++++++++++++++++++++++++++++++ > >>>>>> 
drivers/baseband/acc200/version.map | 10 + > >>>>>> drivers/baseband/meson.build | 1 + > >>>>>> 13 files changed, 7111 insertions(+) > >>>>>> create mode 100644 doc/guides/bbdevs/acc200.rst > >>>>>> create mode 100644 drivers/baseband/acc200/acc200_pf_enum.h > >>>>>> create mode 100644 drivers/baseband/acc200/acc200_pmd.h > >>>>>> create mode 100644 drivers/baseband/acc200/acc200_vf_enum.h > >>>>>> create mode 100644 drivers/baseband/acc200/meson.build > >>>>>> create mode 100644 drivers/baseband/acc200/rte_acc200_cfg.h > >>>>>> create mode 100644 drivers/baseband/acc200/rte_acc200_pmd.c > >>>>>> create mode 100644 drivers/baseband/acc200/version.map > >>>>>> > >>>>> Comparing ACC200 & ACC100 header files, I understand ACC200 is an > >>>>> evolution of the ACC10x family. The FEC bits are really close, > >>>>> ACC200 main addition seems to be FFT acceleration which could be > >>>>> handled in ACC10x driver based on device ID. > >>>>> > >>>>> I think both drivers have to be merged in order to avoid code > >>>>> duplication. That's how other families of devices (e.g. i40e) are > >>>>> handled. > >>>> I haven't seen your reply on this point. > >>>> Do you confirm you are working on a single driver for ACC family in > >>>> order to avoid code duplication? > >>>> > >>> The implementation is based on distinct ACC100 and ACC200 drivers. > >>> The 2 > >> devices are fundamentally different generation, processes and IP. > >>> MountBryce is an eASIC device over PCIe while ACC200 is an > >>> integrated > >> accelerator on Xeon CPU. > >>> The actual implementation are not the same, underlying IP are all > >>> distinct > >> even if many of the descriptor format have similarities. > >>> The actual capabilities of the acceleration are different and/or new. > >>> The workaround and silicon errata are also different causing > >>> different > >> limitation and implementation in the driver (see the serie with > >> ongoing changes for ACC100 in parallel). 
> >>> This is fundamentally distinct from ACC101 which was a derivative > >>> product > >> from ACC100 and where it made sense to share implementation between > >> ACC100 and ACC101. > >>> So in a nutshell these 2 devices and drivers are 2 different beasts > >>> and the > >> intention is to keep them intentionally separate as in the serie. > >>> Let me know if unclear, thanks! > >> Nic, > >> > >> I used a similarity checker to compare acc100 and acc200 > >> > >> https://dickgrune.com/Programs/similarity_tester/ > >> > >> l=simum.log > >> if [ -f $l ]; then > >> rm $l > >> fi > >> > >> sim_c -s -R -o$l -R -p -P -a . > >> > >> There results are > >> > >> ./acc200/acc200_pf_enum.h consists for 100 % of > >> ./acc100/acc100_pf_enum.h material ./acc100/acc100_pf_enum.h consists > >> for 98 % of ./acc200/acc200_pf_enum.h material > >> ./acc100/rte_acc100_pmd.h consists for > >> 98 % of ./acc200/acc200_pmd.h material ./acc200/acc200_vf_enum.h > >> consists for 95 % of ./acc100/acc100_pf_enum.h material > >> ./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h > >> material ./acc200/rte_acc200_cfg.h consists for 92 % of > >> ./acc100/rte_acc100_cfg.h material ./acc100/rte_acc100_pmd.c consists > >> for 87 % of ./acc200/rte_acc200_pmd.c material > >> ./acc100/acc100_vf_enum.h consists for > >> 80 % of ./acc200/acc200_pf_enum.h material ./acc200/rte_acc200_pmd.c > >> consists for 78 % of ./acc100/rte_acc100_pmd.c material > >> ./acc100/rte_acc100_cfg.h consists for 75 % of > >> ./acc200/rte_acc200_cfg.h material > >> > >> Spot checking the first *pf_enum.h at 100%, these are the devices' > >> registers, they are the same. > >> > >> I raised this similarity issue with 100 vs 101. > >> > >> Having multiple copies is difficult to support and should be avoided. > >> > >> For the end user, they should have to use only one driver. 
> >>
> > These are really different IP and do not have the same interface
> > (PCIe/DDR vs integrated), and there is a big series of changes
> > specific to ACC100 coming in parallel. Any workaround or optimization
> > would be different.
> > I agree that for the coming series of integrated accelerators we will
> > use a unified driver approach, but for this very case it would be quite
> > messy to artificially put them within the same PMD.
>
> How is the IP different when 100% of the registers are the same?
>

These are 2 different HW aspects. The base top-level configuration registers are kept similar on purpose, but the underlying IP are totally different designs and implementations.
Even the registers have differences that are not visible here: the actual RDL file would define more specifically these registers' bitfields and implementation, including which ones are not implemented (but that is proprietary information), and at bbdev level the interface is not so much register based as processing based on data from DMA.
Basically, even if there was a common driver, all of these would be duplicated, and they are indeed different IP (including different vendors).
But I agree with the general intent to have a common driver for the integrated driver series (ACC200, ACC300...) now that we are moving away from PCIe/DDR lookaside acceleration and eASIC/FPGA implementation (ACC100/ACC101).

Thanks
Nic
* Re: [PATCH v1 00/10] baseband/acc200 2022-09-01 1:26 ` Chautru, Nicolas @ 2022-09-01 13:49 ` Tom Rix 2022-09-01 20:34 ` Chautru, Nicolas 0 siblings, 1 reply; 50+ messages in thread From: Tom Rix @ 2022-09-01 13:49 UTC (permalink / raw) To: Chautru, Nicolas, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, Vargas, Hernan Cc: mdr, Richardson, Bruce, david.marchand, stephen

On 8/31/22 6:26 PM, Chautru, Nicolas wrote:
>> How is the IP different when 100% of the registers are the same?
>>
> These are 2 different HW aspects. The base top-level configuration registers are kept similar on purpose, but the underlying IP are totally different designs and implementations.
> Even the registers have differences that are not visible here: the actual RDL file would define more specifically these registers' bitfields and implementation, including which ones are not implemented (but that is proprietary information), and at bbdev level the interface is not so much register based as processing based on data from DMA.
> Basically, even if there was a common driver, all of these would be duplicated, and they are indeed different IP (including different vendors).
> But I agree with the general intent to have a common driver for the integrated driver series (ACC200, ACC300...) now that we are moving away from PCIe/DDR lookaside acceleration and eASIC/FPGA implementation (ACC100/ACC101).

Looking a little deeper at how the driver lays out some of its bitfields and private data, by reviewing the

./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h

There are some minor changes to existing reserved bitfields.
A new structure for FFT.
The acc200_device, the private data for the driver, is an exact copy of acc100_device.

acc200_pmd.h is the superset and could be used with little changes as a common acc_pmd.h.
acc200 is doing everything the acc100 did in a very similar if not exact way, adding the FFT feature.

Can you point to some portion of this patchset that is so unique that it could not be abstracted to an if-check or function, and so requires this separate, nearly identical driver?

Tom
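For readers following the thread, the "if-check or function" abstraction asked about above can be pictured with a minimal C sketch. Everything in it is hypothetical: the `ex_` names, the placeholder device IDs, and the capability table are assumptions made for illustration and are not taken from the ACC100 or ACC200 patches.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder IDs for illustration only; NOT the real PCI device IDs. */
#define EX_DEV_ID_ACC100 0x0100u
#define EX_DEV_ID_ACC200 0x0200u

/* Hypothetical per-device capability entry in a shared driver. */
struct ex_acc_variant {
	uint16_t device_id;
	const char *name;
	bool has_fft; /* only the newer device is assumed to offer FFT */
};

static const struct ex_acc_variant ex_acc_variants[] = {
	{ EX_DEV_ID_ACC100, "acc100", false },
	{ EX_DEV_ID_ACC200, "acc200", true },
};

/* Return the table entry matching a device ID, or NULL if unknown. */
static const struct ex_acc_variant *
ex_acc_lookup(uint16_t device_id)
{
	size_t n = sizeof(ex_acc_variants) / sizeof(ex_acc_variants[0]);

	for (size_t i = 0; i < n; i++) {
		if (ex_acc_variants[i].device_id == device_id)
			return &ex_acc_variants[i];
	}
	return NULL;
}

/* The "if-check": a common entry point that gates the FFT path per device. */
static int
ex_acc_enqueue_fft(uint16_t device_id)
{
	const struct ex_acc_variant *v = ex_acc_lookup(device_id);

	if (v == NULL)
		return -ENODEV;  /* not a device this sketch knows about */
	if (!v->has_fft)
		return -ENOTSUP; /* device is assumed to have no FFT engine */
	return 0;            /* a real driver would build descriptors here */
}
```

Whether such a table stays readable depends on how many errata and workarounds differ between the devices, which is the point under dispute in this thread.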
* RE: [PATCH v1 00/10] baseband/acc200 2022-09-01 13:49 ` Tom Rix @ 2022-09-01 20:34 ` Chautru, Nicolas 2022-09-06 12:51 ` Tom Rix 0 siblings, 1 reply; 50+ messages in thread From: Chautru, Nicolas @ 2022-09-01 20:34 UTC (permalink / raw) To: Tom Rix, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, Vargas, Hernan Cc: mdr, Richardson, Bruce, david.marchand, stephen

Hi Tom,

> -----Original Message-----
> From: Tom Rix <trix@redhat.com>
> Sent: Thursday, September 1, 2022 6:49 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; dev@dpdk.org; thomas@monjalon.net;
> gakhil@marvell.com; hemant.agrawal@nxp.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Cc: mdr@ashroe.eu; Richardson, Bruce <bruce.richardson@intel.com>;
> david.marchand@redhat.com; stephen@networkplumber.org
> Subject: Re: [PATCH v1 00/10] baseband/acc200
>
> Looking a little deeper at how the driver lays out some of its bitfields and
> private data, by reviewing the
>
> ./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h
>
> There are some minor changes to existing reserved bitfields.
> A new structure for FFT.
> The acc200_device, the private data for the driver, is an exact copy of
> acc100_device.
>
> acc200_pmd.h is the superset and could be used with little changes as a
> common acc_pmd.h.
> acc200 is doing everything the acc100 did in a very similar if not exact way,
> adding the FFT feature.
>
> Can you point to some portion of this patchset that is so unique that it could
> not be abstracted to an if-check or function, and so requires this separate,
> nearly identical driver?
>

You used a similarity checker, really; there are actually far more relevant differences than what you imply here.
With regard to the 2 pf_enum.h files, there are many registers that have the same or similar names but are now mapped to different values, hence you just cannot use one for the other.
Saying that "./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h" is just not correct and really irrelevant.
Just do a side-by-side diff please and check; it should be extremely obvious. That metric tells more about the similarity checker's limitations than anything else.
Even when using a common driver for ACC200/300, they will have distinct register enum files being auto-generated and coming from distinct RDL.
Again, just do a diff of these 2 files. I believe you will agree that it is not relevant to try to artificially merge these files together.

With regard to the pmd.h, some structures/defines are indeed common and could be moved to a common file (for instance the turbo encoder and LDPC encoder, which are more vanilla and unlikely to change for future products, unlike the decoders, which have different feature sets and behaviour; or some 3GPP constants that can be defined once).
We can definitely change these to put together shared structures/defines, but we do not intend to artificially put things together with spaghetti code.
We would like to keep 3 parallel versions of these PMDs for 3 different product lines which are indeed fundamentally different designs (including different workarounds required, as can be seen on the parallel ACC100 series under review).
- one version for FPGA implementation (support for N3000, N6000, ...)
- one version for eASIC lookaside card implementation (ACC100, ACC101, ...)
- one version for the integrated Xeon accelerators (ACC200, ACC300, ...)

Let me know if unclear
Nic
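The point above about identically named registers being mapped to different values can be sketched in C as follows. This is only an illustration under stated assumptions: the `ex_` register names and every offset are invented here, and the real values would come from each device's generated enum file. The sketch keeps one generated-style enum per device and lets shared code go through a small per-device map instead of using the enums directly.

```c
#include <stdint.h>

/* Invented offsets: same register name, different address per device. */
enum ex_acc100_pf_reg {
	EX_ACC100_PF_QMGR_CTRL  = 0x0010,
	EX_ACC100_PF_DMA_STATUS = 0x0020,
};

enum ex_acc200_pf_reg {
	EX_ACC200_PF_QMGR_CTRL  = 0x0040,
	EX_ACC200_PF_DMA_STATUS = 0x0080,
};

/* Hypothetical per-device map consumed by code shared between devices. */
struct ex_acc_reg_map {
	uint32_t qmgr_ctrl;  /* byte offset of the queue manager control */
	uint32_t dma_status; /* byte offset of the DMA status register */
};

static const struct ex_acc_reg_map ex_acc100_map = {
	.qmgr_ctrl  = EX_ACC100_PF_QMGR_CTRL,
	.dma_status = EX_ACC100_PF_DMA_STATUS,
};

static const struct ex_acc_reg_map ex_acc200_map = {
	.qmgr_ctrl  = EX_ACC200_PF_QMGR_CTRL,
	.dma_status = EX_ACC200_PF_DMA_STATUS,
};

/* Read a 32-bit register from a register file modelled as a word array. */
static uint32_t
ex_reg_read(const uint32_t *regs, uint32_t byte_offset)
{
	return regs[byte_offset / sizeof(uint32_t)];
}

/* Shared helper: the caller selects the map, the access code is common. */
static uint32_t
ex_read_dma_status(const struct ex_acc_reg_map *map, const uint32_t *regs)
{
	return ex_reg_read(regs, map->dma_status);
}
```

A map like this only covers offsets. Differences in bitfield layout or in which registers are implemented at all, which the message above also mentions, would need more than a lookup table.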
* Re: [PATCH v1 00/10] baseband/acc200 2022-09-01 20:34 ` Chautru, Nicolas @ 2022-09-06 12:51 ` Tom Rix 2022-09-14 10:35 ` Thomas Monjalon 0 siblings, 1 reply; 50+ messages in thread From: Tom Rix @ 2022-09-06 12:51 UTC (permalink / raw) To: Chautru, Nicolas, Maxime Coquelin, dev, thomas, gakhil, hemant.agrawal, Vargas, Hernan Cc: mdr, Richardson, Bruce, david.marchand, stephen

On 9/1/22 1:34 PM, Chautru, Nicolas wrote:
> Hi Tom,
>
>> Can you point to some portion of this patchset that is so unique that it could
>> not be abstracted to an if-check or function, and so requires this separate,
>> nearly identical driver?
>>
> You used a similarity checker, really; there are actually far more relevant differences than what you imply here.
> With regard to the 2 pf_enum.h files, there are many registers that have the same or similar names but are now mapped to different values, hence you just cannot use one for the other.
> Saying that "./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h" is just not correct and really irrelevant.
> Just do a side-by-side diff please and check; it should be extremely obvious. That metric tells more about the similarity checker's limitations than anything else.
> Even when using a common driver for ACC200/300, they will have distinct register enum files being auto-generated and coming from distinct RDL.
> Again, just do a diff of these 2 files. I believe you will agree that it is not relevant to try to artificially merge these files together.
>
> With regard to the pmd.h, some structures/defines are indeed common and could be moved to a common file (for instance the turbo encoder and LDPC encoder, which are more vanilla and unlikely to change for future products, unlike the decoders, which have different feature sets and behaviour; or some 3GPP constants that can be defined once).
> We can definitely change these to put together shared structures/defines, but we do not intend to artificially put things together with spaghetti code.
> We would like to keep 3 parallel versions of these PMDs for 3 different product lines which are indeed fundamentally different designs (including different workarounds required, as can be seen on the parallel ACC100 series under review).
> - one version for FPGA implementation (support for N3000, N6000, ...)
> - one version for eASIC lookaside card implementation (ACC100, ACC101, ...)
> - one version for the integrated Xeon accelerators (ACC200, ACC300, ...)

Some suggestions on refactoring:

For the registers, have a common file.

For the shared functionality, e.g. the LDPC encoder, break it out into its own shared file.

The public interface (see my earlier comments on the documentation) should be the same for both drivers, with the few differences highlighted.

Tom

> Let me know if unclear
> Nic
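As an illustration of the "common file for the registers" suggestion, here is a minimal sketch in C. It assumes a per-variant lookup table rather than two copied enum headers, which is one way to reconcile it with the point that identically named registers map to different values. Every name in it (`acc_variant`, `acc_reg_map`, `acc_reg_map_get`, `acc_supports_fft`) and every offset is hypothetical and chosen for the example; none of it is taken from the patchset or from the real register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant IDs; a real driver would derive this from the PCI device ID. */
enum acc_variant {
	ACC_VARIANT_100,
	ACC_VARIANT_200,
	ACC_VARIANT_COUNT
};

/* Registers that keep the same name across variants but may sit at different offsets. */
struct acc_reg_map {
	uint32_t qmgr_enable;   /* placeholder offset, not a real register value */
	uint32_t dma_desc_base; /* placeholder offset, not a real register value */
	int has_fft;            /* capability flag, so shared code can use a plain if-check */
};

/* One table entry per variant; the values below are made up for illustration. */
static const struct acc_reg_map acc_reg_maps[ACC_VARIANT_COUNT] = {
	[ACC_VARIANT_100] = { .qmgr_enable = 0x1000, .dma_desc_base = 0x2000, .has_fft = 0 },
	[ACC_VARIANT_200] = { .qmgr_enable = 0x1800, .dma_desc_base = 0x2400, .has_fft = 1 },
};

/* Shared code looks the map up once and never hard-codes an offset. */
static inline const struct acc_reg_map *
acc_reg_map_get(enum acc_variant v)
{
	assert((unsigned int)v < (unsigned int)ACC_VARIANT_COUNT);
	return &acc_reg_maps[v];
}

/* Branch on a capability rather than on the device name. */
static inline int
acc_supports_fft(enum acc_variant v)
{
	return acc_reg_map_get(v)->has_fft;
}
```

Whether a table like this is practical depends on how far the auto-generated register files really diverge, which is exactly the point under debate in this thread.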
* Re: [PATCH v1 00/10] baseband/acc200 2022-09-06 12:51 ` Tom Rix @ 2022-09-14 10:35 ` Thomas Monjalon 2022-09-14 11:50 ` Maxime Coquelin 0 siblings, 1 reply; 50+ messages in thread From: Thomas Monjalon @ 2022-09-14 10:35 UTC (permalink / raw) To: Chautru, Nicolas, Maxime Coquelin, dev, gakhil, hemant.agrawal, Vargas, Hernan, Tom Rix Cc: mdr, Richardson, Bruce, david.marchand, stephen 06/09/2022 14:51, Tom Rix: > On 9/1/22 1:34 PM, Chautru, Nicolas wrote: > > From: Tom Rix <trix@redhat.com> > >> On 8/31/22 6:26 PM, Chautru, Nicolas wrote: > >>> From: Tom Rix <trix@redhat.com> > >>>> On 8/31/22 3:37 PM, Chautru, Nicolas wrote: > >>>>>>>>> Comparing ACC200 & ACC100 header files, I understand ACC200 is > >>>>>>>>> an evolution of the ACC10x family. The FEC bits are really > >>>>>>>>> close, > >>>>>>>>> ACC200 main addition seems to be FFT acceleration which could be > >>>>>>>>> handled in ACC10x driver based on device ID. > >>>>>>>>> > >>>>>>>>> I think both drivers have to be merged in order to avoid code > >>>>>>>>> duplication. That's how other families of devices (e.g. i40e) > >>>>>>>>> are handled. > >>>>>>>> I haven't seen your reply on this point. > >>>>>>>> Do you confirm you are working on a single driver for ACC family > >>>>>>>> in order to avoid code duplication? > >>>>>>>> > >>>>>>> The implementation is based on distinct ACC100 and ACC200 drivers. > >>>>>>> The 2 > >>>>>> devices are fundamentally different generation, processes and IP. > >>>>>>> MountBryce is an eASIC device over PCIe while ACC200 is an > >>>>>>> integrated > >>>>>> accelerator on Xeon CPU. > >>>>>>> The actual implementation are not the same, underlying IP are all > >>>>>>> distinct > >>>>>> even if many of the descriptor format have similarities. > >>>>>>> The actual capabilities of the acceleration are different and/or new. 
> >>>>>>> The workaround and silicon errata are also different causing > >>>>>>> different > >>>>>> limitation and implementation in the driver (see the serie with > >>>>>> ongoing changes for ACC100 in parallel). > >>>>>>> This is fundamentally distinct from ACC101 which was a derivative > >>>>>>> product > >>>>>> from ACC100 and where it made sense to share implementation > >> between > >>>>>> ACC100 and ACC101. > >>>>>>> So in a nutshell these 2 devices and drivers are 2 different > >>>>>>> beasts and the > >>>>>> intention is to keep them intentionally separate as in the serie. > >>>>>>> Let me know if unclear, thanks! > >>>>>> Nic, > >>>>>> > >>>>>> I used a similarity checker to compare acc100 and acc200 > >>>>>> > >>>>>> https://dickgrune.com/Programs/similarity_tester/ > >>>>>> > >>>>>> l=simum.log > >>>>>> if [ -f $l ]; then > >>>>>> rm $l > >>>>>> fi > >>>>>> > >>>>>> sim_c -s -R -o$l -R -p -P -a . > >>>>>> > >>>>>> There results are > >>>>>> > >>>>>> ./acc200/acc200_pf_enum.h consists for 100 % of > >>>>>> ./acc100/acc100_pf_enum.h material ./acc100/acc100_pf_enum.h > >>>>>> consists for 98 % of ./acc200/acc200_pf_enum.h material > >>>>>> ./acc100/rte_acc100_pmd.h consists for > >>>>>> 98 % of ./acc200/acc200_pmd.h material ./acc200/acc200_vf_enum.h > >>>>>> consists for 95 % of ./acc100/acc100_pf_enum.h material > >>>>>> ./acc200/acc200_pmd.h consists for 92 % of > >>>>>> ./acc100/rte_acc100_pmd.h material ./acc200/rte_acc200_cfg.h > >>>>>> consists for 92 % of ./acc100/rte_acc100_cfg.h material > >>>>>> ./acc100/rte_acc100_pmd.c consists for 87 % of > >>>>>> ./acc200/rte_acc200_pmd.c material ./acc100/acc100_vf_enum.h > >>>>>> consists for > >>>>>> 80 % of ./acc200/acc200_pf_enum.h material > >>>>>> ./acc200/rte_acc200_pmd.c consists for 78 % of > >>>>>> ./acc100/rte_acc100_pmd.c material ./acc100/rte_acc100_cfg.h > >>>>>> consists for 75 % of ./acc200/rte_acc200_cfg.h material > >>>>>> > >>>>>> Spot checking the first *pf_enum.h at 100%, these are the 
devices' > >>>>>> registers, they are the same. > >>>>>> > >>>>>> I raised this similarity issue with 100 vs 101. > >>>>>> > >>>>>> Having multiple copies is difficult to support and should be avoided. > >>>>>> > >>>>>> For the end user, they should have to use only one driver. > >>>>>> > >>>>> There are really different IP and do not have the same interface > >>>>> (PCIe/DDR vs > >>>> integrated) and there is big serie of changes which are specific to > >>>> ACC100 coming in parallel. Any workaround, optimization would be > >> different. > >>>>> I agree that for the coming serie of integrated accelerator we will > >>>>> use a > >>>> unified driver approach but for that very case that would be quite > >>>> messy to artificially put them within the same PMD. > >>>> > >>>> How is the IP different when 100% of the registers are the same ? > >>>> > >>> These are 2 different HW aspects. The base toplevel configuration registers > >> are kept similar on purpose but the underlying IP are totally different design > >> and implementation. > >>> Even the registers have differences but not visible here, the actual RDL file > >> would define more specifically these registers bitfields and implementation > >> including which ones are not implemented (but that is proprietary > >> information), and at bbdev level the interface is not some much register > >> based than processing based on data from DMA. > >>> Basically even if there was a common driver, all these would be duplicated > >> and they are indeed different IP (including different vendors).. > >>> But I agree with the general intent and to have a common driver for the > >> integrated driver serie (ACC200, ACC300...) now that we are moving away > >> from PCIe/DDR lookaside acceleration and eASIC/FPGA implementation > >> (ACC100/AC101). 
> >> > >> Looking a little deeper, at how the driver is lays out some of its bitfields and > >> private data by reviewing the > >> > >> ./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h > >> > >> There are some minor changes to existing reserved bitfields. > >> A new structure for fft. > >> The acc200_device, the private data for the driver, is an exact copy of > >> acc100_device. > >> > >> acc200_pmd.h is the superset and could be used with little changes as a > >> common acc_pmd.h. > >> acc200 is doing everything the acc100 did in a very similar if not exact way, > >> adding the fft feature. > >> > >> Can you point to some portion of this patchset that is so unique that it could > >> not be abstracted to an if-check or function and so requiring this separate, > >> nearly identical driver ? > >> > > You used a similarity checker really, there are actually way more relevent differences than what you imply here. > > With regards to the 2 pf_enum.h file, there are many registers that have same or similar names but have now different values being mapped hence you just cannot use one for the other. > > Saying that "./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h" is just not correct and really irrelevant. > > Just do a diff side by side please and check, that should be extremely obvious, that metrics tells more about the similarity checker limitation than anything else. > > Even when using a common driver for ACC200/300 they will have distinct register enum files being auto-generated and coming from distinct RDL. > > Again just do a diff of these 2 files. I believe you will agree that is not relevant for these files to try to artificially merged these together. 
> >
> > With regards to the pmd.h, some structure/defines are indeed common and could be moved to a common file (for instance turboencoder and LDPC encoder which are more vanilla and unlikely to change for future product unlike the decoders which have different feature set and behaviour; or some 3GPP constant that can be defined once).
> > We can definitely change these to put together shared structures/defines, but not intending to try to artificially put things together with spaghetti code.
> > We would like to keep 3 parallel versions of these PMD for 3 different product lines which are indeed fundamentally different designs (including different workaround required as can be seen on the parallel ACC100 serie under review).
> > - one version for FPGA implementation (support for N3000, N6000, ...)
> > - one version for eASIC lookaside card implementation (ACC100, ACC101, ...)
> > - one version for the integrated Xeon accelerators (ACC200, ACC300, ...)
>
> Some suggestions on refactoring,
>
> For the registers, have a common file.
>
> For the shared functionality, ex/ ldpc encoder, break these out to its
> own shared file.
>
> The public interface, see my earlier comments on the documentation,
> should be have the same interfaces and the few differences highlighted.

+1 to have common files, and all in a single directory drivers/baseband/acc100/
* Re: [PATCH v1 00/10] baseband/acc200 2022-09-14 10:35 ` Thomas Monjalon @ 2022-09-14 11:50 ` Maxime Coquelin 2022-09-14 13:19 ` Bruce Richardson 0 siblings, 1 reply; 50+ messages in thread From: Maxime Coquelin @ 2022-09-14 11:50 UTC (permalink / raw) To: Thomas Monjalon, Chautru, Nicolas, dev, gakhil, hemant.agrawal, Vargas, Hernan, Tom Rix Cc: mdr, Richardson, Bruce, david.marchand, stephen On 9/14/22 12:35, Thomas Monjalon wrote: > 06/09/2022 14:51, Tom Rix: >> On 9/1/22 1:34 PM, Chautru, Nicolas wrote: >>> From: Tom Rix <trix@redhat.com> >>>> On 8/31/22 6:26 PM, Chautru, Nicolas wrote: >>>>> From: Tom Rix <trix@redhat.com> >>>>>> On 8/31/22 3:37 PM, Chautru, Nicolas wrote: >>>>>>>>>>> Comparing ACC200 & ACC100 header files, I understand ACC200 is >>>>>>>>>>> an evolution of the ACC10x family. The FEC bits are really >>>>>>>>>>> close, >>>>>>>>>>> ACC200 main addition seems to be FFT acceleration which could be >>>>>>>>>>> handled in ACC10x driver based on device ID. >>>>>>>>>>> >>>>>>>>>>> I think both drivers have to be merged in order to avoid code >>>>>>>>>>> duplication. That's how other families of devices (e.g. i40e) >>>>>>>>>>> are handled. >>>>>>>>>> I haven't seen your reply on this point. >>>>>>>>>> Do you confirm you are working on a single driver for ACC family >>>>>>>>>> in order to avoid code duplication? >>>>>>>>>> >>>>>>>>> The implementation is based on distinct ACC100 and ACC200 drivers. >>>>>>>>> The 2 >>>>>>>> devices are fundamentally different generation, processes and IP. >>>>>>>>> MountBryce is an eASIC device over PCIe while ACC200 is an >>>>>>>>> integrated >>>>>>>> accelerator on Xeon CPU. >>>>>>>>> The actual implementation are not the same, underlying IP are all >>>>>>>>> distinct >>>>>>>> even if many of the descriptor format have similarities. >>>>>>>>> The actual capabilities of the acceleration are different and/or new. 
>>>>>>>>> The workaround and silicon errata are also different causing >>>>>>>>> different >>>>>>>> limitation and implementation in the driver (see the serie with >>>>>>>> ongoing changes for ACC100 in parallel). >>>>>>>>> This is fundamentally distinct from ACC101 which was a derivative >>>>>>>>> product >>>>>>>> from ACC100 and where it made sense to share implementation >>>> between >>>>>>>> ACC100 and ACC101. >>>>>>>>> So in a nutshell these 2 devices and drivers are 2 different >>>>>>>>> beasts and the >>>>>>>> intention is to keep them intentionally separate as in the serie. >>>>>>>>> Let me know if unclear, thanks! >>>>>>>> Nic, >>>>>>>> >>>>>>>> I used a similarity checker to compare acc100 and acc200 >>>>>>>> >>>>>>>> https://dickgrune.com/Programs/similarity_tester/ >>>>>>>> >>>>>>>> l=simum.log >>>>>>>> if [ -f $l ]; then >>>>>>>> rm $l >>>>>>>> fi >>>>>>>> >>>>>>>> sim_c -s -R -o$l -R -p -P -a . >>>>>>>> >>>>>>>> There results are >>>>>>>> >>>>>>>> ./acc200/acc200_pf_enum.h consists for 100 % of >>>>>>>> ./acc100/acc100_pf_enum.h material ./acc100/acc100_pf_enum.h >>>>>>>> consists for 98 % of ./acc200/acc200_pf_enum.h material >>>>>>>> ./acc100/rte_acc100_pmd.h consists for >>>>>>>> 98 % of ./acc200/acc200_pmd.h material ./acc200/acc200_vf_enum.h >>>>>>>> consists for 95 % of ./acc100/acc100_pf_enum.h material >>>>>>>> ./acc200/acc200_pmd.h consists for 92 % of >>>>>>>> ./acc100/rte_acc100_pmd.h material ./acc200/rte_acc200_cfg.h >>>>>>>> consists for 92 % of ./acc100/rte_acc100_cfg.h material >>>>>>>> ./acc100/rte_acc100_pmd.c consists for 87 % of >>>>>>>> ./acc200/rte_acc200_pmd.c material ./acc100/acc100_vf_enum.h >>>>>>>> consists for >>>>>>>> 80 % of ./acc200/acc200_pf_enum.h material >>>>>>>> ./acc200/rte_acc200_pmd.c consists for 78 % of >>>>>>>> ./acc100/rte_acc100_pmd.c material ./acc100/rte_acc100_cfg.h >>>>>>>> consists for 75 % of ./acc200/rte_acc200_cfg.h material >>>>>>>> >>>>>>>> Spot checking the first *pf_enum.h at 100%, these are the 
devices' >>>>>>>> registers, they are the same. >>>>>>>> >>>>>>>> I raised this similarity issue with 100 vs 101. >>>>>>>> >>>>>>>> Having multiple copies is difficult to support and should be avoided. >>>>>>>> >>>>>>>> For the end user, they should have to use only one driver. >>>>>>>> >>>>>>> There are really different IP and do not have the same interface >>>>>>> (PCIe/DDR vs >>>>>> integrated) and there is big serie of changes which are specific to >>>>>> ACC100 coming in parallel. Any workaround, optimization would be >>>> different. >>>>>>> I agree that for the coming serie of integrated accelerator we will >>>>>>> use a >>>>>> unified driver approach but for that very case that would be quite >>>>>> messy to artificially put them within the same PMD. >>>>>> >>>>>> How is the IP different when 100% of the registers are the same ? >>>>>> >>>>> These are 2 different HW aspects. The base toplevel configuration registers >>>> are kept similar on purpose but the underlying IP are totally different design >>>> and implementation. >>>>> Even the registers have differences but not visible here, the actual RDL file >>>> would define more specifically these registers bitfields and implementation >>>> including which ones are not implemented (but that is proprietary >>>> information), and at bbdev level the interface is not some much register >>>> based than processing based on data from DMA. >>>>> Basically even if there was a common driver, all these would be duplicated >>>> and they are indeed different IP (including different vendors).. >>>>> But I agree with the general intent and to have a common driver for the >>>> integrated driver serie (ACC200, ACC300...) now that we are moving away >>>> from PCIe/DDR lookaside acceleration and eASIC/FPGA implementation >>>> (ACC100/AC101). 
>>>> >>>> Looking a little deeper, at how the driver is lays out some of its bitfields and >>>> private data by reviewing the >>>> >>>> ./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h >>>> >>>> There are some minor changes to existing reserved bitfields. >>>> A new structure for fft. >>>> The acc200_device, the private data for the driver, is an exact copy of >>>> acc100_device. >>>> >>>> acc200_pmd.h is the superset and could be used with little changes as a >>>> common acc_pmd.h. >>>> acc200 is doing everything the acc100 did in a very similar if not exact way, >>>> adding the fft feature. >>>> >>>> Can you point to some portion of this patchset that is so unique that it could >>>> not be abstracted to an if-check or function and so requiring this separate, >>>> nearly identical driver ? >>>> >>> You used a similarity checker really, there are actually way more relevent differences than what you imply here. >>> With regards to the 2 pf_enum.h file, there are many registers that have same or similar names but have now different values being mapped hence you just cannot use one for the other. >>> Saying that "./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h" is just not correct and really irrelevant. >>> Just do a diff side by side please and check, that should be extremely obvious, that metrics tells more about the similarity checker limitation than anything else. >>> Even when using a common driver for ACC200/300 they will have distinct register enum files being auto-generated and coming from distinct RDL. >>> Again just do a diff of these 2 files. I believe you will agree that is not relevant for these files to try to artificially merged these together. 
>>>
>>> With regards to the pmd.h, some structure/defines are indeed common and could be moved to a common file (for instance turboencoder and LDPC encoder which are more vanilla and unlikely to change for future product unlike the decoders which have different feature set and behaviour; or some 3GPP constant that can be defined once).
>>> We can definitely change these to put together shared structures/defines, but not intending to try to artificially put things together with spaghetti code.
>>> We would like to keep 3 parallel versions of these PMD for 3 different product lines which are indeed fundamentally different designs (including different workaround required as can be seen on the parallel ACC100 serie under review).
>>> - one version for FPGA implementation (support for N3000, N6000, ...)
>>> - one version for eASIC lookaside card implementation (ACC100, ACC101, ...)
>>> - one version for the integrated Xeon accelerators (ACC200, ACC300, ...)
>>
>> Some suggestions on refactoring,
>>
>> For the registers, have a common file.
>>
>> For the shared functionality, ex/ ldpc encoder, break these out to its
>> own shared file.
>>
>> The public interface, see my earlier comments on the documentation,
>> should be have the same interfaces and the few differences highlighted.
>
> +1 to have common files, and all in a single directory drivers/baseband/acc100/

Just to be sure we are aligned, do you mean having both drivers in the same directory, sharing some common files? That's the way I would go.

Thanks,
Maxime
* Re: [PATCH v1 00/10] baseband/acc200 2022-09-14 11:50 ` Maxime Coquelin @ 2022-09-14 13:19 ` Bruce Richardson 2022-09-14 13:27 ` Maxime Coquelin 2022-09-14 13:44 ` [EXT] " Akhil Goyal 0 siblings, 2 replies; 50+ messages in thread From: Bruce Richardson @ 2022-09-14 13:19 UTC (permalink / raw) To: Maxime Coquelin Cc: Thomas Monjalon, Chautru, Nicolas, dev, gakhil, hemant.agrawal, Vargas, Hernan, Tom Rix, mdr, david.marchand, stephen On Wed, Sep 14, 2022 at 01:50:05PM +0200, Maxime Coquelin wrote: > > > On 9/14/22 12:35, Thomas Monjalon wrote: > > 06/09/2022 14:51, Tom Rix: > > > On 9/1/22 1:34 PM, Chautru, Nicolas wrote: > > > > From: Tom Rix <trix@redhat.com> > > > > > On 8/31/22 6:26 PM, Chautru, Nicolas wrote: > > > > > > From: Tom Rix <trix@redhat.com> > > > > > > > On 8/31/22 3:37 PM, Chautru, Nicolas wrote: > > > > > > > > > > > > Comparing ACC200 & ACC100 header files, I > > > > > > > > > > > > understand ACC200 is an evolution of the ACC10x > > > > > > > > > > > > family. The FEC bits are really close, ACC200 main > > > > > > > > > > > > addition seems to be FFT acceleration which could > > > > > > > > > > > > be handled in ACC10x driver based on device ID. > > > > > > > > > > > > > > > > > > > > > > > > I think both drivers have to be merged in order to > > > > > > > > > > > > avoid code duplication. That's how other families > > > > > > > > > > > > of devices (e.g. i40e) are handled. > > > > > > > > > > > I haven't seen your reply on this point. Do you > > > > > > > > > > > confirm you are working on a single driver for ACC > > > > > > > > > > > family in order to avoid code duplication? > > > > > > > > > > > > > > > > > > > > > The implementation is based on distinct ACC100 and > > > > > > > > > > ACC200 drivers. The 2 > > > > > > > > > devices are fundamentally different generation, processes > > > > > > > > > and IP. 
> > > > > > > > > > MountBryce is an eASIC device over PCIe while ACC200 is > > > > > > > > > > an integrated > > > > > > > > > accelerator on Xeon CPU. > > > > > > > > > > The actual implementation are not the same, underlying > > > > > > > > > > IP are all distinct > > > > > > > > > even if many of the descriptor format have similarities. > > > > > > > > > > The actual capabilities of the acceleration are > > > > > > > > > > different and/or new. The workaround and silicon > > > > > > > > > > errata are also different causing different > > > > > > > > > limitation and implementation in the driver (see the > > > > > > > > > serie with ongoing changes for ACC100 in parallel). > > > > > > > > > > This is fundamentally distinct from ACC101 which was a > > > > > > > > > > derivative product > > > > > > > > > from ACC100 and where it made sense to share > > > > > > > > > implementation > > > > > between > > > > > > > > > ACC100 and ACC101. > > > > > > > > > > So in a nutshell these 2 devices and drivers are 2 > > > > > > > > > > different beasts and the > > > > > > > > > intention is to keep them intentionally separate as in > > > > > > > > > the serie. > > > > > > > > > > Let me know if unclear, thanks! > > > > > > > > > Nic, > > > > > > > > > > > > > > > > > > I used a similarity checker to compare acc100 and acc200 > > > > > > > > > > > > > > > > > > https://dickgrune.com/Programs/similarity_tester/ > > > > > > > > > > > > > > > > > > l=simum.log if [ -f $l ]; then rm $l fi > > > > > > > > > > > > > > > > > > sim_c -s -R -o$l -R -p -P -a . 
> > > > > > > > > > > > > > > > > > There results are > > > > > > > > > > > > > > > > > > ./acc200/acc200_pf_enum.h consists for 100 % of > > > > > > > > > ./acc100/acc100_pf_enum.h material > > > > > > > > > ./acc100/acc100_pf_enum.h consists for 98 % of > > > > > > > > > ./acc200/acc200_pf_enum.h material > > > > > > > > > ./acc100/rte_acc100_pmd.h consists for 98 % of > > > > > > > > > ./acc200/acc200_pmd.h material ./acc200/acc200_vf_enum.h > > > > > > > > > consists for 95 % of ./acc100/acc100_pf_enum.h material > > > > > > > > > ./acc200/acc200_pmd.h consists for 92 % of > > > > > > > > > ./acc100/rte_acc100_pmd.h material > > > > > > > > > ./acc200/rte_acc200_cfg.h consists for 92 % of > > > > > > > > > ./acc100/rte_acc100_cfg.h material > > > > > > > > > ./acc100/rte_acc100_pmd.c consists for 87 % of > > > > > > > > > ./acc200/rte_acc200_pmd.c material > > > > > > > > > ./acc100/acc100_vf_enum.h consists for 80 % of > > > > > > > > > ./acc200/acc200_pf_enum.h material > > > > > > > > > ./acc200/rte_acc200_pmd.c consists for 78 % of > > > > > > > > > ./acc100/rte_acc100_pmd.c material > > > > > > > > > ./acc100/rte_acc100_cfg.h consists for 75 % of > > > > > > > > > ./acc200/rte_acc200_cfg.h material > > > > > > > > > > > > > > > > > > Spot checking the first *pf_enum.h at 100%, these are the > > > > > > > > > devices' registers, they are the same. > > > > > > > > > > > > > > > > > > I raised this similarity issue with 100 vs 101. > > > > > > > > > > > > > > > > > > Having multiple copies is difficult to support and should > > > > > > > > > be avoided. > > > > > > > > > > > > > > > > > > For the end user, they should have to use only one > > > > > > > > > driver. > > > > > > > > > > > > > > > > > There are really different IP and do not have the same > > > > > > > > interface (PCIe/DDR vs > > > > > > > integrated) and there is big serie of changes which are > > > > > > > specific to ACC100 coming in parallel. 
Any workaround, > > > > > > > optimization would be > > > > > different. > > > > > > > > I agree that for the coming serie of integrated accelerator > > > > > > > > we will use a > > > > > > > unified driver approach but for that very case that would be > > > > > > > quite messy to artificially put them within the same PMD. > > > > > > > > > > > > > > How is the IP different when 100% of the registers are the > > > > > > > same ? > > > > > > > > > > > > > These are 2 different HW aspects. The base toplevel > > > > > > configuration registers > > > > > are kept similar on purpose but the underlying IP are totally > > > > > different design and implementation. > > > > > > Even the registers have differences but not visible here, the > > > > > > actual RDL file > > > > > would define more specifically these registers bitfields and > > > > > implementation including which ones are not implemented (but that > > > > > is proprietary information), and at bbdev level the interface is > > > > > not some much register based than processing based on data from > > > > > DMA. > > > > > > Basically even if there was a common driver, all these would be > > > > > > duplicated > > > > > and they are indeed different IP (including different vendors).. > > > > > > But I agree with the general intent and to have a common driver > > > > > > for the > > > > > integrated driver serie (ACC200, ACC300...) now that we are > > > > > moving away from PCIe/DDR lookaside acceleration and eASIC/FPGA > > > > > implementation (ACC100/AC101). > > > > > > > > > > Looking a little deeper, at how the driver is lays out some of > > > > > its bitfields and private data by reviewing the > > > > > > > > > > ./acc200/acc200_pmd.h consists for 92 % of > > > > > ./acc100/rte_acc100_pmd.h > > > > > > > > > > There are some minor changes to existing reserved bitfields. A > > > > > new structure for fft. The acc200_device, the private data for > > > > > the driver, is an exact copy of acc100_device. 
> > > > > > > > > > acc200_pmd.h is the superset and could be used with little > > > > > changes as a common acc_pmd.h. acc200 is doing everything the > > > > > acc100 did in a very similar if not exact way, adding the fft > > > > > feature. > > > > > > > > > > Can you point to some portion of this patchset that is so unique > > > > > that it could not be abstracted to an if-check or function and so > > > > > requiring this separate, nearly identical driver ? > > > > > > > > > You used a similarity checker really, there are actually way more > > > > relevent differences than what you imply here. With regards to the > > > > 2 pf_enum.h file, there are many registers that have same or > > > > similar names but have now different values being mapped hence you > > > > just cannot use one for the other. Saying that > > > > "./acc200/acc200_pmd.h consists for 92 % of > > > > ./acc100/rte_acc100_pmd.h" is just not correct and really > > > > irrelevant. Just do a diff side by side please and check, that > > > > should be extremely obvious, that metrics tells more about the > > > > similarity checker limitation than anything else. Even when using > > > > a common driver for ACC200/300 they will have distinct register > > > > enum files being auto-generated and coming from distinct RDL. > > > > Again just do a diff of these 2 files. I believe you will agree > > > > that is not relevant for these files to try to artificially merged > > > > these together. > > > > > > > > With regards to the pmd.h, some structure/defines are indeed common > > > > and could be moved to a common file (for instance turboencoder and > > > > LDPC encoder which are more vanilla and unlikely to change for > > > > future product unlike the decoders which have different feature set > > > > and behaviour; or some 3GPP constant that can be defined once). 
We > > > > can definitely change these to put together shared > > > > structures/defines, but not intending to try to artificially put > > > > things together with spaghetti code. We would like to keep 3 > > > > parallel versions of these PMD for 3 different product lines which > > > > are indeed fundamentally different designs (including different > > > > workaround required as can be seen on the parallel ACC100 serie > > > > under review). - one version for FPGA implementation (support for > > > > N3000, N6000, ...) - one version for eASIC lookaside card > > > > implementation (ACC100, ACC101, ...) - one version for the > > > > integrated Xeon accelerators (ACC200, ACC300, ...) > > > > > > Some suggestions on refactoring, > > > > > > For the registers, have a common file. > > > > > > For the shared functionality, ex/ ldpc encoder, break these out to > > > its own shared file. > > > > > > The public interface, see my earlier comments on the documentation, > > > should be have the same interfaces and the few differences > > > highlighted. > > > > +1 to have common files, and all in a single directory > > drivers/baseband/acc100/ > > Jus to be sure we are aligned, do you mean to have both drivers in the > same directory, which will share some common files? That's the way I > would go.

I think the expectation is that the two drivers will diverge in future, so having separate directories should be OK, even if common files placed in one directory are shared with the other. With meson include paths it's pretty trivial to manage if it's just header files, and even if there are common C files, there is always the option of using drivers/common if we want to split them out.

As I understand it, right now it's only headers, including functions which can be static inline, so simple sharing via include paths should work fine.

/Bruce
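The header-only sharing described above could look roughly like the following sketch: one header holding the constants and `static inline` helpers that both drivers need, pulled in through each driver's include path. The file name `acc_common.h`, the macro names and the helper names are assumptions made for this example and do not come from the patchset. The base graph dimensions are the standard 3GPP TS 38.212 values (K = 22 Zc and N = 66 Zc for base graph 1, K = 10 Zc and N = 50 Zc for base graph 2), the kind of "3GPP constant that can be defined once" mentioned earlier in the thread.

```c
/* acc_common.h: hypothetical shared header, name chosen for illustration only. */
#ifndef ACC_COMMON_H
#define ACC_COMMON_H

#include <assert.h>
#include <stdint.h>

/* LDPC base graph dimensions from 3GPP TS 38.212, in units of the lifting size Zc. */
#define ACC_K_ZC_BG1 22 /* information columns, base graph 1 */
#define ACC_K_ZC_BG2 10 /* information columns, base graph 2 */
#define ACC_N_ZC_BG1 66 /* coded columns, base graph 1 */
#define ACC_N_ZC_BG2 50 /* coded columns, base graph 2 */

/* Number of information bits K for a given base graph (1 or 2) and lifting size. */
static inline uint32_t
acc_ldpc_k(uint8_t basegraph, uint16_t zc)
{
	assert(basegraph == 1 || basegraph == 2);
	return (uint32_t)zc * (basegraph == 1 ? ACC_K_ZC_BG1 : ACC_K_ZC_BG2);
}

/* Coded block length N for a given base graph (1 or 2) and lifting size. */
static inline uint32_t
acc_ldpc_n(uint8_t basegraph, uint16_t zc)
{
	assert(basegraph == 1 || basegraph == 2);
	return (uint32_t)zc * (basegraph == 1 ? ACC_N_ZC_BG1 : ACC_N_ZC_BG2);
}

#endif /* ACC_COMMON_H */
```

Because everything is `static inline`, no common object file is needed and each driver still builds on its own. If shared `.c` files appear later, moving them under drivers/common remains an option, as noted above.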
* Re: [PATCH v1 00/10] baseband/acc200 2022-09-14 13:19 ` Bruce Richardson @ 2022-09-14 13:27 ` Maxime Coquelin 2022-09-14 13:44 ` [EXT] " Akhil Goyal 1 sibling, 0 replies; 50+ messages in thread From: Maxime Coquelin @ 2022-09-14 13:27 UTC (permalink / raw) To: Bruce Richardson Cc: Thomas Monjalon, Chautru, Nicolas, dev, gakhil, hemant.agrawal, Vargas, Hernan, Tom Rix, mdr, david.marchand, stephen On 9/14/22 15:19, Bruce Richardson wrote: > On Wed, Sep 14, 2022 at 01:50:05PM +0200, Maxime Coquelin wrote: >> >> >> On 9/14/22 12:35, Thomas Monjalon wrote: >>> 06/09/2022 14:51, Tom Rix: >>>> On 9/1/22 1:34 PM, Chautru, Nicolas wrote: >>>>> From: Tom Rix <trix@redhat.com> >>>>>> On 8/31/22 6:26 PM, Chautru, Nicolas wrote: >>>>>>> From: Tom Rix <trix@redhat.com> >>>>>>>> On 8/31/22 3:37 PM, Chautru, Nicolas wrote: >>>>>>>>>>>>> Comparing ACC200 & ACC100 header files, I >>>>>>>>>>>>> understand ACC200 is an evolution of the ACC10x >>>>>>>>>>>>> family. The FEC bits are really close, ACC200 main >>>>>>>>>>>>> addition seems to be FFT acceleration which could >>>>>>>>>>>>> be handled in ACC10x driver based on device ID. >>>>>>>>>>>>> >>>>>>>>>>>>> I think both drivers have to be merged in order to >>>>>>>>>>>>> avoid code duplication. That's how other families >>>>>>>>>>>>> of devices (e.g. i40e) are handled. >>>>>>>>>>>> I haven't seen your reply on this point. Do you >>>>>>>>>>>> confirm you are working on a single driver for ACC >>>>>>>>>>>> family in order to avoid code duplication? >>>>>>>>>>>> >>>>>>>>>>> The implementation is based on distinct ACC100 and >>>>>>>>>>> ACC200 drivers. The 2 >>>>>>>>>> devices are fundamentally different generation, processes >>>>>>>>>> and IP. >>>>>>>>>>> MountBryce is an eASIC device over PCIe while ACC200 is >>>>>>>>>>> an integrated >>>>>>>>>> accelerator on Xeon CPU. >>>>>>>>>>> The actual implementation are not the same, underlying >>>>>>>>>>> IP are all distinct >>>>>>>>>> even if many of the descriptor format have similarities. 
>>>>>>>>>>> The actual capabilities of the acceleration are >>>>>>>>>>> different and/or new. The workaround and silicon >>>>>>>>>>> errata are also different causing different >>>>>>>>>> limitation and implementation in the driver (see the >>>>>>>>>> serie with ongoing changes for ACC100 in parallel). >>>>>>>>>>> This is fundamentally distinct from ACC101 which was a >>>>>>>>>>> derivative product >>>>>>>>>> from ACC100 and where it made sense to share >>>>>>>>>> implementation >>>>>> between >>>>>>>>>> ACC100 and ACC101. >>>>>>>>>>> So in a nutshell these 2 devices and drivers are 2 >>>>>>>>>>> different beasts and the >>>>>>>>>> intention is to keep them intentionally separate as in >>>>>>>>>> the serie. >>>>>>>>>>> Let me know if unclear, thanks! >>>>>>>>>> Nic, >>>>>>>>>> >>>>>>>>>> I used a similarity checker to compare acc100 and acc200 >>>>>>>>>> >>>>>>>>>> https://dickgrune.com/Programs/similarity_tester/ >>>>>>>>>> >>>>>>>>>> l=simum.log if [ -f $l ]; then rm $l fi >>>>>>>>>> >>>>>>>>>> sim_c -s -R -o$l -R -p -P -a . 
>>>>>>>>>> >>>>>>>>>> There results are >>>>>>>>>> >>>>>>>>>> ./acc200/acc200_pf_enum.h consists for 100 % of >>>>>>>>>> ./acc100/acc100_pf_enum.h material >>>>>>>>>> ./acc100/acc100_pf_enum.h consists for 98 % of >>>>>>>>>> ./acc200/acc200_pf_enum.h material >>>>>>>>>> ./acc100/rte_acc100_pmd.h consists for 98 % of >>>>>>>>>> ./acc200/acc200_pmd.h material ./acc200/acc200_vf_enum.h >>>>>>>>>> consists for 95 % of ./acc100/acc100_pf_enum.h material >>>>>>>>>> ./acc200/acc200_pmd.h consists for 92 % of >>>>>>>>>> ./acc100/rte_acc100_pmd.h material >>>>>>>>>> ./acc200/rte_acc200_cfg.h consists for 92 % of >>>>>>>>>> ./acc100/rte_acc100_cfg.h material >>>>>>>>>> ./acc100/rte_acc100_pmd.c consists for 87 % of >>>>>>>>>> ./acc200/rte_acc200_pmd.c material >>>>>>>>>> ./acc100/acc100_vf_enum.h consists for 80 % of >>>>>>>>>> ./acc200/acc200_pf_enum.h material >>>>>>>>>> ./acc200/rte_acc200_pmd.c consists for 78 % of >>>>>>>>>> ./acc100/rte_acc100_pmd.c material >>>>>>>>>> ./acc100/rte_acc100_cfg.h consists for 75 % of >>>>>>>>>> ./acc200/rte_acc200_cfg.h material >>>>>>>>>> >>>>>>>>>> Spot checking the first *pf_enum.h at 100%, these are the >>>>>>>>>> devices' registers, they are the same. >>>>>>>>>> >>>>>>>>>> I raised this similarity issue with 100 vs 101. >>>>>>>>>> >>>>>>>>>> Having multiple copies is difficult to support and should >>>>>>>>>> be avoided. >>>>>>>>>> >>>>>>>>>> For the end user, they should have to use only one >>>>>>>>>> driver. >>>>>>>>>> >>>>>>>>> There are really different IP and do not have the same >>>>>>>>> interface (PCIe/DDR vs >>>>>>>> integrated) and there is big serie of changes which are >>>>>>>> specific to ACC100 coming in parallel. Any workaround, >>>>>>>> optimization would be >>>>>> different. >>>>>>>>> I agree that for the coming serie of integrated accelerator >>>>>>>>> we will use a >>>>>>>> unified driver approach but for that very case that would be >>>>>>>> quite messy to artificially put them within the same PMD. 
>>>>>>>> >>>>>>>> How is the IP different when 100% of the registers are the >>>>>>>> same ? >>>>>>>> >>>>>>> These are 2 different HW aspects. The base toplevel >>>>>>> configuration registers >>>>>> are kept similar on purpose but the underlying IP are totally >>>>>> different design and implementation. >>>>>>> Even the registers have differences but not visible here, the >>>>>>> actual RDL file >>>>>> would define more specifically these registers bitfields and >>>>>> implementation including which ones are not implemented (but that >>>>>> is proprietary information), and at bbdev level the interface is >>>>>> not some much register based than processing based on data from >>>>>> DMA. >>>>>>> Basically even if there was a common driver, all these would be >>>>>>> duplicated >>>>>> and they are indeed different IP (including different vendors).. >>>>>>> But I agree with the general intent and to have a common driver >>>>>>> for the >>>>>> integrated driver serie (ACC200, ACC300...) now that we are >>>>>> moving away from PCIe/DDR lookaside acceleration and eASIC/FPGA >>>>>> implementation (ACC100/AC101). >>>>>> >>>>>> Looking a little deeper, at how the driver is lays out some of >>>>>> its bitfields and private data by reviewing the >>>>>> >>>>>> ./acc200/acc200_pmd.h consists for 92 % of >>>>>> ./acc100/rte_acc100_pmd.h >>>>>> >>>>>> There are some minor changes to existing reserved bitfields. A >>>>>> new structure for fft. The acc200_device, the private data for >>>>>> the driver, is an exact copy of acc100_device. >>>>>> >>>>>> acc200_pmd.h is the superset and could be used with little >>>>>> changes as a common acc_pmd.h. acc200 is doing everything the >>>>>> acc100 did in a very similar if not exact way, adding the fft >>>>>> feature. >>>>>> >>>>>> Can you point to some portion of this patchset that is so unique >>>>>> that it could not be abstracted to an if-check or function and so >>>>>> requiring this separate, nearly identical driver ? 
>>>>>> >>>>> You used a similarity checker really, there are actually way more >>>>> relevent differences than what you imply here. With regards to the >>>>> 2 pf_enum.h file, there are many registers that have same or >>>>> similar names but have now different values being mapped hence you >>>>> just cannot use one for the other. Saying that >>>>> "./acc200/acc200_pmd.h consists for 92 % of >>>>> ./acc100/rte_acc100_pmd.h" is just not correct and really >>>>> irrelevant. Just do a diff side by side please and check, that >>>>> should be extremely obvious, that metrics tells more about the >>>>> similarity checker limitation than anything else. Even when using >>>>> a common driver for ACC200/300 they will have distinct register >>>>> enum files being auto-generated and coming from distinct RDL. >>>>> Again just do a diff of these 2 files. I believe you will agree >>>>> that is not relevant for these files to try to artificially merged >>>>> these together. >>>>> >>>>> With regards to the pmd.h, some structure/defines are indeed common >>>>> and could be moved to a common file (for instance turboencoder and >>>>> LDPC encoder which are more vanilla and unlikely to change for >>>>> future product unlike the decoders which have different feature set >>>>> and behaviour; or some 3GPP constant that can be defined once). We >>>>> can definitely change these to put together shared >>>>> structures/defines, but not intending to try to artificially put >>>>> things together with spaghetti code. We would like to keep 3 >>>>> parallel versions of these PMD for 3 different product lines which >>>>> are indeed fundamentally different designs (including different >>>>> workaround required as can be seen on the parallel ACC100 serie >>>>> under review). - one version for FPGA implementation (support for >>>>> N3000, N6000, ...) - one version for eASIC lookaside card >>>>> implementation (ACC100, ACC101, ...) 
- one version for the >>>>> integrated Xeon accelerators (ACC200, ACC300, ...) >>>> >>>> Some suggestions on refactoring, >>>> >>>> For the registers, have a common file. >>>> >>>> For the shared functionality, ex/ ldpc encoder, break these out to >>>> its own shared file. >>>> >>>> The public interface, see my earlier comments on the documentation, >>>> should have the same interfaces and the few differences >>>> highlighted. >>> >>> +1 to have common files, and all in a single directory >>> drivers/baseband/acc100/ >> >> Just to be sure we are aligned, do you mean to have both drivers in the >> same directory, which will share some common files? That's the way I >> would go. >> > > I think the expectation is that the two drivers will diverge in future, so > having separate directories should be ok, even if common files placed in > one directory are shared with the other. With meson include paths it's pretty > trivial to manage if it's just header files, and even if there are common C > files, there is always the option of using drivers/common if we want to > split them out. As I understand it, right now it's only headers including > functions which can be static inline, so simple sharing via include paths > should work fine. Ok, then I prefer having the common parts in drivers/common/acc, to make it clear that changes to these common files have an impact on drivers other than ACC100. Does that work for you? Thanks, Maxime > /Bruce > ^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: [EXT] Re: [PATCH v1 00/10] baseband/acc200 2022-09-14 13:19 ` Bruce Richardson 2022-09-14 13:27 ` Maxime Coquelin @ 2022-09-14 13:44 ` Akhil Goyal 2022-09-14 14:23 ` Thomas Monjalon 1 sibling, 1 reply; 50+ messages in thread From: Akhil Goyal @ 2022-09-14 13:44 UTC (permalink / raw) To: Bruce Richardson, Maxime Coquelin Cc: Thomas Monjalon, Chautru, Nicolas, dev, hemant.agrawal, Vargas, Hernan, Tom Rix, mdr, david.marchand, stephen > > > > On 9/14/22 12:35, Thomas Monjalon wrote: > > > 06/09/2022 14:51, Tom Rix: > > > > On 9/1/22 1:34 PM, Chautru, Nicolas wrote: > > > > > From: Tom Rix <trix@redhat.com> > > > > > > On 8/31/22 6:26 PM, Chautru, Nicolas wrote: > > > > > > > From: Tom Rix <trix@redhat.com> > > > > > > > > On 8/31/22 3:37 PM, Chautru, Nicolas wrote: > > > > > > > > > > > > > Comparing ACC200 & ACC100 header files, I > > > > > > > > > > > > > understand ACC200 is an evolution of the ACC10x > > > > > > > > > > > > > family. The FEC bits are really close, ACC200 main > > > > > > > > > > > > > addition seems to be FFT acceleration which could > > > > > > > > > > > > > be handled in ACC10x driver based on device ID. > > > > > > > > > > > > > > > > > > > > > > > > > > I think both drivers have to be merged in order to > > > > > > > > > > > > > avoid code duplication. That's how other families > > > > > > > > > > > > > of devices (e.g. i40e) are handled. > > > > > > > > > > > > I haven't seen your reply on this point. Do you > > > > > > > > > > > > confirm you are working on a single driver for ACC > > > > > > > > > > > > family in order to avoid code duplication? > > > > > > > > > > > > > > > > > > > > > > > The implementation is based on distinct ACC100 and > > > > > > > > > > > ACC200 drivers. The 2 > > > > > > > > > > devices are fundamentally different generation, processes > > > > > > > > > > and IP. 
> > > > > > > > > > > MountBryce is an eASIC device over PCIe while ACC200 is > > > > > > > > > > > an integrated > > > > > > > > > > accelerator on Xeon CPU. > > > > > > > > > > > The actual implementation are not the same, underlying > > > > > > > > > > > IP are all distinct > > > > > > > > > > even if many of the descriptor format have similarities. > > > > > > > > > > > The actual capabilities of the acceleration are > > > > > > > > > > > different and/or new. The workaround and silicon > > > > > > > > > > > errata are also different causing different > > > > > > > > > > limitation and implementation in the driver (see the > > > > > > > > > > serie with ongoing changes for ACC100 in parallel). > > > > > > > > > > > This is fundamentally distinct from ACC101 which was a > > > > > > > > > > > derivative product > > > > > > > > > > from ACC100 and where it made sense to share > > > > > > > > > > implementation > > > > > > between > > > > > > > > > > ACC100 and ACC101. > > > > > > > > > > > So in a nutshell these 2 devices and drivers are 2 > > > > > > > > > > > different beasts and the > > > > > > > > > > intention is to keep them intentionally separate as in > > > > > > > > > > the serie. > > > > > > > > > > > Let me know if unclear, thanks! > > > > > > > > > > Nic, > > > > > > > > > > > > > > > > > > > > I used a similarity checker to compare acc100 and acc200 > > > > > > > > > > > > > > > > > > > > https://dickgrune.com/Programs/similarity_tester/ > > > > > > > > > > > > > > > > > > > > l=simum.log if [ -f $l ]; then rm $l fi > > > > > > > > > > > > > > > > > > > > sim_c -s -R -o$l -R -p -P -a . 
> > > > > > > > > > > > > > > > > > > > There results are > > > > > > > > > > > > > > > > > > > > ./acc200/acc200_pf_enum.h consists for 100 % of > > > > > > > > > > ./acc100/acc100_pf_enum.h material > > > > > > > > > > ./acc100/acc100_pf_enum.h consists for 98 % of > > > > > > > > > > ./acc200/acc200_pf_enum.h material > > > > > > > > > > ./acc100/rte_acc100_pmd.h consists for 98 % of > > > > > > > > > > ./acc200/acc200_pmd.h material ./acc200/acc200_vf_enum.h > > > > > > > > > > consists for 95 % of ./acc100/acc100_pf_enum.h material > > > > > > > > > > ./acc200/acc200_pmd.h consists for 92 % of > > > > > > > > > > ./acc100/rte_acc100_pmd.h material > > > > > > > > > > ./acc200/rte_acc200_cfg.h consists for 92 % of > > > > > > > > > > ./acc100/rte_acc100_cfg.h material > > > > > > > > > > ./acc100/rte_acc100_pmd.c consists for 87 % of > > > > > > > > > > ./acc200/rte_acc200_pmd.c material > > > > > > > > > > ./acc100/acc100_vf_enum.h consists for 80 % of > > > > > > > > > > ./acc200/acc200_pf_enum.h material > > > > > > > > > > ./acc200/rte_acc200_pmd.c consists for 78 % of > > > > > > > > > > ./acc100/rte_acc100_pmd.c material > > > > > > > > > > ./acc100/rte_acc100_cfg.h consists for 75 % of > > > > > > > > > > ./acc200/rte_acc200_cfg.h material > > > > > > > > > > > > > > > > > > > > Spot checking the first *pf_enum.h at 100%, these are the > > > > > > > > > > devices' registers, they are the same. > > > > > > > > > > > > > > > > > > > > I raised this similarity issue with 100 vs 101. > > > > > > > > > > > > > > > > > > > > Having multiple copies is difficult to support and should > > > > > > > > > > be avoided. > > > > > > > > > > > > > > > > > > > > For the end user, they should have to use only one > > > > > > > > > > driver. 
> > > > > > > > > > > > > > > > > > > There are really different IP and do not have the same > > > > > > > > > interface (PCIe/DDR vs > > > > > > > > integrated) and there is big serie of changes which are > > > > > > > > specific to ACC100 coming in parallel. Any workaround, > > > > > > > > optimization would be > > > > > > different. > > > > > > > > > I agree that for the coming serie of integrated accelerator > > > > > > > > > we will use a > > > > > > > > unified driver approach but for that very case that would be > > > > > > > > quite messy to artificially put them within the same PMD. > > > > > > > > > > > > > > > > How is the IP different when 100% of the registers are the > > > > > > > > same ? > > > > > > > > > > > > > > > These are 2 different HW aspects. The base toplevel > > > > > > > configuration registers > > > > > > are kept similar on purpose but the underlying IP are totally > > > > > > different design and implementation. > > > > > > > Even the registers have differences but not visible here, the > > > > > > > actual RDL file > > > > > > would define more specifically these registers bitfields and > > > > > > implementation including which ones are not implemented (but that > > > > > > is proprietary information), and at bbdev level the interface is > > > > > > not some much register based than processing based on data from > > > > > > DMA. > > > > > > > Basically even if there was a common driver, all these would be > > > > > > > duplicated > > > > > > and they are indeed different IP (including different vendors).. > > > > > > > But I agree with the general intent and to have a common driver > > > > > > > for the > > > > > > integrated driver serie (ACC200, ACC300...) now that we are > > > > > > moving away from PCIe/DDR lookaside acceleration and eASIC/FPGA > > > > > > implementation (ACC100/AC101). 
> > > > > > > > > > > > Looking a little deeper, at how the driver is lays out some of > > > > > > its bitfields and private data by reviewing the > > > > > > > > > > > > ./acc200/acc200_pmd.h consists for 92 % of > > > > > > ./acc100/rte_acc100_pmd.h > > > > > > > > > > > > There are some minor changes to existing reserved bitfields. A > > > > > > new structure for fft. The acc200_device, the private data for > > > > > > the driver, is an exact copy of acc100_device. > > > > > > > > > > > > acc200_pmd.h is the superset and could be used with little > > > > > > changes as a common acc_pmd.h. acc200 is doing everything the > > > > > > acc100 did in a very similar if not exact way, adding the fft > > > > > > feature. > > > > > > > > > > > > Can you point to some portion of this patchset that is so unique > > > > > > that it could not be abstracted to an if-check or function and so > > > > > > requiring this separate, nearly identical driver ? > > > > > > > > > > > You used a similarity checker really, there are actually way more > > > > > relevent differences than what you imply here. With regards to the > > > > > 2 pf_enum.h file, there are many registers that have same or > > > > > similar names but have now different values being mapped hence you > > > > > just cannot use one for the other. Saying that > > > > > "./acc200/acc200_pmd.h consists for 92 % of > > > > > ./acc100/rte_acc100_pmd.h" is just not correct and really > > > > > irrelevant. Just do a diff side by side please and check, that > > > > > should be extremely obvious, that metrics tells more about the > > > > > similarity checker limitation than anything else. Even when using > > > > > a common driver for ACC200/300 they will have distinct register > > > > > enum files being auto-generated and coming from distinct RDL. > > > > > Again just do a diff of these 2 files. I believe you will agree > > > > > that is not relevant for these files to try to artificially merged > > > > > these together. 
> > > > > > > > > > With regards to the pmd.h, some structure/defines are indeed common > > > > > and could be moved to a common file (for instance turboencoder and > > > > > LDPC encoder which are more vanilla and unlikely to change for > > > > > future product unlike the decoders which have different feature set > > > > > and behaviour; or some 3GPP constant that can be defined once). We > > > > > can definitely change these to put together shared > > > > > structures/defines, but not intending to try to artificially put > > > > > things together with spaghetti code. We would like to keep 3 > > > > > parallel versions of these PMD for 3 different product lines which > > > > > are indeed fundamentally different designs (including different > > > > > workaround required as can be seen on the parallel ACC100 serie > > > > > under review). - one version for FPGA implementation (support for > > > > > N3000, N6000, ...) - one version for eASIC lookaside card > > > > > implementation (ACC100, ACC101, ...) - one version for the > > > > > integrated Xeon accelerators (ACC200, ACC300, ...) > > > > > > > > Some suggestions on refactoring, > > > > > > > > For the registers, have a common file. > > > > > > > > For the shared functionality, ex/ ldpc encoder, break these out to > > > > its own shared file. > > > > > > > > The public interface, see my earlier comments on the documentation, > > > > should be have the same interfaces and the few differences > > > > highlighted. > > > > > > +1 to have common files, and all in a single directory > > > drivers/baseband/acc100/ > > > > Jus to be sure we are aligned, do you mean to have both drivers in the > > same directory, which will share some common files? That's the way I > > would go. > > > > I think the expectation is that the two drivers will diverge in future, so > having separate directories should be ok, even with common files placed in > one directory are shared with another. 
With meson include paths it's pretty > trivial to manage if it's just header files, and even if there are common C > files, there is always the option of using drivers/common if we want to > split them out. As I understand it, right now it's only headers including > functions which can be static inline, so simple sharing via include paths > should work fine. > It can be ok to have 2 separate directories, but - is it not possible to have them in the same directory, say 'acc', for all affiliated devices, similar to other vendors' devices (cnxk, i40e, mlx)? - Can both devices, acc100 and acc200, coexist? If not, the same directory is good enough. - There can be multiple files or directories in 'acc', named appropriately to denote the actual device (acc100/200). Having a cross dependency between different drivers of the same type looks like hacking meson. This was one reason we moved to drivers/common/ for some of the drivers. Also, including "../acc100/abc.h" does not look appropriate to me. IMO, we should not add unnecessary directories when the code is common and can be managed in a single one. However, technically it is also ok to have 2 separate directories. But agreeing on this will set a precedent for future next-generation devices from the same vendors. It may be a topic for discussion in the techboard. -Akhil ^ permalink raw reply [flat|nested] 50+ messages in thread
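The single-directory option rests on the idea, raised earlier in the thread, that device differences can be selected "based on device ID". A minimal sketch of that idea follows, under loud assumptions: the enum names, the register names and every offset are invented for illustration and do not reflect the real ACC100 or ACC200 register maps. The only detail taken from the thread is that FFT is the ACC200 addition. The sketch does not address the counter-argument about distinct IP and errata.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variants; a real driver would derive this from the PCI device ID. */
enum acc_sketch_variant {
	ACC_SKETCH_ACC100,
	ACC_SKETCH_ACC200,
	ACC_SKETCH_VARIANT_COUNT
};

/* Logical register names shared by both variants (illustrative only). */
enum acc_sketch_reg {
	ACC_SKETCH_REG_QUEUE_ENABLE,
	ACC_SKETCH_REG_DOORBELL,
	ACC_SKETCH_REG_COUNT
};

/* Made-up offsets: same logical name, different value per variant. */
static const uint32_t
acc_sketch_reg_map[ACC_SKETCH_VARIANT_COUNT][ACC_SKETCH_REG_COUNT] = {
	[ACC_SKETCH_ACC100] = {
		[ACC_SKETCH_REG_QUEUE_ENABLE] = 0x1000,
		[ACC_SKETCH_REG_DOORBELL] = 0x2000,
	},
	[ACC_SKETCH_ACC200] = {
		[ACC_SKETCH_REG_QUEUE_ENABLE] = 0x1400,
		[ACC_SKETCH_REG_DOORBELL] = 0x2800,
	},
};

/* Resolve a logical register to the offset used by the given variant. */
static inline uint32_t
acc_sketch_reg_offset(enum acc_sketch_variant v, enum acc_sketch_reg r)
{
	return acc_sketch_reg_map[v][r];
}

/* Capability check: in this sketch only the ACC200 variant offers FFT. */
static inline bool
acc_sketch_supports_fft(enum acc_sketch_variant v)
{
	return v == ACC_SKETCH_ACC200;
}
```

Common code would call acc_sketch_reg_offset() instead of using a per-device enum directly, so registers with the same name but different values stay in per-variant tables while the calling code is shared.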
* Re: [EXT] Re: [PATCH v1 00/10] baseband/acc200 2022-09-14 13:44 ` [EXT] " Akhil Goyal @ 2022-09-14 14:23 ` Thomas Monjalon 2022-09-14 19:57 ` Chautru, Nicolas 0 siblings, 1 reply; 50+ messages in thread From: Thomas Monjalon @ 2022-09-14 14:23 UTC (permalink / raw) To: Bruce Richardson, Maxime Coquelin, Akhil Goyal, Chautru, Nicolas Cc: dev, hemant.agrawal, Vargas, Hernan, Tom Rix, mdr, david.marchand, stephen 14/09/2022 15:44, Akhil Goyal: > > > On 9/14/22 12:35, Thomas Monjalon wrote: > > > > 06/09/2022 14:51, Tom Rix: > > > > > On 9/1/22 1:34 PM, Chautru, Nicolas wrote: > > > > > > From: Tom Rix <trix@redhat.com> > > > > > > > On 8/31/22 6:26 PM, Chautru, Nicolas wrote: > > > > > > > > From: Tom Rix <trix@redhat.com> > > > > > > > > > On 8/31/22 3:37 PM, Chautru, Nicolas wrote: > > > > > > > > > > > > > > Comparing ACC200 & ACC100 header files, I > > > > > > > > > > > > > > understand ACC200 is an evolution of the ACC10x > > > > > > > > > > > > > > family. The FEC bits are really close, ACC200 main > > > > > > > > > > > > > > addition seems to be FFT acceleration which could > > > > > > > > > > > > > > be handled in ACC10x driver based on device ID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think both drivers have to be merged in order to > > > > > > > > > > > > > > avoid code duplication. That's how other families > > > > > > > > > > > > > > of devices (e.g. i40e) are handled. > > > > > > > > > > > > > I haven't seen your reply on this point. Do you > > > > > > > > > > > > > confirm you are working on a single driver for ACC > > > > > > > > > > > > > family in order to avoid code duplication? > > > > > > > > > > > > > > > > > > > > > > > > > The implementation is based on distinct ACC100 and > > > > > > > > > > > > ACC200 drivers. The 2 > > > > > > > > > > > devices are fundamentally different generation, processes > > > > > > > > > > > and IP. 
> > > > > > > > > > > > MountBryce is an eASIC device over PCIe while ACC200 is > > > > > > > > > > > > an integrated > > > > > > > > > > > accelerator on Xeon CPU. > > > > > > > > > > > > The actual implementation are not the same, underlying > > > > > > > > > > > > IP are all distinct > > > > > > > > > > > even if many of the descriptor format have similarities. > > > > > > > > > > > > The actual capabilities of the acceleration are > > > > > > > > > > > > different and/or new. The workaround and silicon > > > > > > > > > > > > errata are also different causing different > > > > > > > > > > > limitation and implementation in the driver (see the > > > > > > > > > > > serie with ongoing changes for ACC100 in parallel). > > > > > > > > > > > > This is fundamentally distinct from ACC101 which was a > > > > > > > > > > > > derivative product > > > > > > > > > > > from ACC100 and where it made sense to share > > > > > > > > > > > implementation > > > > > > > between > > > > > > > > > > > ACC100 and ACC101. > > > > > > > > > > > > So in a nutshell these 2 devices and drivers are 2 > > > > > > > > > > > > different beasts and the > > > > > > > > > > > intention is to keep them intentionally separate as in > > > > > > > > > > > the serie. > > > > > > > > > > > > Let me know if unclear, thanks! > > > > > > > > > > > Nic, > > > > > > > > > > > > > > > > > > > > > > I used a similarity checker to compare acc100 and acc200 > > > > > > > > > > > > > > > > > > > > > > https://dickgrune.com/Programs/similarity_tester/ > > > > > > > > > > > > > > > > > > > > > > l=simum.log if [ -f $l ]; then rm $l fi > > > > > > > > > > > > > > > > > > > > > > sim_c -s -R -o$l -R -p -P -a . 
> > > > > > > > > > > > > > > > > > > > > > There results are > > > > > > > > > > > > > > > > > > > > > > ./acc200/acc200_pf_enum.h consists for 100 % of > > > > > > > > > > > ./acc100/acc100_pf_enum.h material > > > > > > > > > > > ./acc100/acc100_pf_enum.h consists for 98 % of > > > > > > > > > > > ./acc200/acc200_pf_enum.h material > > > > > > > > > > > ./acc100/rte_acc100_pmd.h consists for 98 % of > > > > > > > > > > > ./acc200/acc200_pmd.h material ./acc200/acc200_vf_enum.h > > > > > > > > > > > consists for 95 % of ./acc100/acc100_pf_enum.h material > > > > > > > > > > > ./acc200/acc200_pmd.h consists for 92 % of > > > > > > > > > > > ./acc100/rte_acc100_pmd.h material > > > > > > > > > > > ./acc200/rte_acc200_cfg.h consists for 92 % of > > > > > > > > > > > ./acc100/rte_acc100_cfg.h material > > > > > > > > > > > ./acc100/rte_acc100_pmd.c consists for 87 % of > > > > > > > > > > > ./acc200/rte_acc200_pmd.c material > > > > > > > > > > > ./acc100/acc100_vf_enum.h consists for 80 % of > > > > > > > > > > > ./acc200/acc200_pf_enum.h material > > > > > > > > > > > ./acc200/rte_acc200_pmd.c consists for 78 % of > > > > > > > > > > > ./acc100/rte_acc100_pmd.c material > > > > > > > > > > > ./acc100/rte_acc100_cfg.h consists for 75 % of > > > > > > > > > > > ./acc200/rte_acc200_cfg.h material > > > > > > > > > > > > > > > > > > > > > > Spot checking the first *pf_enum.h at 100%, these are the > > > > > > > > > > > devices' registers, they are the same. > > > > > > > > > > > > > > > > > > > > > > I raised this similarity issue with 100 vs 101. > > > > > > > > > > > > > > > > > > > > > > Having multiple copies is difficult to support and should > > > > > > > > > > > be avoided. > > > > > > > > > > > > > > > > > > > > > > For the end user, they should have to use only one > > > > > > > > > > > driver. 
> > > > > > > > > > They are really different IPs and do not have the same interface
> > > > > > > > > > (PCIe/DDR vs integrated), and there is a big series of changes
> > > > > > > > > > specific to ACC100 coming in parallel. Any workaround or
> > > > > > > > > > optimization would be different.
> > > > > > > > > > I agree that for the coming series of integrated accelerators we
> > > > > > > > > > will use a unified driver approach, but for this very case it
> > > > > > > > > > would be quite messy to artificially put them within the same PMD.
> > > > > > > > >
> > > > > > > > > How is the IP different when 100% of the registers are the same?
> > > > > > > >
> > > > > > > > These are 2 different HW aspects. The base top-level configuration
> > > > > > > > registers are kept similar on purpose, but the underlying IPs are
> > > > > > > > totally different in design and implementation.
> > > > > > > > Even the registers have differences that are not visible here. The
> > > > > > > > actual RDL file would define these registers' bitfields and
> > > > > > > > implementation more specifically, including which ones are not
> > > > > > > > implemented (but that is proprietary information), and at bbdev level
> > > > > > > > the interface is not so much register based as processing based on
> > > > > > > > data from DMA.
> > > > > > > > Basically even if there was a common driver, all these would be
> > > > > > > > duplicated, and they are indeed different IPs (including different
> > > > > > > > vendors).
> > > > > > > > But I agree with the general intent to have a common driver for the
> > > > > > > > integrated driver series (ACC200, ACC300...) now that we are moving
> > > > > > > > away from PCIe/DDR lookaside acceleration and eASIC/FPGA
> > > > > > > > implementation (ACC100/ACC101).
> > > > > > >
> > > > > > > Looking a little deeper, at how the driver lays out some of its
> > > > > > > bitfields and private data by reviewing the header,
> > > > > > >
> > > > > > > ./acc200/acc200_pmd.h consists for 92 % of ./acc100/rte_acc100_pmd.h
> > > > > > >
> > > > > > > There are some minor changes to existing reserved bitfields. A new
> > > > > > > structure for fft. The acc200_device, the private data for the driver,
> > > > > > > is an exact copy of acc100_device.
> > > > > > >
> > > > > > > acc200_pmd.h is the superset and could be used with little changes as a
> > > > > > > common acc_pmd.h. acc200 is doing everything the acc100 did in a very
> > > > > > > similar if not exact way, adding the fft feature.
> > > > > > >
> > > > > > > Can you point to some portion of this patchset that is so unique that
> > > > > > > it could not be abstracted to an if-check or function and so requiring
> > > > > > > this separate, nearly identical driver?
> > > > > >
> > > > > > You used a similarity checker really; there are actually way more
> > > > > > relevant differences than what you imply here. With regards to the 2
> > > > > > pf_enum.h files, there are many registers that have the same or similar
> > > > > > names but now have different values being mapped, hence you just cannot
> > > > > > use one for the other. Saying that "./acc200/acc200_pmd.h consists for
> > > > > > 92 % of ./acc100/rte_acc100_pmd.h" is just not correct and really
> > > > > > irrelevant. Just do a diff side by side please and check; it should be
> > > > > > extremely obvious that this metric tells more about the similarity
> > > > > > checker's limitations than anything else. Even when using a common
> > > > > > driver for ACC200/300 they will have distinct register enum files being
> > > > > > auto-generated and coming from distinct RDL. Again just do a diff of
> > > > > > these 2 files. I believe you will agree that it is not relevant to try
> > > > > > to artificially merge these files together.
> > > > > >
> > > > > > With regards to the pmd.h, some structures/defines are indeed common and
> > > > > > could be moved to a common file (for instance turbo encoder and LDPC
> > > > > > encoder, which are more vanilla and unlikely to change for future
> > > > > > products, unlike the decoders which have different feature sets and
> > > > > > behaviour; or some 3GPP constants that can be defined once). We can
> > > > > > definitely change these to put together shared structures/defines, but
> > > > > > we do not intend to artificially put things together with spaghetti
> > > > > > code. We would like to keep 3 parallel versions of these PMDs for 3
> > > > > > different product lines which are indeed fundamentally different designs
> > > > > > (including different workarounds required, as can be seen on the
> > > > > > parallel ACC100 series under review).
> > > > > > - one version for FPGA implementation (support for N3000, N6000, ...)
> > > > > > - one version for eASIC lookaside card implementation (ACC100, ACC101, ...)
> > > > > > - one version for the integrated Xeon accelerators (ACC200, ACC300, ...)
> > > > >
> > > > > Some suggestions on refactoring,
> > > > >
> > > > > For the registers, have a common file.
> > > > >
> > > > > For the shared functionality, ex/ ldpc encoder, break these out to its
> > > > > own shared file.
> > > > >
> > > > > The public interface, see my earlier comments on the documentation,
> > > > > should have the same interfaces and the few differences highlighted.
> > > >
> > > > +1 to have common files, and all in a single directory
> > > > drivers/baseband/acc100/
> > >
> > > Just to be sure we are aligned, do you mean to have both drivers in the
> > > same directory, which will share some common files? That's the way I
> > > would go.
> >
> > I think the expectation is that the two drivers will diverge in future, so
> > having separate directories should be ok, even with common files placed in
> > one directory and shared with another. With meson include paths it's pretty
> > trivial to manage if it's just header files, and even if there are common C
> > files, there is always the option of using drivers/common if we want to
> > split them out. As I understand it, right now it's only headers including
> > functions which can be static inline, so simple sharing via include paths
> > should work fine.
>
> It can be ok to have 2 separate directories, but
> - is it not possible to have them in same directory say 'acc' for all
>   affiliated devices. Similar to other vendors' devices (cnxk, i40e, mlx).
> - Can both the devices - acc100 and acc200 coexist? If not, same directory
>   is good enough.
> - there can be multiple files or directories in 'acc' which can be named
>   appropriately to denote the actual device(acc100/200).
>
> Having cross dependency across different drivers of same type looks a kind
> of hacking the meson.
> This was a reason we moved to have a drivers/common/ for some of the drivers.
> Also including "../acc100/abc.h" does not look appropriate to me.
>
> IMO, we should not add unnecessary directories when the code is common and
> can be managed in a single one.
>
> However, technically it is also ok to have 2 separate directories. But,
> agreeing on this will set a precedent for future next generation devices
> from the same vendors. It may be a topic of discussion in techboard.

Let me be frank, I don't trust Intel saying the hardware will be too much
different in future.

For mlx5, we manage to handle very different devices (like DPU and changing
processors) in a single driver.
So I agree with Maxime and Akhil that a single driver in a single directory
should be enough.
Having different registers in different devices is not enough to split.
The worst case would be to have a common directory acc/ but it may be a bit disappointing.
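The single-driver position argued in this message (one PMD covering several devices, with the differences "abstracted to an if-check or function" as asked earlier in the thread) could look roughly like the sketch below. It is only an illustration under stated assumptions: every identifier (`acc_variant`, `acc_variant_info`, `acc_enqueue_fft`, ...) and every register offset is hypothetical and is not taken from the patchset or from DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device families; not the identifiers used by the real PMDs. */
enum acc_variant { ACC_VARIANT_100, ACC_VARIANT_200 };

/* Per-variant description: register offsets and optional features differ,
 * while the driver logic that consumes this table stays common. */
struct acc_variant_info {
	enum acc_variant variant;
	uint32_t qmgr_base_offset;  /* illustrative value only */
	bool has_fft;               /* the thread says ACC200 adds FFT */
	bool needs_sw_workarounds;  /* the thread says ACC100 needs mitigations */
};

static const struct acc_variant_info acc_variants[] = {
	{ ACC_VARIANT_100, 0x1000, false, true  },
	{ ACC_VARIANT_200, 0x2000, true,  false },
};

/* Look up the descriptor for a variant; returns NULL if it is unknown. */
static const struct acc_variant_info *
acc_variant_lookup(enum acc_variant v)
{
	size_t n = sizeof(acc_variants) / sizeof(acc_variants[0]);

	for (size_t i = 0; i < n; i++)
		if (acc_variants[i].variant == v)
			return &acc_variants[i];
	return NULL;
}

/* Common entry point: a capability check instead of a separate driver. */
static int
acc_enqueue_fft(const struct acc_variant_info *info)
{
	assert(info != NULL);
	if (!info->has_fft)
		return -1; /* operation not supported on this variant */
	return 0;      /* the real work would be dispatched here */
}
```

Whether such a table stays readable depends on how many workarounds are variant-specific, which is the point of disagreement in the thread.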
* RE: [EXT] Re: [PATCH v1 00/10] baseband/acc200
  2022-09-14 14:23 ` Thomas Monjalon
@ 2022-09-14 19:57 ` Chautru, Nicolas
  2022-09-14 20:08 ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Chautru, Nicolas @ 2022-09-14 19:57 UTC
To: Thomas Monjalon, Richardson, Bruce, Maxime Coquelin, Akhil Goyal
Cc: dev, hemant.agrawal, Vargas, Hernan, Tom Rix, mdr, david.marchand, stephen

Hi Thomas, Akhil, Bruce, Maxime,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, September 14, 2022 7:23 AM
> To: Richardson, Bruce <bruce.richardson@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Akhil Goyal <gakhil@marvell.com>;
> Chautru, Nicolas <nicolas.chautru@intel.com>
> Cc: dev@dpdk.org; hemant.agrawal@nxp.com; Vargas, Hernan
> <hernan.vargas@intel.com>; Tom Rix <trix@redhat.com>; mdr@ashroe.eu;
> david.marchand@redhat.com; stephen@networkplumber.org
> Subject: Re: [EXT] Re: [PATCH v1 00/10] baseband/acc200
>
> 14/09/2022 15:44, Akhil Goyal:
> > > > On 9/14/22 12:35, Thomas Monjalon wrote:
> > > > > 06/09/2022 14:51, Tom Rix:
> > > > > > On 9/1/22 1:34 PM, Chautru, Nicolas wrote:
> > > > > > > From: Tom Rix <trix@redhat.com>
> > > > > > > > On 8/31/22 6:26 PM, Chautru, Nicolas wrote:
> > > > > > > > > From: Tom Rix <trix@redhat.com>
> > > > > > > > > > On 8/31/22 3:37 PM, Chautru, Nicolas wrote:
<snip>
> > It can be ok to have 2 separate directories, but
> > - is it not possible to have them in same directory say 'acc' for all
> >   affiliated devices. Similar to other vendors' devices (cnxk, i40e, mlx).
> > - Can both the devices - acc100 and acc200 coexist? If not, same directory
> >   is good enough.
> > - there can be multiple files or directories in 'acc' which can be
> >   named appropriately to denote the actual device(acc100/200).
> >
> > Having cross dependency across different drivers of same type looks a kind
> > of hacking the meson.
> > This was a reason we moved to have a drivers/common/ for some of the
> > drivers.
> > Also including "../acc100/abc.h" does not look appropriate to me.
> >
> > IMO, we should not add unnecessary directories when the code is common
> > and can be managed in a single one.
> >
> > However, technically it is also ok to have 2 separate directories.
> > But, agreeing on this will set a precedent for future next generation
> > devices from the same vendors. It may be a topic of discussion in techboard.
>
> Let me be frank, I don't trust Intel saying the hardware will be too much
> different in future.

Thanks for the review and discussion.

Let me clarify: this PMD segregation is specific to ACC1xx vs ACC2xx. There is a clear intent to have a common PMD to encompass the multiple future integrated vRAN accelerator solutions on Xeon (based on ACC200 and future Xeon products in the roadmap), but not for ACC1xx.
Here we are splitting the ACC1xx and the ACC2xx series (eASIC process with off-die PCIe device and on-card DDR vs a straight integrated Xeon accelerator), which are fundamentally different devices. Notably, the ACC100 requires a lot of SW workarounds/mitigations/protections in the code which would not apply moving forward and would clutter the next generations, which would be managed and optimized largely independently. Basically these are truly not just a few register differences.
Again, future integrated Xeon accelerators will share a common driver, but always distinct from ACC1xx (only sharing some common code and structures when possible).
Here the refactoring effort was to gather all reusable code and structures together, which was useful indeed as there are several common functionalities and structures which could be made a superset and shared relatively seamlessly.

> For mlx5, we manage to handle very different devices (like DPU and changing
> processors) in a single driver.
> So I agree with Maxime and Akhil that a single driver in a single directory
> should be enough.
> Having different registers in different devices is not enough to split.
>
> The worst case would be to have a common directory acc/ but it may be a bit
> disappointing.
>

I believe that I hear 2 different options compatible with the 2 PMDs approach:
- The one suggested by Akhil and Maxime, I think, is to put both ACC100 and ACC200 PMDs under ./baseband/acc/, similarly to what is done for cnxk for instance. In that case the common files are still all in the same directory as the 2 PMDs, so we don't have to do the awkward "includes += include_directories('../acc100')" in meson which was frowned upon, since everything is already under /drivers/baseband/acc.
- The other option, suggested by Thomas, is to put the shared code and structures under ./drivers/common/acc instead of being under ./drivers/acc/acc_common.h, an approach which is also used for many drivers.

My preference would personally be for the former option at the moment, but I am happy to get some form of consensus on this.

Thanks and regards,
Nic
* Re: [EXT] Re: [PATCH v1 00/10] baseband/acc200
  2022-09-14 19:57 ` Chautru, Nicolas
@ 2022-09-14 20:08 ` Maxime Coquelin
  0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2022-09-14 20:08 UTC
To: Chautru, Nicolas, Thomas Monjalon, Richardson, Bruce, Akhil Goyal
Cc: dev, hemant.agrawal, Vargas, Hernan, Tom Rix, mdr, david.marchand, stephen

On 9/14/22 21:57, Chautru, Nicolas wrote:
<snip>
> I believe that I hear 2 different options compatible with the 2 PMDs approach:
> - The one suggested by Akhil and Maxime, I think, is to put both ACC100 and
>   ACC200 PMDs under ./baseband/acc/, similarly to what is done for cnxk for
>   instance. In that case the common files are still all in the same directory
>   as the 2 PMDs, so we don't have to do the awkward
>   "includes += include_directories('../acc100')" in meson which was frowned
>   upon, since everything is already under /drivers/baseband/acc.
> - The other option, suggested by Thomas, is to put the shared code and
>   structures under ./drivers/common/acc instead of being under
>   ./drivers/acc/acc_common.h, an approach which is also used for many drivers.
>
> My preference would personally be for the former option at the moment, but
> I am happy to get some form of consensus on this.

I am fine with the former option. drivers/common is especially useful when
code is to be shared by different types of devices (e.g. net and crypto).

Maxime

>
> Thanks and regards,
> Nic
>
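The point made earlier in the thread, that the shared code is currently only headers with functions that can be static inline, so the directory layout mostly changes an include path, can be illustrated with the sketch below. It is written as a single translation unit for brevity, with comments marking what would sit in a shared header and what would sit in each PMD. The helper, its name and the ring depths are hypothetical and are not taken from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* ---- would live in a shared header, e.g. a hypothetical acc_common.h ---- */

/* Advance a ring index by one, wrapping at a power-of-two depth. */
static inline uint16_t
acc_ring_next(uint16_t idx, uint16_t depth)
{
	assert(depth != 0 && (depth & (depth - 1)) == 0);
	return (uint16_t)((idx + 1) & (depth - 1));
}

/* ---- would live in one PMD's source file ---- */
static inline uint16_t
acc100_next_desc(uint16_t idx)
{
	return acc_ring_next(idx, 1024); /* depth chosen for illustration */
}

/* ---- would live in the other PMD's source file ---- */
static inline uint16_t
acc200_next_desc(uint16_t idx)
{
	return acc_ring_next(idx, 512); /* depth chosen for illustration */
}
```

Because the helper is static inline, each PMD compiles its own copy and no extra library or link-time dependency is introduced. Only the location of the header differs between the layouts discussed above.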
end of thread, other threads:[~2022-09-14 20:09 UTC | newest]

Thread overview: 50+ messages
2022-07-08 0:01 [PATCH v1 00/10] baseband/acc200 Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 01/10] baseband/acc200: introduce PMD for ACC200 Nicolas Chautru
2022-09-12 1:08 ` [PATCH v2 00/11] baseband/acc200 Nic Chautru
2022-09-12 1:08 ` [PATCH v2 01/11] baseband/acc100: refactory to segregate common code Nic Chautru
2022-09-12 15:19 ` Bruce Richardson
2022-09-12 1:08 ` [PATCH v2 02/11] baseband/acc200: introduce PMD for ACC200 Nic Chautru
2022-09-12 15:41 ` Bruce Richardson
2022-09-12 1:08 ` [PATCH v2 03/11] baseband/acc200: add HW register definitions Nic Chautru
2022-09-12 1:08 ` [PATCH v2 04/11] baseband/acc200: add info get function Nic Chautru
2022-09-12 1:08 ` [PATCH v2 05/11] baseband/acc200: add queue configuration Nic Chautru
2022-09-12 1:08 ` [PATCH v2 06/11] baseband/acc200: add LDPC processing functions Nic Chautru
2022-09-12 1:08 ` [PATCH v2 07/11] baseband/acc200: add LTE " Nic Chautru
2022-09-12 1:08 ` [PATCH v2 08/11] baseband/acc200: add support for FFT operations Nic Chautru
2022-09-12 1:08 ` [PATCH v2 09/11] baseband/acc200: support interrupt Nic Chautru
2022-09-12 1:08 ` [PATCH v2 10/11] baseband/acc200: add device status and vf2pf comms Nic Chautru
2022-09-12 1:08 ` [PATCH v2 11/11] baseband/acc200: add PF configure companion function Nic Chautru
2022-07-08 0:01 ` [PATCH v1 02/10] baseband/acc200: add HW register definitions Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 03/10] baseband/acc200: add info get function Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 04/10] baseband/acc200: add queue configuration Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 05/10] baseband/acc200: add LDPC processing functions Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 06/10] baseband/acc200: add LTE " Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 07/10] baseband/acc200: add support for FFT operations Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 08/10] baseband/acc200: support interrupt Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 09/10] baseband/acc200: add device status and vf2pf comms Nicolas Chautru
2022-07-08 0:01 ` [PATCH v1 10/10] baseband/acc200: add PF configure companion function Nicolas Chautru
2022-07-12 13:48 ` [PATCH v1 00/10] baseband/acc200 Maxime Coquelin
2022-07-14 18:49 ` Vargas, Hernan
2022-07-17 13:08 ` Tom Rix
2022-07-22 18:29 ` Vargas, Hernan
2022-07-22 20:19 ` Tom Rix
2022-08-15 17:52 ` Chautru, Nicolas
2022-08-30 7:44 ` Maxime Coquelin
2022-08-30 19:45 ` Chautru, Nicolas
2022-08-31 16:43 ` Maxime Coquelin
2022-08-31 19:20 ` Thomas Monjalon
2022-08-31 19:26 ` Tom Rix
2022-08-31 22:37 ` Chautru, Nicolas
2022-09-01 0:28 ` Tom Rix
2022-09-01 1:26 ` Chautru, Nicolas
2022-09-01 13:49 ` Tom Rix
2022-09-01 20:34 ` Chautru, Nicolas
2022-09-06 12:51 ` Tom Rix
2022-09-14 10:35 ` Thomas Monjalon
2022-09-14 11:50 ` Maxime Coquelin
2022-09-14 13:19 ` Bruce Richardson
2022-09-14 13:27 ` Maxime Coquelin
2022-09-14 13:44 ` [EXT] " Akhil Goyal
2022-09-14 14:23 ` Thomas Monjalon
2022-09-14 19:57 ` Chautru, Nicolas
2022-09-14 20:08 ` Maxime Coquelin