DPDK patches and discussions
* [PATCH v3 00/12] VRB2 bbdev PMD introduction
@ 2023-09-29 16:35 Nicolas Chautru
  2023-09-29 16:35 ` [PATCH v3 01/12] bbdev: add FFT window width member in driver info Nicolas Chautru
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

v3: updates based on v2 review:
- split into smaller incremental commits
- FFT windowing exposed through a more generic structure
- refactor using wrapper functions to manage device variants
- removed custom dump function
- treat a request for the unsupported SO option as an error
instead of falling back
- cosmetic and doc updates.
Thanks

v2: doc, comments and commit-log updates.

This series includes changes to the VRB BBDEV PMD for 23.11.

This allows the VRB unified driver to support the new VRB2
implementation variant on GNR-D.

This also includes a minor change to dev_info exposing FFT windowing
information, so that the application can tell which window
LUT is configured dynamically on the device.

Nicolas Chautru (12):
  bbdev: add FFT window width member in driver info
  baseband/acc: add FFT window width in the VRB PMD
  baseband/acc: remove the 4G SO capability for VRB1
  baseband/acc: allocate FCW memory separately
  baseband/acc: add support for MLD operation
  baseband/acc: refactor to allow unified driver extension
  baseband/acc: adding VRB2 device variant
  baseband/acc: add FEC capabilities for the VRB2 variant
  baseband/acc: add FFT support to VRB2 variant
  baseband/acc: add MLD support in VRB2 variant
  baseband/acc: add support for VRB2 engine error detection
  baseband/acc: add configure helper for VRB2

 doc/guides/bbdevs/features/vrb2.ini    |   14 +
 doc/guides/bbdevs/index.rst            |    1 +
 doc/guides/bbdevs/vrb1.rst             |    4 -
 doc/guides/bbdevs/vrb2.rst             |  206 +++
 doc/guides/rel_notes/release_23_11.rst |    3 +
 drivers/baseband/acc/acc100_pmd.h      |    2 +
 drivers/baseband/acc/acc_common.h      |  172 ++-
 drivers/baseband/acc/rte_acc100_pmd.c  |   10 +-
 drivers/baseband/acc/rte_vrb_pmd.c     | 1801 ++++++++++++++++++++++--
 drivers/baseband/acc/vrb1_pf_enum.h    |   17 +-
 drivers/baseband/acc/vrb2_pf_enum.h    |  124 ++
 drivers/baseband/acc/vrb2_vf_enum.h    |  121 ++
 drivers/baseband/acc/vrb_cfg.h         |   16 +
 drivers/baseband/acc/vrb_pmd.h         |  173 ++-
 lib/bbdev/rte_bbdev.h                  |    2 +
 lib/bbdev/rte_bbdev_op.h               |    2 +
 16 files changed, 2502 insertions(+), 166 deletions(-)
 create mode 100644 doc/guides/bbdevs/features/vrb2.ini
 create mode 100644 doc/guides/bbdevs/vrb2.rst
 create mode 100644 drivers/baseband/acc/vrb2_pf_enum.h
 create mode 100644 drivers/baseband/acc/vrb2_vf_enum.h

-- 
2.34.1



* [PATCH v3 01/12] bbdev: add FFT window width member in driver info
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-09-29 16:35 ` [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD Nicolas Chautru
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

This exposes the width of each windowing shape configured on
the device. It makes it possible to distinguish between versions of the
flexible pointwise windowing applied to the FFT and exposes
this platform configuration to the application.

The SRS processing chain
(https://doc.dpdk.org/guides/prog_guide/bbdev.html#bbdev-fft-operation)
includes a pointwise multiplication by a time window whose shape width
needs to be exposed, notably for accurate SNR estimation.
Using this mechanism, a user application can retrieve information on
what has been dynamically programmed on any bbdev device
supporting FFT windowing operation.
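
As a rough illustration of how an application might consume the new
member, the sketch below walks the per-window width array. It does not
use the real bbdev API: `driver_info_sketch` is a hypothetical stand-in
for the relevant part of `struct rte_bbdev_driver_info`, `MAX_FFT_WIN`
mirrors `RTE_BBDEV_MAX_FFT_WIN`, and treating a zero width as "window
not configured" is an assumption made only for this example.

```c
#include <stdint.h>

#define MAX_FFT_WIN 16 /* mirrors RTE_BBDEV_MAX_FFT_WIN from this patch */

/* Hypothetical stand-in for the new member of struct rte_bbdev_driver_info. */
struct driver_info_sketch {
	uint16_t fft_window_width[MAX_FFT_WIN];
};

/*
 * Count the windows that report a non-zero width.
 * Assumption: a width of zero means the window is not configured.
 */
static unsigned int
count_configured_windows(const struct driver_info_sketch *info)
{
	unsigned int win, count = 0;

	for (win = 0; win < MAX_FFT_WIN; win++) {
		if (info->fft_window_width[win] != 0)
			count++;
	}
	return count;
}
```

In a real application the structure would presumably be filled by the
driver through the usual device info query rather than by hand.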

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 lib/bbdev/rte_bbdev.h    | 2 ++
 lib/bbdev/rte_bbdev_op.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 2985c9f42b..df691c479f 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -349,6 +349,8 @@ struct rte_bbdev_driver_info {
 	const struct rte_bbdev_op_cap *capabilities;
 	/** Device cpu_flag requirements */
 	const enum rte_cpu_flag_t *cpu_flag_reqs;
+	/** FFT width related 2048 FFT for each window. */
+	uint16_t fft_window_width[RTE_BBDEV_MAX_FFT_WIN];
 };
 
 /** Macro used at end of bbdev PMD list */
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index 693baa8386..9d27226ca6 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -51,6 +51,8 @@ extern "C" {
 /* 12 CS maximum */
 #define RTE_BBDEV_MAX_CS_2 (6)
 #define RTE_BBDEV_MAX_CS   (12)
+/* Up to 16 windows for FFT. */
+#define RTE_BBDEV_MAX_FFT_WIN (16)
 /* MLD-TS up to 4 layers */
 #define RTE_BBDEV_MAX_MLD_LAYERS (4)
 /* 12 SB per RB */
-- 
2.34.1



* [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
  2023-09-29 16:35 ` [PATCH v3 01/12] bbdev: add FFT window width member in driver info Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 11:52   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 03/12] baseband/acc: remove the 4G SO capability for VRB1 Nicolas Chautru
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

This exposes the FFT window width introduced in the previous
commit, based on what is configured dynamically on the
device platform.
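
The VF-side retrieval in this patch is a request followed by a bounded
poll of a doorbell register. A minimal, hardware-free sketch of that
polling pattern is shown below. All names here (`poll_reply`,
`fake_reg`, `NO_STATUS`, `MAX_POLLS`) are hypothetical and only loosely
mirror the driver. Unlike the loop in the diff, which shares a single
time_out counter across all windows, this sketch gives every request its
own poll budget and reports a timeout explicitly; that is a design
choice of the example, not a statement about the intended driver
behaviour.

```c
#include <stdint.h>

#define NO_STATUS 0u   /* hypothetical "no answer yet" register value */
#define MAX_POLLS 100u /* hypothetical poll budget per request */

typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll a register until it differs from NO_STATUS or the budget runs out.
 * Returns 0 and stores the reply in *value on success, -1 on timeout.
 */
static int
poll_reply(read_reg_fn read_reg, void *ctx, uint32_t *value)
{
	uint32_t polls, reg;

	for (polls = 0; polls < MAX_POLLS; polls++) {
		reg = read_reg(ctx);
		if (reg != NO_STATUS) {
			*value = reg;
			return 0;
		}
	}
	return -1;
}

/* Fake register used for illustration: ready after a number of reads. */
struct fake_reg {
	uint32_t polls_until_ready;
	uint32_t value;
};

static uint32_t
fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->polls_until_ready > 0) {
		r->polls_until_ready--;
		return NO_STATUS;
	}
	return r->value;
}
```

A real driver would sleep between reads, as the patch does with
usleep(), instead of spinning.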

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 drivers/baseband/acc/acc_common.h  |  3 +++
 drivers/baseband/acc/rte_vrb_pmd.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/baseband/acc/acc_common.h b/drivers/baseband/acc/acc_common.h
index 5bb00746c3..7d24c644c0 100644
--- a/drivers/baseband/acc/acc_common.h
+++ b/drivers/baseband/acc/acc_common.h
@@ -512,6 +512,8 @@ struct acc_deq_intr_details {
 enum {
 	ACC_VF2PF_STATUS_REQUEST = 1,
 	ACC_VF2PF_USING_VF = 2,
+	ACC_VF2PF_LUT_VER_REQUEST = 3,
+	ACC_VF2PF_FFT_WIN_REQUEST = 4,
 };
 
 
@@ -558,6 +560,7 @@ struct acc_device {
 	queue_offset_fun_t queue_offset;  /* Device specific queue offset */
 	uint16_t num_qgroups;
 	uint16_t num_aqs;
+	uint16_t fft_window_width[RTE_BBDEV_MAX_FFT_WIN]; /* FFT windowing width. */
 };
 
 /* Structure associated with each queue. */
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index 9e5a73c9c7..c5a74bae11 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -298,6 +298,34 @@ vrb_device_status(struct rte_bbdev *dev)
 	return reg;
 }
 
+/* Request device FFT windowing information. */
+static inline void
+vrb_device_fft_win(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
+{
+	struct acc_device *d = dev->data->dev_private;
+	uint32_t reg, time_out = 0, win;
+
+	if (d->pf_device)
+		return;
+
+	/* Check from the device the first time. */
+	if (d->fft_window_width[0] == 0) {
+		for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++) {
+			vrb_vf2pf(d, ACC_VF2PF_FFT_WIN_REQUEST | win);
+			reg = acc_reg_read(d, d->reg_addr->pf2vf_doorbell);
+			while ((time_out < ACC_STATUS_TO) && (reg == RTE_BBDEV_DEV_NOSTATUS)) {
+				usleep(ACC_STATUS_WAIT); /*< Wait for VF->PF->VF Comms */
+				reg = acc_reg_read(d, d->reg_addr->pf2vf_doorbell);
+				time_out++;
+			}
+			d->fft_window_width[win] = reg;
+		}
+	}
+
+	for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++)
+		dev_info->fft_window_width[win] = d->fft_window_width[win];
+}
+
 /* Checks PF Info Ring to find the interrupt cause and handles it accordingly. */
 static inline void
 vrb_check_ir(struct acc_device *acc_dev)
@@ -1100,6 +1128,7 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 	fetch_acc_config(dev);
 	/* Check the status of device. */
 	dev_info->device_status = vrb_device_status(dev);
+	vrb_device_fft_win(dev, dev_info);
 
 	/* Exposed number of queues. */
 	dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;
-- 
2.34.1



* [PATCH v3 03/12] baseband/acc: remove the 4G SO capability for VRB1
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
  2023-09-29 16:35 ` [PATCH v3 01/12] bbdev: add FFT window width member in driver info Nicolas Chautru
  2023-09-29 16:35 ` [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 12:04   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 04/12] baseband/acc: allocate FCW memory separately Nicolas Chautru
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

This removes the capability and support for the LTE Decoder
Soft Output option on the VRB1 PMD.

This follows a vendor decision to defeature this optional
capability in order to avoid a theoretical risk of race conditions
impacting device reliability. The optional APP LLR output does
not affect the actual decoder hard output.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 doc/guides/bbdevs/vrb1.rst         |  4 ----
 drivers/baseband/acc/rte_vrb_pmd.c | 10 ++++++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/doc/guides/bbdevs/vrb1.rst b/doc/guides/bbdevs/vrb1.rst
index 9c48d30964..fdefb20651 100644
--- a/doc/guides/bbdevs/vrb1.rst
+++ b/doc/guides/bbdevs/vrb1.rst
@@ -71,11 +71,7 @@ The Intel vRAN Boost v1.0 PMD supports the following bbdev capabilities:
    - ``RTE_BBDEV_TURBO_EARLY_TERMINATION``: set early termination feature.
    - ``RTE_BBDEV_TURBO_DEC_SCATTER_GATHER``: supports scatter-gather for input/output data.
    - ``RTE_BBDEV_TURBO_HALF_ITERATION_EVEN``: set half iteration granularity.
-   - ``RTE_BBDEV_TURBO_SOFT_OUTPUT``: set the APP LLR soft output.
-   - ``RTE_BBDEV_TURBO_EQUALIZER``: set the turbo equalizer feature.
-   - ``RTE_BBDEV_TURBO_SOFT_OUT_SATURATE``: set the soft output saturation.
    - ``RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH``: set to run an extra odd iteration after CRC match.
-   - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT``: set if negative APP LLR output supported.
    - ``RTE_BBDEV_TURBO_MAP_DEC``: supports flexible parallel MAP engine decoding.
 
 * For the FFT operation:
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index c5a74bae11..f11882f90e 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -1025,15 +1025,11 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 					RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE |
 					RTE_BBDEV_TURBO_CRC_TYPE_24B |
 					RTE_BBDEV_TURBO_DEC_CRC_24B_DROP |
-					RTE_BBDEV_TURBO_EQUALIZER |
-					RTE_BBDEV_TURBO_SOFT_OUT_SATURATE |
 					RTE_BBDEV_TURBO_HALF_ITERATION_EVEN |
 					RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH |
-					RTE_BBDEV_TURBO_SOFT_OUTPUT |
 					RTE_BBDEV_TURBO_EARLY_TERMINATION |
 					RTE_BBDEV_TURBO_DEC_INTERRUPTS |
 					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN |
-					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT |
 					RTE_BBDEV_TURBO_MAP_DEC |
 					RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP |
 					RTE_BBDEV_TURBO_DEC_SCATTER_GATHER,
@@ -1982,6 +1978,12 @@ enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 	struct rte_mbuf *input, *h_output_head, *h_output,
 		*s_output_head, *s_output;
 
+	if ((q->d->device_variant == VRB1_VARIANT) &&
+			(check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT))) {
+		/* SO not supported for VRB1. */
+		return -EPERM;
+	}
+
 	desc = acc_desc(q, total_enqueued_cbs);
 	vrb_fcw_td_fill(op, &desc->req.fcw_td);
 
-- 
2.34.1



* [PATCH v3 04/12] baseband/acc: allocate FCW memory separately
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (2 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 03/12] baseband/acc: remove the 4G SO capability for VRB1 Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 12:51   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 05/12] baseband/acc: add support for MLD operation Nicolas Chautru
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

This allows more flexibility in the FCW size for the
unified driver. No actual functional change.
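
To make the addressing concrete: with a separate FCW ring, each
descriptor slot owns a fixed-size region, and the enqueue path locates
it by masking the ring position and scaling by the slot size. The
helper below is a standalone sketch of that arithmetic only.
`fcw_slot_offset` is a hypothetical name, `FCW_SLOT_SIZE` mirrors
`ACC_MAX_FCW_SIZE`, and the ring depth is assumed to be a power of two
so that the mask acts as a modulo.

```c
#include <stddef.h>
#include <stdint.h>

#define FCW_SLOT_SIZE 128u /* mirrors ACC_MAX_FCW_SIZE from this patch */

/*
 * Byte offset of the FCW slot for the descriptor at (ring_head + enqueued),
 * wrapped with wrap_mask (ring depth minus one, depth a power of two).
 */
static size_t
fcw_slot_offset(uint32_t ring_head, uint32_t enqueued, uint32_t wrap_mask)
{
	return (size_t)((ring_head + enqueued) & wrap_mask) * FCW_SLOT_SIZE;
}
```

The same offset applies to both the virtual base (fcw_ring) and the
IOVA base (fcw_ring_addr_iova), which is what keeps the CPU view and
the device view of a slot in step.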

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 drivers/baseband/acc/acc_common.h  |  4 +++-
 drivers/baseband/acc/rte_vrb_pmd.c | 25 ++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/baseband/acc/acc_common.h b/drivers/baseband/acc/acc_common.h
index 7d24c644c0..2c7425e524 100644
--- a/drivers/baseband/acc/acc_common.h
+++ b/drivers/baseband/acc/acc_common.h
@@ -101,6 +101,7 @@
 #define ACC_NUM_QGRPS_PER_WORD         8
 #define ACC_MAX_NUM_QGRPS              32
 #define ACC_RING_SIZE_GRANULARITY      64
+#define ACC_MAX_FCW_SIZE              128
 
 /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
 #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */
@@ -584,13 +585,14 @@ struct __rte_cache_aligned acc_queue {
 	uint32_t aq_enqueued;  /* Count how many "batches" have been enqueued */
 	uint32_t aq_dequeued;  /* Count how many "batches" have been dequeued */
 	uint32_t irq_enable;  /* Enable ops dequeue interrupts if set to 1 */
-	struct rte_mempool *fcw_mempool;  /* FCW mempool */
 	enum rte_bbdev_op_type op_type;  /* Type of this Queue: TE or TD */
 	/* Internal Buffers for loopback input */
 	uint8_t *lb_in;
 	uint8_t *lb_out;
+	uint8_t *fcw_ring;
 	rte_iova_t lb_in_addr_iova;
 	rte_iova_t lb_out_addr_iova;
+	rte_iova_t fcw_ring_addr_iova;
 	int8_t *derm_buffer; /* interim buffer for de-rm in SDK */
 	struct acc_device *d;
 };
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index f11882f90e..cf0551c0c7 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -890,6 +890,25 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
 		goto free_companion_ring_addr;
 	}
 
+	q->fcw_ring = rte_zmalloc_socket(dev->device->driver->name,
+			ACC_MAX_FCW_SIZE * d->sw_ring_max_depth,
+			RTE_CACHE_LINE_SIZE, conf->socket);
+	if (q->fcw_ring == NULL) {
+		rte_bbdev_log(ERR, "Failed to allocate fcw_ring memory");
+		ret = -ENOMEM;
+		goto free_companion_ring_addr;
+	}
+	q->fcw_ring_addr_iova = rte_malloc_virt2iova(q->fcw_ring);
+
+	/* For FFT we need to store the FCW separately */
+	if (conf->op_type == RTE_BBDEV_OP_FFT) {
+		for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
+			desc = q->ring_addr + desc_idx;
+			desc->req.data_ptrs[0].address = q->fcw_ring_addr_iova +
+					desc_idx * ACC_MAX_FCW_SIZE;
+		}
+	}
+
 	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
 	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
 	q->aq_id = q_idx & 0xF;
@@ -1001,6 +1020,7 @@ vrb_queue_release(struct rte_bbdev *dev, uint16_t q_id)
 	if (q != NULL) {
 		/* Mark the Queue as un-assigned. */
 		d->q_assigned_bit_map[q->qgrp_id] &= (~0ULL - (1 << (uint64_t) q->aq_id));
+		rte_free(q->fcw_ring);
 		rte_free(q->companion_ring_addr);
 		rte_free(q->lb_in);
 		rte_free(q->lb_out);
@@ -3234,7 +3254,10 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
 	output = op->fft.base_output.data;
 	in_offset = op->fft.base_input.offset;
 	out_offset = op->fft.base_output.offset;
-	fcw = &desc->req.fcw_fft;
+
+	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
+			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
+			* ACC_MAX_FCW_SIZE);
 
 	vrb1_fcw_fft_fill(op, fcw);
 	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
-- 
2.34.1



* [PATCH v3 05/12] baseband/acc: add support for MLD operation
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (3 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 04/12] baseband/acc: allocate FCW memory separately Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-09-29 16:35 ` [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension Nicolas Chautru
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

This adds no functionality for the MLD operation itself,
but prepares the unified PMD to support the operation
when it is added in later commits.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/baseband/acc/acc_common.h  |  1 +
 drivers/baseband/acc/rte_vrb_pmd.c | 39 ++++++++++++++++++++++++------
 drivers/baseband/acc/vrb_pmd.h     | 12 +++++++++
 3 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/baseband/acc/acc_common.h b/drivers/baseband/acc/acc_common.h
index 2c7425e524..788abf1a3c 100644
--- a/drivers/baseband/acc/acc_common.h
+++ b/drivers/baseband/acc/acc_common.h
@@ -87,6 +87,7 @@
 #define ACC_FCW_LE_BLEN                32
 #define ACC_FCW_LD_BLEN                36
 #define ACC_FCW_FFT_BLEN               28
+#define ACC_FCW_MLDTS_BLEN             32
 #define ACC_5GUL_SIZE_0                16
 #define ACC_5GUL_SIZE_1                40
 #define ACC_5GUL_OFFSET_0              36
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index cf0551c0c7..a1de012b40 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -37,7 +37,7 @@ vrb1_queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id
 		return ((qgrp_id << 7) + (aq_id << 3) + VRB1_VfQmgrIngressAq);
 }
 
-enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, NUM_ACC};
+enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, MLD, NUM_ACC};
 
 /* Return the accelerator enum for a Queue Group Index. */
 static inline int
@@ -53,6 +53,7 @@ accFromQgid(int qg_idx, const struct rte_acc_conf *acc_conf)
 	NumQGroupsPerFn[DL_4G] = acc_conf->q_dl_4g.num_qgroups;
 	NumQGroupsPerFn[DL_5G] = acc_conf->q_dl_5g.num_qgroups;
 	NumQGroupsPerFn[FFT] = acc_conf->q_fft.num_qgroups;
+	NumQGroupsPerFn[MLD] = acc_conf->q_mld.num_qgroups;
 	for (acc = UL_4G;  acc < NUM_ACC; acc++)
 		for (qgIdx = 0; qgIdx < NumQGroupsPerFn[acc]; qgIdx++)
 			accQg[qgIndex++] = acc;
@@ -83,6 +84,9 @@ qtopFromAcc(struct rte_acc_queue_topology **qtop, int acc_enum, struct rte_acc_c
 	case FFT:
 		p_qtop = &(acc_conf->q_fft);
 		break;
+	case MLD:
+		p_qtop = &(acc_conf->q_mld);
+		break;
 	default:
 		/* NOTREACHED. */
 		rte_bbdev_log(ERR, "Unexpected error evaluating %s using %d", __func__, acc_enum);
@@ -139,6 +143,9 @@ initQTop(struct rte_acc_conf *acc_conf)
 	acc_conf->q_fft.num_aqs_per_groups = 0;
 	acc_conf->q_fft.num_qgroups = 0;
 	acc_conf->q_fft.first_qgroup_index = -1;
+	acc_conf->q_mld.num_aqs_per_groups = 0;
+	acc_conf->q_mld.num_qgroups = 0;
+	acc_conf->q_mld.first_qgroup_index = -1;
 }
 
 static inline void
@@ -250,7 +257,7 @@ fetch_acc_config(struct rte_bbdev *dev)
 	}
 
 	rte_bbdev_log_debug(
-			"%s Config LLR SIGN IN/OUT %s %s QG %u %u %u %u %u AQ %u %u %u %u %u Len %u %u %u %u %u\n",
+			"%s Config LLR SIGN IN/OUT %s %s QG %u %u %u %u %u %u AQ %u %u %u %u %u %u Len %u %u %u %u %u %u\n",
 			(d->pf_device) ? "PF" : "VF",
 			(acc_conf->input_pos_llr_1_bit) ? "POS" : "NEG",
 			(acc_conf->output_pos_llr_1_bit) ? "POS" : "NEG",
@@ -259,16 +266,19 @@ fetch_acc_config(struct rte_bbdev *dev)
 			acc_conf->q_ul_5g.num_qgroups,
 			acc_conf->q_dl_5g.num_qgroups,
 			acc_conf->q_fft.num_qgroups,
+			acc_conf->q_mld.num_qgroups,
 			acc_conf->q_ul_4g.num_aqs_per_groups,
 			acc_conf->q_dl_4g.num_aqs_per_groups,
 			acc_conf->q_ul_5g.num_aqs_per_groups,
 			acc_conf->q_dl_5g.num_aqs_per_groups,
 			acc_conf->q_fft.num_aqs_per_groups,
+			acc_conf->q_mld.num_aqs_per_groups,
 			acc_conf->q_ul_4g.aq_depth_log2,
 			acc_conf->q_dl_4g.aq_depth_log2,
 			acc_conf->q_ul_5g.aq_depth_log2,
 			acc_conf->q_dl_5g.aq_depth_log2,
-			acc_conf->q_fft.aq_depth_log2);
+			acc_conf->q_fft.aq_depth_log2,
+			acc_conf->q_mld.aq_depth_log2);
 }
 
 static inline void
@@ -339,7 +349,7 @@ vrb_check_ir(struct acc_device *acc_dev)
 
 	while (ring_data->valid) {
 		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
-				ring_data->int_nb > ACC_PF_INT_DMA_DL5G_DESC_IRQ)) {
+				ring_data->int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
 			rte_bbdev_log(WARNING, "InfoRing: ITR:%d Info:0x%x",
 					ring_data->int_nb, ring_data->detailed_info);
 			/* Initialize Info Ring entry and move forward. */
@@ -373,6 +383,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
 			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
 			case ACC_PF_INT_DMA_UL5G_DESC_IRQ:
 			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
+			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
 				deq_intr_det.queue_id = get_queue_id_from_ring_info(
 						dev->data, *ring_data);
 				if (deq_intr_det.queue_id == UINT16_MAX) {
@@ -400,6 +411,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
 			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
 			case ACC_VF_INT_DMA_UL5G_DESC_IRQ:
 			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
+			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
 				/* VFs are not aware of their vf_id - it's set to 0.  */
 				ring_data->vf_id = 0;
 				deq_intr_det.queue_id = get_queue_id_from_ring_info(
@@ -748,7 +760,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
 		const struct rte_bbdev_queue_conf *conf)
 {
 	struct acc_device *d = dev->data->dev_private;
-	int op_2_acc[6] = {0, UL_4G, DL_4G, UL_5G, DL_5G, FFT};
+	int op_2_acc[7] = {0, UL_4G, DL_4G, UL_5G, DL_5G, FFT, MLD};
 	int acc = op_2_acc[conf->op_type];
 	struct rte_acc_queue_topology *qtop = NULL;
 	uint16_t group_idx;
@@ -811,7 +823,8 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
 	int fcw_len = (conf->op_type == RTE_BBDEV_OP_LDPC_ENC ?
 			ACC_FCW_LE_BLEN : (conf->op_type == RTE_BBDEV_OP_TURBO_DEC ?
 			ACC_FCW_TD_BLEN : (conf->op_type == RTE_BBDEV_OP_LDPC_DEC ?
-			ACC_FCW_LD_BLEN : ACC_FCW_FFT_BLEN)));
+			ACC_FCW_LD_BLEN : (conf->op_type == RTE_BBDEV_OP_FFT ?
+			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
 
 	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
 		desc = q->ring_addr + desc_idx;
@@ -923,6 +936,8 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
 		q->aq_depth = (1 << d->acc_conf.q_dl_5g.aq_depth_log2);
 	else if (conf->op_type ==  RTE_BBDEV_OP_FFT)
 		q->aq_depth = (1 << d->acc_conf.q_fft.aq_depth_log2);
+	else if (conf->op_type ==  RTE_BBDEV_OP_MLDTS)
+		q->aq_depth = (1 << d->acc_conf.q_mld.aq_depth_log2);
 
 	q->mmio_reg_enqueue = RTE_PTR_ADD(d->mmio_base,
 			d->queue_offset(d->pf_device, q->vf_id, q->qgrp_id, q->aq_id));
@@ -979,6 +994,13 @@ vrb_print_op(struct rte_bbdev_dec_op *op, enum rte_bbdev_op_type op_type,
 			op_dl->ldpc_enc.n_filler, op_dl->ldpc_enc.cb_params.e,
 			op_dl->ldpc_enc.op_flags, op_dl->ldpc_enc.rv_index
 			);
+	} else if (op_type == RTE_BBDEV_OP_MLDTS) {
+		struct rte_bbdev_mldts_op *op_mldts = (struct rte_bbdev_mldts_op *) op;
+		rte_bbdev_log(INFO, "  Op MLD %d RBs %d NL %d Rp %d %d %x\n",
+				index,
+				op_mldts->mldts.num_rbs, op_mldts->mldts.num_layers,
+				op_mldts->mldts.r_rep,
+				op_mldts->mldts.c_rep, op_mldts->mldts.op_flags);
 	}
 }
 
@@ -1158,13 +1180,16 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 			d->acc_conf.q_dl_5g.num_qgroups;
 	dev_info->num_queues[RTE_BBDEV_OP_FFT] = d->acc_conf.q_fft.num_aqs_per_groups *
 			d->acc_conf.q_fft.num_qgroups;
+	dev_info->num_queues[RTE_BBDEV_OP_MLDTS] = d->acc_conf.q_mld.num_aqs_per_groups *
+			d->acc_conf.q_mld.num_qgroups;
 	dev_info->queue_priority[RTE_BBDEV_OP_TURBO_DEC] = d->acc_conf.q_ul_4g.num_qgroups;
 	dev_info->queue_priority[RTE_BBDEV_OP_TURBO_ENC] = d->acc_conf.q_dl_4g.num_qgroups;
 	dev_info->queue_priority[RTE_BBDEV_OP_LDPC_DEC] = d->acc_conf.q_ul_5g.num_qgroups;
 	dev_info->queue_priority[RTE_BBDEV_OP_LDPC_ENC] = d->acc_conf.q_dl_5g.num_qgroups;
 	dev_info->queue_priority[RTE_BBDEV_OP_FFT] = d->acc_conf.q_fft.num_qgroups;
+	dev_info->queue_priority[RTE_BBDEV_OP_MLDTS] = d->acc_conf.q_mld.num_qgroups;
 	dev_info->max_num_queues = 0;
-	for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_FFT; i++)
+	for (i = RTE_BBDEV_OP_NONE; i <= RTE_BBDEV_OP_MLDTS; i++)
 		dev_info->max_num_queues += dev_info->num_queues[i];
 	dev_info->queue_size_lim = ACC_MAX_QUEUE_DEPTH;
 	dev_info->hardware_accelerated = true;
diff --git a/drivers/baseband/acc/vrb_pmd.h b/drivers/baseband/acc/vrb_pmd.h
index 01028273e7..1cabc0b7f4 100644
--- a/drivers/baseband/acc/vrb_pmd.h
+++ b/drivers/baseband/acc/vrb_pmd.h
@@ -101,6 +101,8 @@ struct acc_registry_addr {
 	unsigned int dma_ring_ul4g_lo;
 	unsigned int dma_ring_fft_hi;
 	unsigned int dma_ring_fft_lo;
+	unsigned int dma_ring_mld_hi;
+	unsigned int dma_ring_mld_lo;
 	unsigned int ring_size;
 	unsigned int info_ring_hi;
 	unsigned int info_ring_lo;
@@ -116,6 +118,8 @@ struct acc_registry_addr {
 	unsigned int tail_ptrs_ul4g_lo;
 	unsigned int tail_ptrs_fft_hi;
 	unsigned int tail_ptrs_fft_lo;
+	unsigned int tail_ptrs_mld_hi;
+	unsigned int tail_ptrs_mld_lo;
 	unsigned int depth_log0_offset;
 	unsigned int depth_log1_offset;
 	unsigned int qman_group_func;
@@ -140,6 +144,8 @@ static const struct acc_registry_addr vrb1_pf_reg_addr = {
 	.dma_ring_ul4g_lo = VRB1_PfDmaFec4GulDescBaseLoRegVf,
 	.dma_ring_fft_hi = VRB1_PfDmaFftDescBaseHiRegVf,
 	.dma_ring_fft_lo = VRB1_PfDmaFftDescBaseLoRegVf,
+	.dma_ring_mld_hi = 0,
+	.dma_ring_mld_lo = 0,
 	.ring_size =      VRB1_PfQmgrRingSizeVf,
 	.info_ring_hi = VRB1_PfHiInfoRingBaseHiRegPf,
 	.info_ring_lo = VRB1_PfHiInfoRingBaseLoRegPf,
@@ -155,6 +161,8 @@ static const struct acc_registry_addr vrb1_pf_reg_addr = {
 	.tail_ptrs_ul4g_lo = VRB1_PfDmaFec4GulRespPtrLoRegVf,
 	.tail_ptrs_fft_hi = VRB1_PfDmaFftRespPtrHiRegVf,
 	.tail_ptrs_fft_lo = VRB1_PfDmaFftRespPtrLoRegVf,
+	.tail_ptrs_mld_hi = 0,
+	.tail_ptrs_mld_lo = 0,
 	.depth_log0_offset = VRB1_PfQmgrGrpDepthLog20Vf,
 	.depth_log1_offset = VRB1_PfQmgrGrpDepthLog21Vf,
 	.qman_group_func = VRB1_PfQmgrGrpFunction0,
@@ -179,6 +187,8 @@ static const struct acc_registry_addr vrb1_vf_reg_addr = {
 	.dma_ring_ul4g_lo = VRB1_VfDmaFec4GulDescBaseLoRegVf,
 	.dma_ring_fft_hi = VRB1_VfDmaFftDescBaseHiRegVf,
 	.dma_ring_fft_lo = VRB1_VfDmaFftDescBaseLoRegVf,
+	.dma_ring_mld_hi = 0,
+	.dma_ring_mld_lo = 0,
 	.ring_size = VRB1_VfQmgrRingSizeVf,
 	.info_ring_hi = VRB1_VfHiInfoRingBaseHiVf,
 	.info_ring_lo = VRB1_VfHiInfoRingBaseLoVf,
@@ -194,6 +204,8 @@ static const struct acc_registry_addr vrb1_vf_reg_addr = {
 	.tail_ptrs_ul4g_lo = VRB1_VfDmaFec4GulRespPtrLoRegVf,
 	.tail_ptrs_fft_hi = VRB1_VfDmaFftRespPtrHiRegVf,
 	.tail_ptrs_fft_lo = VRB1_VfDmaFftRespPtrLoRegVf,
+	.tail_ptrs_mld_hi = 0,
+	.tail_ptrs_mld_lo = 0,
 	.depth_log0_offset = VRB1_VfQmgrGrpDepthLog20Vf,
 	.depth_log1_offset = VRB1_VfQmgrGrpDepthLog21Vf,
 	.qman_group_func = VRB1_VfQmgrGrpFunction0Vf,
-- 
2.34.1



* [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (4 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 05/12] baseband/acc: add support for MLD operation Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 13:14   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 07/12] baseband/acc: adding VRB2 device variant Nicolas Chautru
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

Adding a few functions and common code prior to
extending the VRB driver.
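
The variant-aware wrappers added to acc_common.h by this patch can be
exercised in isolation. The sketch below copies the shift and mask
values from the diff under local, hypothetical names (`SKETCH_*`,
`sketch_queue_index`, `sketch_qg_from_q`) and shows that a queue group
survives a round trip through the queue index for both variants. It is
only an illustration of the encoding, not driver code.

```c
#include <stdint.h>

#define SKETCH_VRB1_VARIANT 2 /* mirrors VRB1_VARIANT */
#define SKETCH_VRB2_VARIANT 3 /* mirrors VRB2_VARIANT */
#define SKETCH_VRB1_GRP_ID_SHIFT 10
#define SKETCH_VRB2_GRP_ID_SHIFT 12

/* Build a queue index from group and AQ index, as queue_index() does. */
static int
sketch_queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t variant)
{
	if (variant == SKETCH_VRB2_VARIANT)
		return (group_idx << SKETCH_VRB2_GRP_ID_SHIFT) + aq_idx;
	return (group_idx << SKETCH_VRB1_GRP_ID_SHIFT) + aq_idx;
}

/* Recover the queue group from a queue index, as qg_from_q() does. */
static int
sketch_qg_from_q(uint32_t q_idx, uint16_t variant)
{
	if (variant == SKETCH_VRB2_VARIANT)
		return (q_idx >> SKETCH_VRB2_GRP_ID_SHIFT) & 0x1F;
	return (q_idx >> SKETCH_VRB1_GRP_ID_SHIFT) & 0xF;
}
```

Keeping the variant check inside small wrappers like these lets the
callers in rte_vrb_pmd.c stay identical for VRB1 and VRB2.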

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 drivers/baseband/acc/acc_common.h     | 164 +++++++++++++++++++++++---
 drivers/baseband/acc/rte_acc100_pmd.c |   4 +-
 drivers/baseband/acc/rte_vrb_pmd.c    |  62 +++++-----
 3 files changed, 184 insertions(+), 46 deletions(-)

diff --git a/drivers/baseband/acc/acc_common.h b/drivers/baseband/acc/acc_common.h
index 788abf1a3c..89893eae43 100644
--- a/drivers/baseband/acc/acc_common.h
+++ b/drivers/baseband/acc/acc_common.h
@@ -18,6 +18,7 @@
 #define ACC_DMA_BLKID_OUT_HARQ      3
 #define ACC_DMA_BLKID_IN_HARQ       3
 #define ACC_DMA_BLKID_IN_MLD_R      3
+#define ACC_DMA_BLKID_DEWIN_IN      3
 
 /* Values used in filling in decode FCWs */
 #define ACC_FCW_TD_VER              1
@@ -103,6 +104,9 @@
 #define ACC_MAX_NUM_QGRPS              32
 #define ACC_RING_SIZE_GRANULARITY      64
 #define ACC_MAX_FCW_SIZE              128
+#define ACC_IQ_SIZE                    4
+
+#define ACC_FCW_FFT_BLEN_3             28
 
 /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
 #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */
@@ -132,6 +136,17 @@
 #define ACC_LIM_21 14 /* 0.21 */
 #define ACC_LIM_31 20 /* 0.31 */
 #define ACC_MAX_E (128 * 1024 - 2)
+#define ACC_MAX_CS 12
+
+#define ACC100_VARIANT          0
+#define VRB1_VARIANT		2
+#define VRB2_VARIANT		3
+
+/* Queue Index Hierarchy */
+#define VRB1_GRP_ID_SHIFT    10
+#define VRB1_VF_ID_SHIFT     4
+#define VRB2_GRP_ID_SHIFT    12
+#define VRB2_VF_ID_SHIFT     6
 
 /* Helper macro for logging */
 #define rte_acc_log(level, fmt, ...) \
@@ -332,6 +347,37 @@ struct __rte_packed acc_fcw_fft {
 		res:19;
 };
 
+/* FFT Frame Control Word. */
+struct __rte_packed acc_fcw_fft_3 {
+	uint32_t in_frame_size:16,
+		leading_pad_size:16;
+	uint32_t out_frame_size:16,
+		leading_depad_size:16;
+	uint32_t cs_window_sel;
+	uint32_t cs_window_sel2:16,
+		cs_enable_bmap:16;
+	uint32_t num_antennas:8,
+		idft_size:8,
+		dft_size:8,
+		cs_offset:8;
+	uint32_t idft_shift:8,
+		dft_shift:8,
+		cs_multiplier:16;
+	uint32_t bypass:2,
+		fp16_in:1,
+		fp16_out:1,
+		exp_adj:4,
+		power_shift:4,
+		power_en:1,
+		enable_dewin:1,
+		freq_resample_mode:2,
+		depad_output_size:16;
+	uint16_t cs_theta_0[ACC_MAX_CS];
+	uint32_t cs_theta_d[ACC_MAX_CS];
+	int8_t cs_time_offset[ACC_MAX_CS];
+};
+
+
 /* MLD-TS Frame Control Word */
 struct __rte_packed acc_fcw_mldts {
 	uint32_t fcw_version:4,
@@ -473,14 +519,14 @@ union acc_info_ring_data {
 		uint16_t valid: 1;
 	};
 	struct {
-		uint32_t aq_id_3: 6;
-		uint32_t qg_id_3: 5;
-		uint32_t vf_id_3: 6;
-		uint32_t int_nb_3: 6;
-		uint32_t msi_0_3: 1;
-		uint32_t vf2pf_3: 6;
-		uint32_t loop_3: 1;
-		uint32_t valid_3: 1;
+		uint32_t aq_id_vrb2: 6;
+		uint32_t qg_id_vrb2: 5;
+		uint32_t vf_id_vrb2: 6;
+		uint32_t int_nb_vrb2: 6;
+		uint32_t msi_0_vrb2: 1;
+		uint32_t vf2pf_vrb2: 6;
+		uint32_t loop_vrb2: 1;
+		uint32_t valid_vrb2: 1;
 	};
 } __rte_packed;
 
@@ -761,22 +807,105 @@ alloc_sw_rings_min_mem(struct rte_bbdev *dev, struct acc_device *d,
 	free_base_addresses(base_addrs, i);
 }
 
+/* Wrapper to provide VF index from ring data. */
+static inline uint16_t
+vf_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return ring_data.vf_id_vrb2;
+	else
+		return ring_data.vf_id;
+}
+
+/* Wrapper to provide QG index from ring data. */
+static inline uint16_t
+qg_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return ring_data.qg_id_vrb2;
+	else
+		return ring_data.qg_id;
+}
+
+/* Wrapper to provide AQ index from ring data. */
+static inline uint16_t
+aq_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return ring_data.aq_id_vrb2;
+	else
+		return ring_data.aq_id;
+}
+
+/* Wrapper to provide int index from ring data. */
+static inline uint16_t
+int_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return ring_data.int_nb_vrb2;
+	else
+		return ring_data.int_nb;
+}
+
+/* Wrapper to provide queue index from group and aq index. */
+static inline int
+queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
+	else
+		return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
+}
+
+/* Wrapper to provide queue group from queue index. */
+static inline int
+qg_from_q(uint32_t q_idx, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
+	else
+		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
+}
+
+/* Wrapper to provide vf from queue index. */
+static inline int32_t
+vf_from_q(uint32_t q_idx, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return (q_idx >> VRB2_VF_ID_SHIFT)  & 0x3F;
+	else
+		return (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
+}
+
+/* Wrapper to provide aq index from queue index. */
+static inline int32_t
+aq_from_q(uint32_t q_idx, uint16_t device_variant) {
+	if (device_variant == VRB2_VARIANT)
+		return q_idx & 0x3F;
+	else
+		return q_idx & 0xF;
+}
+
+/* Wrapper to set VF index in ring data. */
+static inline int32_t
+set_vf_in_ring(volatile union acc_info_ring_data *ring_data,
+		uint16_t device_variant, uint16_t value) {
+	if (device_variant == VRB2_VARIANT)
+		return ring_data->vf_id_vrb2 = value;
+	else
+		return ring_data->vf_id = value;
+}
+
 /*
  * Find queue_id of a device queue based on details from the Info Ring.
  * If a queue isn't found UINT16_MAX is returned.
  */
 static inline uint16_t
 get_queue_id_from_ring_info(struct rte_bbdev_data *data,
-		const union acc_info_ring_data ring_data)
+		const union acc_info_ring_data ring_data, uint16_t device_variant)
 {
 	uint16_t queue_id;
+	struct acc_queue *acc_q;
 
 	for (queue_id = 0; queue_id < data->num_queues; ++queue_id) {
-		struct acc_queue *acc_q =
-				data->queues[queue_id].queue_private;
-		if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id &&
-				acc_q->qgrp_id == ring_data.qg_id &&
-				acc_q->vf_id == ring_data.vf_id)
+		acc_q = data->queues[queue_id].queue_private;
+
+		if (acc_q != NULL && acc_q->aq_id == aq_from_ring(ring_data, device_variant) &&
+				acc_q->qgrp_id == qg_from_ring(ring_data, device_variant) &&
+				acc_q->vf_id == vf_from_ring(ring_data, device_variant))
 			return queue_id;
 	}
 
@@ -1438,4 +1567,11 @@ get_num_cbs_in_tb_ldpc_enc(struct rte_bbdev_op_ldpc_enc *ldpc_enc)
 	return cbs_in_tb;
 }
 
+static inline void
+acc_reg_fast_write(struct acc_device *d, uint32_t offset, uint32_t value)
+{
+	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
+	mmio_write(reg_addr, value);
+}
+
 #endif /* _ACC_COMMON_H_ */
diff --git a/drivers/baseband/acc/rte_acc100_pmd.c b/drivers/baseband/acc/rte_acc100_pmd.c
index 5362d39c30..7f8d05b5a9 100644
--- a/drivers/baseband/acc/rte_acc100_pmd.c
+++ b/drivers/baseband/acc/rte_acc100_pmd.c
@@ -294,7 +294,7 @@ acc100_pf_interrupt_handler(struct rte_bbdev *dev)
 		case ACC100_PF_INT_DMA_UL5G_DESC_IRQ:
 		case ACC100_PF_INT_DMA_DL5G_DESC_IRQ:
 			deq_intr_det.queue_id = get_queue_id_from_ring_info(
-					dev->data, *ring_data);
+					dev->data, *ring_data, acc100_dev->device_variant);
 			if (deq_intr_det.queue_id == UINT16_MAX) {
 				rte_bbdev_log(ERR,
 						"Couldn't find queue: aq_id: %u, qg_id: %u, vf_id: %u",
@@ -348,7 +348,7 @@ acc100_vf_interrupt_handler(struct rte_bbdev *dev)
 			 */
 			ring_data->vf_id = 0;
 			deq_intr_det.queue_id = get_queue_id_from_ring_info(
-					dev->data, *ring_data);
+					dev->data, *ring_data, acc100_dev->device_variant);
 			if (deq_intr_det.queue_id == UINT16_MAX) {
 				rte_bbdev_log(ERR,
 						"Couldn't find queue: aq_id: %u, qg_id: %u",
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index a1de012b40..c89c26c59a 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -341,17 +341,18 @@ static inline void
 vrb_check_ir(struct acc_device *acc_dev)
 {
 	volatile union acc_info_ring_data *ring_data;
-	uint16_t info_ring_head = acc_dev->info_ring_head;
+	uint16_t info_ring_head = acc_dev->info_ring_head, int_nb;
 	if (unlikely(acc_dev->info_ring == NULL))
 		return;
 
 	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head & ACC_INFO_RING_MASK);
 
 	while (ring_data->valid) {
-		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
-				ring_data->int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
+		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
+		if ((int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
+				int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
 			rte_bbdev_log(WARNING, "InfoRing: ITR:%d Info:0x%x",
-					ring_data->int_nb, ring_data->detailed_info);
+					int_nb, ring_data->detailed_info);
 			/* Initialize Info Ring entry and move forward. */
 			ring_data->val = 0;
 		}
@@ -368,16 +369,21 @@ vrb_dev_interrupt_handler(void *cb_arg)
 	struct acc_device *acc_dev = dev->data->dev_private;
 	volatile union acc_info_ring_data *ring_data;
 	struct acc_deq_intr_details deq_intr_det;
+	uint16_t vf_id, aq_id, qg_id, int_nb;
 
 	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head & ACC_INFO_RING_MASK);
 
 	while (ring_data->valid) {
+		vf_id = vf_from_ring(*ring_data, acc_dev->device_variant);
+		aq_id = aq_from_ring(*ring_data, acc_dev->device_variant);
+		qg_id = qg_from_ring(*ring_data, acc_dev->device_variant);
+		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
 		if (acc_dev->pf_device) {
 			rte_bbdev_log_debug(
-					"VRB1 PF Interrupt received, Info Ring data: 0x%x -> %d",
-					ring_data->val, ring_data->int_nb);
+					"PF Interrupt received, Info Ring data: 0x%x -> %d",
+					ring_data->val, int_nb);
 
-			switch (ring_data->int_nb) {
+			switch (int_nb) {
 			case ACC_PF_INT_DMA_DL_DESC_IRQ:
 			case ACC_PF_INT_DMA_UL_DESC_IRQ:
 			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
@@ -385,13 +391,11 @@ vrb_dev_interrupt_handler(void *cb_arg)
 			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
 			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
 				deq_intr_det.queue_id = get_queue_id_from_ring_info(
-						dev->data, *ring_data);
+						dev->data, *ring_data, acc_dev->device_variant);
 				if (deq_intr_det.queue_id == UINT16_MAX) {
 					rte_bbdev_log(ERR,
 							"Couldn't find queue: aq_id: %u, qg_id: %u, vf_id: %u",
-							ring_data->aq_id,
-							ring_data->qg_id,
-							ring_data->vf_id);
+							aq_id, qg_id, vf_id);
 					return;
 				}
 				rte_bbdev_pmd_callback_process(dev,
@@ -403,9 +407,9 @@ vrb_dev_interrupt_handler(void *cb_arg)
 			}
 		} else {
 			rte_bbdev_log_debug(
-					"VRB1 VF Interrupt received, Info Ring data: 0x%x\n",
+					"VRB VF Interrupt received, Info Ring data: 0x%x\n",
 					ring_data->val);
-			switch (ring_data->int_nb) {
+			switch (int_nb) {
 			case ACC_VF_INT_DMA_DL_DESC_IRQ:
 			case ACC_VF_INT_DMA_UL_DESC_IRQ:
 			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
@@ -413,14 +417,13 @@ vrb_dev_interrupt_handler(void *cb_arg)
 			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
 			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
 				/* VFs are not aware of their vf_id - it's set to 0.  */
-				ring_data->vf_id = 0;
+				set_vf_in_ring(ring_data, acc_dev->device_variant, 0);
 				deq_intr_det.queue_id = get_queue_id_from_ring_info(
-						dev->data, *ring_data);
+						dev->data, *ring_data, acc_dev->device_variant);
 				if (deq_intr_det.queue_id == UINT16_MAX) {
 					rte_bbdev_log(ERR,
 							"Couldn't find queue: aq_id: %u, qg_id: %u",
-							ring_data->aq_id,
-							ring_data->qg_id);
+							aq_id, qg_id);
 					return;
 				}
 				rte_bbdev_pmd_callback_process(dev,
@@ -435,8 +438,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
 		/* Initialize Info Ring entry and move forward. */
 		ring_data->val = 0;
 		++acc_dev->info_ring_head;
-		ring_data = acc_dev->info_ring +
-				(acc_dev->info_ring_head & ACC_INFO_RING_MASK);
+		ring_data = acc_dev->info_ring + (acc_dev->info_ring_head & ACC_INFO_RING_MASK);
 	}
 }
 
@@ -556,8 +558,7 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
 
 	/* Configure tail pointer for use when SDONE enabled. */
 	if (d->tail_ptrs == NULL)
-		d->tail_ptrs = rte_zmalloc_socket(
-				dev->device->driver->name,
+		d->tail_ptrs = rte_zmalloc_socket(dev->device->driver->name,
 				VRB_MAX_QGRPS * VRB_MAX_AQS * sizeof(uint32_t),
 				RTE_CACHE_LINE_SIZE, socket_id);
 	if (d->tail_ptrs == NULL) {
@@ -783,7 +784,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
 			/* Mark the Queue as assigned. */
 			d->q_assigned_bit_map[group_idx] |= (1ULL << aq_idx);
 			/* Report the AQ Index. */
-			return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
+			return queue_index(group_idx, aq_idx, d->device_variant);
 		}
 	}
 	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u",
@@ -922,9 +923,10 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
 		}
 	}
 
-	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
-	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
-	q->aq_id = q_idx & 0xF;
+	q->qgrp_id = qg_from_q(q_idx, d->device_variant);
+	q->vf_id = vf_from_q(q_idx, d->device_variant);
+	q->aq_id = aq_from_q(q_idx, d->device_variant);
+
 	q->aq_depth = 0;
 	if (conf->op_type ==  RTE_BBDEV_OP_TURBO_DEC)
 		q->aq_depth = (1 << d->acc_conf.q_ul_4g.aq_depth_log2);
@@ -1311,7 +1313,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc_fcw_td *fcw)
 		fcw->bypass_teq = 0;
 	}
 
-	fcw->code_block_mode = 1; /* FIXME */
+	fcw->code_block_mode = 1;
 	fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags,
 			RTE_BBDEV_TURBO_CRC_TYPE_24B);
 
@@ -1471,8 +1473,8 @@ vrb_dma_desc_td_fill(struct rte_bbdev_dec_op *op,
 	if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) {
 		k = op->turbo_dec.tb_params.k_pos;
 		e = (r < op->turbo_dec.tb_params.cab)
-			? op->turbo_dec.tb_params.ea
-			: op->turbo_dec.tb_params.eb;
+				? op->turbo_dec.tb_params.ea
+				: op->turbo_dec.tb_params.eb;
 	} else {
 		k = op->turbo_dec.cb_params.k;
 		e = op->turbo_dec.cb_params.e;
@@ -1726,7 +1728,7 @@ vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
 	desc->op_addr = op;
 }
 
-/* Enqueue one encode operations for device in CB mode */
+/* Enqueue one encode operation for device in CB mode. */
 static inline int
 enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op,
 		uint16_t total_enqueued_cbs)
@@ -2263,7 +2265,7 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 	return current_enqueued_cbs;
 }
 
-/* Enqueue one decode operations for device in TB mode */
+/* Enqueue one decode operation for device in TB mode. */
 static inline int
 enqueue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 		uint16_t total_enqueued_cbs, uint8_t cbs_in_tb)
-- 
2.34.1
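
Note on the queue index helpers added to acc_common.h above: the
variant-specific shifts and masks are easiest to sanity-check with a
round trip. The sketch below is a standalone restatement of
queue_index(), qg_from_q(), vf_from_q() and aq_from_q() using the
constants from this patch; it is an illustration under those
assumptions, not a replacement for the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch (acc_common.h). */
#define VRB1_VARIANT      2
#define VRB2_VARIANT      3
#define VRB1_GRP_ID_SHIFT 10
#define VRB1_VF_ID_SHIFT  4
#define VRB2_GRP_ID_SHIFT 12
#define VRB2_VF_ID_SHIFT  6

/* Pack a queue group index and an AQ index into a queue index. */
static inline int
queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
	return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
}

/* Extract the queue group: 5 bits on VRB2, 4 bits otherwise. */
static inline int
qg_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
	return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
}

/* Extract the VF index: 6 bits on both variants, different shift. */
static inline int
vf_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return (q_idx >> VRB2_VF_ID_SHIFT) & 0x3F;
	return (q_idx >> VRB1_VF_ID_SHIFT) & 0x3F;
}

/* Extract the AQ index: 6 bits on VRB2, 4 bits otherwise. */
static inline int
aq_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return q_idx & 0x3F;
	return q_idx & 0xF;
}
```

Packing (group 17, AQ 42) for VRB2 and unpacking it again should give
back 17 and 42 with a VF index of 0, and likewise (group 9, AQ 13) for
VRB1. An AQ index wider than the per-variant mask would not survive
the round trip, so callers are expected to stay within those widths.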



* [PATCH v3 07/12] baseband/acc: adding VRB2 device variant
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (5 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 13:41   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant Nicolas Chautru
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

No functionality is exposed yet; this commit only adds device
enumeration and configuration.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 doc/guides/bbdevs/features/vrb2.ini    |  14 ++
 doc/guides/bbdevs/index.rst            |   1 +
 doc/guides/bbdevs/vrb2.rst             | 206 +++++++++++++++++++++++++
 doc/guides/rel_notes/release_23_11.rst |   3 +
 drivers/baseband/acc/rte_vrb_pmd.c     | 156 +++++++++++++++----
 drivers/baseband/acc/vrb2_pf_enum.h    | 124 +++++++++++++++
 drivers/baseband/acc/vrb2_vf_enum.h    | 121 +++++++++++++++
 drivers/baseband/acc/vrb_pmd.h         | 161 ++++++++++++++++++-
 8 files changed, 751 insertions(+), 35 deletions(-)
 create mode 100644 doc/guides/bbdevs/features/vrb2.ini
 create mode 100644 doc/guides/bbdevs/vrb2.rst
 create mode 100644 drivers/baseband/acc/vrb2_pf_enum.h
 create mode 100644 drivers/baseband/acc/vrb2_vf_enum.h
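
Note on the VRB2 branch of fetch_acc_config() below: both the function
mapping and the AQ depth are read as one 4-bit field per queue group,
spread over four consecutive 32-bit registers, and the patch selects
the register with an if/else chain. The same lookup can be sketched
with an array index. The helper name qgrp_field() is hypothetical, and
the sketch assumes ACC_NUM_QGRPS_PER_WORD is 8 (eight 4-bit fields per
word); neither is taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: eight 4-bit fields per 32-bit register word. */
#define QGRPS_PER_WORD 8

/*
 * Hypothetical helper: return the 4-bit field of queue group 'qg'
 * from an array of register words, masked with 'mask' (0x7 for the
 * function index, 0xF for the AQ depth in the patch).
 * An out-of-range group falls back to the last word, mirroring the
 * final 'else' branch of the if/else chain in the patch.
 */
static inline uint32_t
qgrp_field(const uint32_t *regs, uint8_t num_regs, uint8_t qg, uint32_t mask)
{
	uint8_t word = qg / QGRPS_PER_WORD;

	if (word >= num_regs)
		word = num_regs - 1;
	return (regs[word] >> ((qg % QGRPS_PER_WORD) * 4)) & mask;
}
```

With the four register values read into an array, each chain in the
VRB2 branch would reduce to a single call such as
qgrp_field(regs, 4, qg, 0x7). Whether that is preferable to the
explicit chain is a matter of taste for the maintainers.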

diff --git a/doc/guides/bbdevs/features/vrb2.ini b/doc/guides/bbdevs/features/vrb2.ini
new file mode 100644
index 0000000000..23ca6990b7
--- /dev/null
+++ b/doc/guides/bbdevs/features/vrb2.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'Intel vRAN Boost v2' baseband driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Turbo Decoder (4G)     = Y
+Turbo Encoder (4G)     = Y
+LDPC Decoder (5G)      = Y
+LDPC Encoder (5G)      = Y
+LLR/HARQ Compression   = Y
+FFT/SRS                = Y
+External DDR Access    = N
+HW Accelerated         = Y
diff --git a/doc/guides/bbdevs/index.rst b/doc/guides/bbdevs/index.rst
index 77d4c54664..269157d77f 100644
--- a/doc/guides/bbdevs/index.rst
+++ b/doc/guides/bbdevs/index.rst
@@ -15,4 +15,5 @@ Baseband Device Drivers
     fpga_5gnr_fec
     acc100
     vrb1
+    vrb2
     la12xx
diff --git a/doc/guides/bbdevs/vrb2.rst b/doc/guides/bbdevs/vrb2.rst
new file mode 100644
index 0000000000..2a30002e05
--- /dev/null
+++ b/doc/guides/bbdevs/vrb2.rst
@@ -0,0 +1,206 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2023 Intel Corporation
+
+.. include:: <isonum.txt>
+
+Intel\ |reg| vRAN Boost v2 Poll Mode Driver (PMD)
+=================================================
+
+The Intel\ |reg| vRAN Boost integrated accelerator enables
+cost-effective 4G and 5G next-generation virtualized Radio Access Network (vRAN)
+solutions.
+The Intel vRAN Boost v2.0 (VRB2 in the code) is specifically integrated on the
+Intel\ |reg| Xeon\ |reg| Granite Rapids-D processor (GNR-D).
+
+Features
+--------
+
+Intel vRAN Boost v2.0 includes a 5G Low Density Parity Check (LDPC) encoder/decoder,
+rate match/dematch, Hybrid Automatic Repeat Request (HARQ) with access to DDR
+memory for buffer management, a 4G Turbo encoder/decoder,
+a Fast Fourier Transform (FFT) block providing DFT/iDFT processing offload
+for the 5G Sounding Reference Signal (SRS), an MLD-TS accelerator, a Queue Manager (QMGR),
+and a DMA subsystem.
+There is no dedicated on-card memory for HARQ; the coherent memory on the CPU side is used instead.
+
+These hardware blocks provide the following features exposed by the PMD:
+
+- LDPC Encode in the Downlink (5GNR)
+- LDPC Decode in the Uplink (5GNR)
+- Turbo Encode in the Downlink (4G)
+- Turbo Decode in the Uplink (4G)
+- FFT processing
+- MLD-TS processing
+- Single Root I/O Virtualization (SR-IOV) with 16 Virtual Functions (VFs) per Physical Function (PF)
+- Maximum of 2048 queues per VF
+- Message Signaled Interrupts (MSIs)
+
+The Intel vRAN Boost v2.0 PMD supports the following bbdev capabilities:
+
+* For the LDPC encode operation:
+   - ``RTE_BBDEV_LDPC_CRC_24B_ATTACH``: set to attach CRC24B to CB(s).
+   - ``RTE_BBDEV_LDPC_RATE_MATCH``: if set then do not do Rate Match bypass.
+   - ``RTE_BBDEV_LDPC_INTERLEAVER_BYPASS``: if set then bypass interleaver.
+   - ``RTE_BBDEV_LDPC_ENC_SCATTER_GATHER``: supports scatter-gather for input/output data.
+   - ``RTE_BBDEV_LDPC_ENC_CONCATENATION``: concatenate code blocks with bit granularity.
+
+* For the LDPC decode operation:
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK``: check CRC24B from CB(s).
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP``: drops CRC24B bits appended while decoding.
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK``: check CRC24A from CB(s).
+   - ``RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK``: check CRC16 from CB(s).
+   - ``RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE``: provides an input for HARQ combining.
+   - ``RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE``: provides an output for HARQ combining.
+   - ``RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE``: disable early termination.
+   - ``RTE_BBDEV_LDPC_DEC_SCATTER_GATHER``: supports scatter-gather for input/output data.
+   - ``RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION``: supports compression of the HARQ input/output.
+   - ``RTE_BBDEV_LDPC_LLR_COMPRESSION``: supports LLR input compression.
+   - ``RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION``: supports compression of the HARQ input/output.
+   - ``RTE_BBDEV_LDPC_SOFT_OUT_ENABLE``: set the APP LLR soft output.
+   - ``RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS``: set the APP LLR soft output after rate-matching.
+   - ``RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS``: disables the de-interleaver.
+
+* For the turbo encode operation:
+   - ``RTE_BBDEV_TURBO_CRC_24B_ATTACH``: set to attach CRC24B to CB(s).
+   - ``RTE_BBDEV_TURBO_RATE_MATCH``: if set then do not do Rate Match bypass.
+   - ``RTE_BBDEV_TURBO_ENC_INTERRUPTS``: set for encoder dequeue interrupts.
+   - ``RTE_BBDEV_TURBO_RV_INDEX_BYPASS``: set to bypass RV index.
+   - ``RTE_BBDEV_TURBO_ENC_SCATTER_GATHER``: supports scatter-gather for input/output data.
+
+* For the turbo decode operation:
+   - ``RTE_BBDEV_TURBO_CRC_TYPE_24B``: check CRC24B from CB(s).
+   - ``RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE``: perform subblock de-interleave.
+   - ``RTE_BBDEV_TURBO_DEC_INTERRUPTS``: set for decoder dequeue interrupts.
+   - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN``: set if negative LLR input is supported.
+   - ``RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP``: keep CRC24B bits appended while decoding.
+   - ``RTE_BBDEV_TURBO_DEC_CRC_24B_DROP``: option to drop the code block CRC after decoding.
+   - ``RTE_BBDEV_TURBO_EARLY_TERMINATION``: set early termination feature.
+   - ``RTE_BBDEV_TURBO_DEC_SCATTER_GATHER``: supports scatter-gather for input/output data.
+   - ``RTE_BBDEV_TURBO_HALF_ITERATION_EVEN``: set half iteration granularity.
+   - ``RTE_BBDEV_TURBO_SOFT_OUTPUT``: set the APP LLR soft output.
+   - ``RTE_BBDEV_TURBO_EQUALIZER``: set the turbo equalizer feature.
+   - ``RTE_BBDEV_TURBO_SOFT_OUT_SATURATE``: set the soft output saturation.
+   - ``RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH``: set to run an extra odd iteration after CRC match.
+   - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT``: set if negative APP LLR output supported.
+   - ``RTE_BBDEV_TURBO_MAP_DEC``: supports flexible parallel MAP engine decoding.
+
+* For the FFT operation:
+   - ``RTE_BBDEV_FFT_WINDOWING``: flexible windowing capability.
+   - ``RTE_BBDEV_FFT_CS_ADJUSTMENT``: flexible adjustment of Cyclic Shift time offset.
+   - ``RTE_BBDEV_FFT_DFT_BYPASS``: set to bypass the DFT and feed the iDFT input directly.
+   - ``RTE_BBDEV_FFT_IDFT_BYPASS``: set to bypass the iDFT and get the DFT output directly.
+   - ``RTE_BBDEV_FFT_WINDOWING_BYPASS``: set to bypass time domain windowing.
+
+* For the MLD-TS operation:
+   - ``RTE_BBDEV_MLDTS_REP``: set to repeat and reuse channel across operations.
+
+Installation
+------------
+
+Section 3 of the DPDK manual provides instructions on installing and compiling DPDK.
+
+DPDK requires hugepages to be configured as detailed in section 2 of the DPDK manual.
+The bbdev test application has been tested with a configuration of 40 x 1GB hugepages.
+The hugepage configuration of a server may be examined using:
+
+.. code-block:: console
+
+   grep Huge* /proc/meminfo
+
+
+Initialization
+--------------
+
+When the device first powers up, its PCI Physical Functions (PF)
+can be listed with this command for Intel vRAN Boost v2:
+
+.. code-block:: console
+
+   sudo lspci -vd8086:57c2
+
+The physical and virtual functions are compatible with Linux UIO drivers:
+``vfio`` (preferred) and ``igb_uio`` (legacy).
+However, in order to work, the 5G/4G FEC device first needs to be bound
+to one of these Linux drivers through DPDK.
+
+
+Configure the VFs through PF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The PCI virtual functions must be configured before working or getting assigned
+to VMs/Containers.
+The configuration involves allocating the number of hardware queues, priorities,
+load balance, bandwidth and other settings necessary for the device
+to perform FEC functions.
+
+This configuration needs to be executed at least once after reboot or PCI FLR
+and can be achieved by using the function ``rte_acc_configure()``,
+which sets up the parameters defined in the compatible ``rte_acc_conf`` structure.
+
+
+Test Application
+----------------
+
+The bbdev class is provided with a test application, ``test-bbdev.py``,
+and a range of test data for testing the functionality of the device,
+depending on the device's capabilities.
+The test application is located under the ``app/test-bbdev`` folder
+and has the following options:
+
+.. code-block:: console
+
+   "-p", "--testapp-path": specifies path to the bbdev test app.
+   "-e", "--eal-params": EAL arguments which are passed to the test app.
+   "-t", "--timeout": Timeout in seconds (default=300).
+   "-c", "--test-cases": Defines test cases to run. Run all if not specified.
+   "-v", "--test-vector": Test vector path.
+   "-n", "--num-ops": Number of operations to process on device (default=32).
+   "-b", "--burst-size": Operations enqueue/dequeue burst size (default=32).
+   "-s", "--snr": SNR in dB used when generating LLRs for bler tests.
+   "-s", "--iter_max": Number of iterations for LDPC decoder.
+   "-l", "--num-lcores": Number of lcores to run (default=16).
+   "-i", "--init-device": Initialise PF device with default values.
+
+
+To execute the test application tool using simple decode or encode data,
+type one of the following:
+
+.. code-block:: console
+
+  ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_dec_default.data
+  ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_enc_default.data
+
+
+The test application ``test-bbdev.py`` supports the ability to configure the
+PF device with a default set of values, if the ``-i`` or ``--init-device`` option
+is included. The default values are defined in ``test_bbdev_perf.c``.
+
+
+Test Vectors
+~~~~~~~~~~~~
+
+In addition to the simple LDPC decoder and LDPC encoder tests,
+bbdev also provides a range of additional tests under the test_vectors folder,
+which may be useful.
+The results of these tests will depend on the device capabilities which may
+cause some test cases to be skipped, but no failure should be reported.
+
+
+Alternate Baseband Device configuration tool
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On top of the embedded configuration feature supported in test-bbdev using
+the ``--init-device`` option mentioned above, there is also a tool available
+to perform that device configuration using a companion application.
+The ``pf_bb_config`` application notably makes it possible to run test-bbdev
+from the VF, rather than only from the PF as described above.
+
+For more details, see: https://github.com/intel/pf-bb-config
+
+Specifically for the bbdev Intel vRAN Boost v2 PMD, the command below can be used
+(note that ACC200 was used previously to refer to VRB2):
+
+.. code-block:: console
+
+   pf_bb_config VRB2 -c ./vrb2/vrb2_config_vf_5g.cfg
+   test-bbdev.py -e="-c 0xff0 -a${VF_PCI_ADDR}" -c validation -n 64 -b 64 -l 1 -v ./ldpc_dec_default.data
diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 333e1d95a2..668dd58ee3 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -78,6 +78,9 @@ New Features
 * build: Optional libraries can now be selected with the new ``enable_libs``
   build option similarly to the existing ``enable_drivers`` build option.
 
+* **Updated Intel vRAN Boost bbdev PMD.**
+
+  Added support for the new Intel vRAN Boost v2 device variant (GNR-D) within the unified driver.
 
 Removed Items
 -------------
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index c89c26c59a..48e779ce77 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -37,6 +37,15 @@ vrb1_queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id
 		return ((qgrp_id << 7) + (aq_id << 3) + VRB1_VfQmgrIngressAq);
 }
 
+static inline uint32_t
+vrb2_queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id)
+{
+	if (pf_device)
+		return ((vf_id << 14) + (qgrp_id << 9) + (aq_id << 3) + VRB2_PfQmgrIngressAq);
+	else
+		return ((qgrp_id << 9) + (aq_id << 3) + VRB2_VfQmgrIngressAq);
+}
+
 enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, MLD, NUM_ACC};
 
 /* Return the accelerator enum for a Queue Group Index. */
@@ -197,7 +206,7 @@ fetch_acc_config(struct rte_bbdev *dev)
 	struct acc_device *d = dev->data->dev_private;
 	struct rte_acc_conf *acc_conf = &d->acc_conf;
 	uint8_t acc, qg;
-	uint32_t reg_aq, reg_len0, reg_len1, reg0, reg1;
+	uint32_t reg_aq, reg_len0, reg_len1, reg_len2, reg_len3, reg0, reg1, reg2, reg3;
 	uint32_t reg_mode, idx;
 	struct rte_acc_queue_topology *q_top = NULL;
 	int qman_func_id[VRB_NUM_ACCS] = {ACC_ACCMAP_0, ACC_ACCMAP_1,
@@ -219,32 +228,81 @@ fetch_acc_config(struct rte_bbdev *dev)
 	acc_conf->num_vf_bundles = 1;
 	initQTop(acc_conf);
 
-	reg0 = acc_reg_read(d, d->reg_addr->qman_group_func);
-	reg1 = acc_reg_read(d, d->reg_addr->qman_group_func + 4);
-	for (qg = 0; qg < d->num_qgroups; qg++) {
-		reg_aq = acc_reg_read(d, d->queue_offset(d->pf_device, 0, qg, 0));
-		if (reg_aq & ACC_QUEUE_ENABLE) {
-			if (qg < ACC_NUM_QGRPS_PER_WORD)
-				idx = (reg0 >> (qg * 4)) & 0x7;
+	if (d->device_variant == VRB1_VARIANT) {
+		reg0 = acc_reg_read(d, d->reg_addr->qman_group_func);
+		reg1 = acc_reg_read(d, d->reg_addr->qman_group_func + 4);
+		for (qg = 0; qg < d->num_qgroups; qg++) {
+			reg_aq = acc_reg_read(d, d->queue_offset(d->pf_device, 0, qg, 0));
+			if (reg_aq & ACC_QUEUE_ENABLE) {
+				if (qg < ACC_NUM_QGRPS_PER_WORD)
+					idx = (reg0 >> (qg * 4)) & 0x7;
+				else
+					idx = (reg1 >> ((qg - ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
+				if (idx < VRB1_NUM_ACCS) {
+					acc = qman_func_id[idx];
+					updateQtop(acc, qg, acc_conf, d);
+				}
+			}
+		}
+
+		/* Check the depth of the AQs. */
+		reg_len0 = acc_reg_read(d, d->reg_addr->depth_log0_offset);
+		reg_len1 = acc_reg_read(d, d->reg_addr->depth_log1_offset);
+		for (acc = 0; acc < NUM_ACC; acc++) {
+			qtopFromAcc(&q_top, acc, acc_conf);
+			if (q_top->first_qgroup_index < ACC_NUM_QGRPS_PER_WORD)
+				q_top->aq_depth_log2 =
+						(reg_len0 >> (q_top->first_qgroup_index * 4)) & 0xF;
 			else
-				idx = (reg1 >> ((qg - ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
-			if (idx < VRB1_NUM_ACCS) {
-				acc = qman_func_id[idx];
-				updateQtop(acc, qg, acc_conf, d);
+				q_top->aq_depth_log2 = (reg_len1 >> ((q_top->first_qgroup_index -
+						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
+		}
+	} else {
+		reg0 = acc_reg_read(d, d->reg_addr->qman_group_func);
+		reg1 = acc_reg_read(d, d->reg_addr->qman_group_func + 4);
+		reg2 = acc_reg_read(d, d->reg_addr->qman_group_func + 8);
+		reg3 = acc_reg_read(d, d->reg_addr->qman_group_func + 12);
+		/* printf("Debug Function %08x %08x %08x %08x\n", reg0, reg1, reg2, reg3);*/
+		for (qg = 0; qg < VRB2_NUM_QGRPS; qg++) {
+			reg_aq = acc_reg_read(d, vrb2_queue_offset(d->pf_device, 0, qg, 0));
+			if (reg_aq & ACC_QUEUE_ENABLE) {
+				/* printf("Qg enabled %d %x\n", qg, reg_aq);*/
+				if (qg / ACC_NUM_QGRPS_PER_WORD == 0)
+					idx = (reg0 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
+				else if (qg / ACC_NUM_QGRPS_PER_WORD == 1)
+					idx = (reg1 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
+				else if (qg / ACC_NUM_QGRPS_PER_WORD == 2)
+					idx = (reg2 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
+				else
+					idx = (reg3 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
+				if (idx < VRB_NUM_ACCS) {
+					acc = qman_func_id[idx];
+					updateQtop(acc, qg, acc_conf, d);
+				}
 			}
 		}
-	}
 
-	/* Check the depth of the AQs. */
-	reg_len0 = acc_reg_read(d, d->reg_addr->depth_log0_offset);
-	reg_len1 = acc_reg_read(d, d->reg_addr->depth_log1_offset);
-	for (acc = 0; acc < NUM_ACC; acc++) {
-		qtopFromAcc(&q_top, acc, acc_conf);
-		if (q_top->first_qgroup_index < ACC_NUM_QGRPS_PER_WORD)
-			q_top->aq_depth_log2 = (reg_len0 >> (q_top->first_qgroup_index * 4)) & 0xF;
-		else
-			q_top->aq_depth_log2 = (reg_len1 >> ((q_top->first_qgroup_index -
-					ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
+		/* Check the depth of the AQs. */
+		reg_len0 = acc_reg_read(d, d->reg_addr->depth_log0_offset);
+		reg_len1 = acc_reg_read(d, d->reg_addr->depth_log0_offset + 4);
+		reg_len2 = acc_reg_read(d, d->reg_addr->depth_log0_offset + 8);
+		reg_len3 = acc_reg_read(d, d->reg_addr->depth_log0_offset + 12);
+
+		for (acc = 0; acc < NUM_ACC; acc++) {
+			qtopFromAcc(&q_top, acc, acc_conf);
+			if (q_top->first_qgroup_index / ACC_NUM_QGRPS_PER_WORD == 0)
+				q_top->aq_depth_log2 = (reg_len0 >> ((q_top->first_qgroup_index %
+						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
+			else if (q_top->first_qgroup_index / ACC_NUM_QGRPS_PER_WORD == 1)
+				q_top->aq_depth_log2 = (reg_len1 >> ((q_top->first_qgroup_index %
+						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
+			else if (q_top->first_qgroup_index / ACC_NUM_QGRPS_PER_WORD == 2)
+				q_top->aq_depth_log2 = (reg_len2 >> ((q_top->first_qgroup_index %
+						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
+			else
+				q_top->aq_depth_log2 = (reg_len3 >> ((q_top->first_qgroup_index %
+						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
+		}
 	}
 
 	/* Read PF mode. */
@@ -470,7 +528,10 @@ allocate_info_ring(struct rte_bbdev *dev)
 	phys_low  = (uint32_t)(info_ring_iova);
 	acc_reg_write(d, d->reg_addr->info_ring_hi, phys_high);
 	acc_reg_write(d, d->reg_addr->info_ring_lo, phys_low);
-	acc_reg_write(d, d->reg_addr->info_ring_en, VRB1_REG_IRQ_EN_ALL);
+	if (d->device_variant == VRB1_VARIANT)
+		acc_reg_write(d, d->reg_addr->info_ring_en, VRB1_REG_IRQ_EN_ALL);
+	else
+		acc_reg_write(d, d->reg_addr->info_ring_en, VRB2_REG_IRQ_EN_ALL);
 	d->info_ring_head = (acc_reg_read(d, d->reg_addr->info_ring_ptr) &
 			0xFFF) / sizeof(union acc_info_ring_data);
 	return 0;
@@ -549,6 +610,10 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
 	acc_reg_write(d, d->reg_addr->dma_ring_dl4g_lo, phys_low);
 	acc_reg_write(d, d->reg_addr->dma_ring_fft_hi, phys_high);
 	acc_reg_write(d, d->reg_addr->dma_ring_fft_lo, phys_low);
+	if (d->device_variant == VRB2_VARIANT) {
+		acc_reg_write(d, d->reg_addr->dma_ring_mld_hi, phys_high);
+		acc_reg_write(d, d->reg_addr->dma_ring_mld_lo, phys_low);
+	}
 	/*
 	 * Configure Ring Size to the max queue ring size
 	 * (used for wrapping purpose).
@@ -582,6 +647,10 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
 	acc_reg_write(d, d->reg_addr->tail_ptrs_dl4g_lo, phys_low);
 	acc_reg_write(d, d->reg_addr->tail_ptrs_fft_hi, phys_high);
 	acc_reg_write(d, d->reg_addr->tail_ptrs_fft_lo, phys_low);
+	if (d->device_variant == VRB2_VARIANT) {
+		acc_reg_write(d, d->reg_addr->tail_ptrs_mld_hi, phys_high);
+		acc_reg_write(d, d->reg_addr->tail_ptrs_mld_lo, phys_low);
+	}
 
 	ret = allocate_info_ring(dev);
 	if (ret < 0) {
@@ -679,10 +748,17 @@ vrb_intr_enable(struct rte_bbdev *dev)
 			return ret;
 		}
 
-		if (acc_dev->pf_device)
-			max_queues = VRB1_MAX_PF_MSIX;
-		else
-			max_queues = VRB1_MAX_VF_MSIX;
+		if (d->device_variant == VRB1_VARIANT) {
+			if (acc_dev->pf_device)
+				max_queues = VRB1_MAX_PF_MSIX;
+			else
+				max_queues = VRB1_MAX_VF_MSIX;
+		} else {
+			if (acc_dev->pf_device)
+				max_queues = VRB2_MAX_PF_MSIX;
+			else
+				max_queues = VRB2_MAX_VF_MSIX;
+		}
 
 		if (rte_intr_efd_enable(dev->intr_handle, max_queues)) {
 			rte_bbdev_log(ERR, "Failed to create fds for %u queues",
@@ -1158,6 +1234,10 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
 	};
 
+	static const struct rte_bbdev_op_cap vrb2_bbdev_capabilities[] = {
+		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
+	};
+
 	static struct rte_bbdev_queue_conf default_queue_conf;
 	default_queue_conf.socket = dev->data->socket_id;
 	default_queue_conf.queue_size = ACC_MAX_QUEUE_DEPTH;
@@ -1202,7 +1282,10 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 	dev_info->default_queue_conf = default_queue_conf;
 	dev_info->cpu_flag_reqs = NULL;
 	dev_info->min_alignment = 1;
-	dev_info->capabilities = vrb1_bbdev_capabilities;
+	if (d->device_variant == VRB1_VARIANT)
+		dev_info->capabilities = vrb1_bbdev_capabilities;
+	else
+		dev_info->capabilities = vrb2_bbdev_capabilities;
 	dev_info->harq_buffer_size = 0;
 
 	vrb_check_ir(d);
@@ -1251,6 +1334,9 @@ static struct rte_pci_id pci_id_vrb_pf_map[] = {
 	{
 		RTE_PCI_DEVICE(RTE_VRB1_VENDOR_ID, RTE_VRB1_PF_DEVICE_ID)
 	},
+	{
+		RTE_PCI_DEVICE(RTE_VRB2_VENDOR_ID, RTE_VRB2_PF_DEVICE_ID)
+	},
 	{.device_id = 0},
 };
 
@@ -1259,6 +1345,9 @@ static struct rte_pci_id pci_id_vrb_vf_map[] = {
 	{
 		RTE_PCI_DEVICE(RTE_VRB1_VENDOR_ID, RTE_VRB1_VF_DEVICE_ID)
 	},
+	{
+		RTE_PCI_DEVICE(RTE_VRB2_VENDOR_ID, RTE_VRB2_VF_DEVICE_ID)
+	},
 	{.device_id = 0},
 };
 
@@ -3444,6 +3533,15 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 			d->reg_addr = &vrb1_pf_reg_addr;
 		else
 			d->reg_addr = &vrb1_vf_reg_addr;
+	} else {
+		d->device_variant = VRB2_VARIANT;
+		d->queue_offset = vrb2_queue_offset;
+		d->num_qgroups = VRB2_NUM_QGRPS;
+		d->num_aqs = VRB2_NUM_AQS;
+		if (d->pf_device)
+			d->reg_addr = &vrb2_pf_reg_addr;
+		else
+			d->reg_addr = &vrb2_vf_reg_addr;
 	}
 
 	rte_bbdev_log_debug("Init device %s [%s] @ vaddr %p paddr %#"PRIx64"",
diff --git a/drivers/baseband/acc/vrb2_pf_enum.h b/drivers/baseband/acc/vrb2_pf_enum.h
new file mode 100644
index 0000000000..28f10dc35b
--- /dev/null
+++ b/drivers/baseband/acc/vrb2_pf_enum.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Intel Corporation
+ */
+
+#ifndef VRB2_PF_ENUM_H
+#define VRB2_PF_ENUM_H
+
+/*
+ * VRB2 Register mapping on PF BAR0
+ * This is automatically generated from RDL; the format may change with a
+ * new RDL release.
+ * Variable names are kept as is.
+ */
+enum {
+	VRB2_PfQmgrEgressQueuesTemplate             = 0x0007FC00,
+	VRB2_PfQmgrIngressAq                        = 0x00100000,
+	VRB2_PfQmgrSoftReset                        = 0x00A00034,
+	VRB2_PfQmgrAramAllocEn	                    = 0x00A000a0,
+	VRB2_PfQmgrAramAllocSetupN0                 = 0x00A000b0,
+	VRB2_PfQmgrAramAllocSetupN1                 = 0x00A000b4,
+	VRB2_PfQmgrAramAllocSetupN2                 = 0x00A000b8,
+	VRB2_PfQmgrAramAllocSetupN3                 = 0x00A000bc,
+	VRB2_PfQmgrDepthLog2Grp                     = 0x00A00200,
+	VRB2_PfQmgrTholdGrp                         = 0x00A00300,
+	VRB2_PfQmgrGrpTmplateReg0Indx               = 0x00A00600,
+	VRB2_PfQmgrGrpTmplateReg1Indx               = 0x00A00700,
+	VRB2_PfQmgrGrpTmplateReg2Indx               = 0x00A00800,
+	VRB2_PfQmgrGrpTmplateReg3Indx               = 0x00A00900,
+	VRB2_PfQmgrGrpTmplateReg4Indx               = 0x00A00A00,
+	VRB2_PfQmgrGrpTmplateReg5Indx               = 0x00A00B00,
+	VRB2_PfQmgrGrpTmplateReg6Indx               = 0x00A00C00,
+	VRB2_PfQmgrGrpTmplateReg7Indx               = 0x00A00D00,
+	VRB2_PfQmgrGrpTmplateEnRegIndx              = 0x00A00E00,
+	VRB2_PfQmgrArbQDepthGrp                     = 0x00A02F00,
+	VRB2_PfQmgrGrpFunction0                     = 0x00A02F80,
+	VRB2_PfQmgrGrpPriority                      = 0x00A02FC0,
+	VRB2_PfQmgrVfBaseAddr                       = 0x00A08000,
+	VRB2_PfQmgrAqEnableVf                       = 0x00A10000,
+	VRB2_PfQmgrRingSizeVf                       = 0x00A20010,
+	VRB2_PfQmgrGrpDepthLog20Vf                  = 0x00A20020,
+	VRB2_PfQmgrGrpDepthLog21Vf                  = 0x00A20024,
+	VRB2_PfFabricM2iBufferReg                   = 0x00B30000,
+	VRB2_PfFecUl5gIbDebug0Reg                   = 0x00B401FC,
+	VRB2_PfFftConfig0                           = 0x00B58004,
+	VRB2_PfFftParityMask8                       = 0x00B5803C,
+	VRB2_PfDmaConfig0Reg                        = 0x00B80000,
+	VRB2_PfDmaConfig1Reg                        = 0x00B80004,
+	VRB2_PfDmaQmgrAddrReg                       = 0x00B80008,
+	VRB2_PfDmaAxcacheReg                        = 0x00B80010,
+	VRB2_PfDmaAxiControl                        = 0x00B8002C,
+	VRB2_PfDmaQmanen                            = 0x00B80040,
+	VRB2_PfDmaQmanenSelect                      = 0x00B80044,
+	VRB2_PfDmaCfgRrespBresp                     = 0x00B80814,
+	VRB2_PfDmaDescriptorSignature               = 0x00B80868,
+	VRB2_PfDmaErrorDetectionEn                  = 0x00B80870,
+	VRB2_PfDmaFec5GulDescBaseLoRegVf            = 0x00B88020,
+	VRB2_PfDmaFec5GulDescBaseHiRegVf            = 0x00B88024,
+	VRB2_PfDmaFec5GulRespPtrLoRegVf             = 0x00B88028,
+	VRB2_PfDmaFec5GulRespPtrHiRegVf             = 0x00B8802C,
+	VRB2_PfDmaFec5GdlDescBaseLoRegVf            = 0x00B88040,
+	VRB2_PfDmaFec5GdlDescBaseHiRegVf            = 0x00B88044,
+	VRB2_PfDmaFec5GdlRespPtrLoRegVf             = 0x00B88048,
+	VRB2_PfDmaFec5GdlRespPtrHiRegVf             = 0x00B8804C,
+	VRB2_PfDmaFec4GulDescBaseLoRegVf            = 0x00B88060,
+	VRB2_PfDmaFec4GulDescBaseHiRegVf            = 0x00B88064,
+	VRB2_PfDmaFec4GulRespPtrLoRegVf             = 0x00B88068,
+	VRB2_PfDmaFec4GulRespPtrHiRegVf             = 0x00B8806C,
+	VRB2_PfDmaFec4GdlDescBaseLoRegVf            = 0x00B88080,
+	VRB2_PfDmaFec4GdlDescBaseHiRegVf            = 0x00B88084,
+	VRB2_PfDmaFec4GdlRespPtrLoRegVf             = 0x00B88088,
+	VRB2_PfDmaFec4GdlRespPtrHiRegVf             = 0x00B8808C,
+	VRB2_PfDmaFftDescBaseLoRegVf                = 0x00B880A0,
+	VRB2_PfDmaFftDescBaseHiRegVf                = 0x00B880A4,
+	VRB2_PfDmaFftRespPtrLoRegVf                 = 0x00B880A8,
+	VRB2_PfDmaFftRespPtrHiRegVf                 = 0x00B880AC,
+	VRB2_PfDmaMldDescBaseLoRegVf                = 0x00B880C0,
+	VRB2_PfDmaMldDescBaseHiRegVf                = 0x00B880C4,
+	VRB2_PfQosmonAEvalOverflow0                 = 0x00B90008,
+	VRB2_PfPermonACntrlRegVf                    = 0x00B98000,
+	VRB2_PfQosmonBEvalOverflow0                 = 0x00BA0008,
+	VRB2_PfPermonBCntrlRegVf                    = 0x00BA8000,
+	VRB2_PfPermonCCntrlRegVf                    = 0x00BB8000,
+	VRB2_PfHiInfoRingBaseLoRegPf                = 0x00C84014,
+	VRB2_PfHiInfoRingBaseHiRegPf                = 0x00C84018,
+	VRB2_PfHiInfoRingPointerRegPf               = 0x00C8401C,
+	VRB2_PfHiInfoRingIntWrEnRegPf               = 0x00C84020,
+	VRB2_PfHiBlockTransmitOnErrorEn             = 0x00C84038,
+	VRB2_PfHiCfgMsiIntWrEnRegPf                 = 0x00C84040,
+	VRB2_PfHiMsixVectorMapperPf                 = 0x00C84060,
+	VRB2_PfHiPfMode                             = 0x00C84108,
+	VRB2_PfHiClkGateHystReg                     = 0x00C8410C,
+	VRB2_PfHiMsiDropEnableReg                   = 0x00C84114,
+	VRB2_PfHiSectionPowerGatingReq              = 0x00C84128,
+	VRB2_PfHiSectionPowerGatingAck              = 0x00C8412C,
+};
+
+/* TIP PF Interrupt numbers */
+enum {
+	VRB2_PF_INT_QMGR_AQ_OVERFLOW = 0,
+	VRB2_PF_INT_DOORBELL_VF_2_PF = 1,
+	VRB2_PF_INT_ILLEGAL_FORMAT = 2,
+	VRB2_PF_INT_QMGR_DISABLED_ACCESS = 3,
+	VRB2_PF_INT_QMGR_AQ_OVERTHRESHOLD = 4,
+	VRB2_PF_INT_DMA_DL_DESC_IRQ = 5,
+	VRB2_PF_INT_DMA_UL_DESC_IRQ = 6,
+	VRB2_PF_INT_DMA_FFT_DESC_IRQ = 7,
+	VRB2_PF_INT_DMA_UL5G_DESC_IRQ = 8,
+	VRB2_PF_INT_DMA_DL5G_DESC_IRQ = 9,
+	VRB2_PF_INT_DMA_MLD_DESC_IRQ = 10,
+	VRB2_PF_INT_ARAM_ACCESS_ERR = 11,
+	VRB2_PF_INT_ARAM_ECC_1BIT_ERR = 12,
+	VRB2_PF_INT_PARITY_ERR = 13,
+	VRB2_PF_INT_QMGR_OVERFLOW = 14,
+	VRB2_PF_INT_QMGR_ERR = 15,
+	VRB2_PF_INT_ATS_ERR = 22,
+	VRB2_PF_INT_ARAM_FUUL = 23,
+	VRB2_PF_INT_EXTRA_READ = 24,
+	VRB2_PF_INT_COMPLETION_TIMEOUT = 25,
+	VRB2_PF_INT_CORE_HANG = 26,
+	VRB2_PF_INT_DS_HANG = 27,
+	VRB2_PF_INT_DMA_HANG = 28,
+};
+
+#endif /* VRB2_PF_ENUM_H */
diff --git a/drivers/baseband/acc/vrb2_vf_enum.h b/drivers/baseband/acc/vrb2_vf_enum.h
new file mode 100644
index 0000000000..9c6e451010
--- /dev/null
+++ b/drivers/baseband/acc/vrb2_vf_enum.h
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Intel Corporation
+ */
+
+#ifndef VRB2_VF_ENUM_H
+#define VRB2_VF_ENUM_H
+
+/*
+ * VRB2 Register mapping on VF BAR0
+ * This is automatically generated from RDL; format may change with a new RDL release.
+ */
+enum {
+	VRB2_VfHiVfToPfDbellVf           = 0x00000000,
+	VRB2_VfHiPfToVfDbellVf           = 0x00000008,
+	VRB2_VfHiInfoRingBaseLoVf        = 0x00000010,
+	VRB2_VfHiInfoRingBaseHiVf        = 0x00000014,
+	VRB2_VfHiInfoRingPointerVf       = 0x00000018,
+	VRB2_VfHiInfoRingIntWrEnVf       = 0x00000020,
+	VRB2_VfHiInfoRingPf2VfWrEnVf     = 0x00000024,
+	VRB2_VfHiMsixVectorMapperVf      = 0x00000060,
+	VRB2_VfHiDeviceStatus            = 0x00000068,
+	VRB2_VfHiInterruptSrc            = 0x00000070,
+	VRB2_VfDmaFec5GulDescBaseLoRegVf = 0x00000120,
+	VRB2_VfDmaFec5GulDescBaseHiRegVf = 0x00000124,
+	VRB2_VfDmaFec5GulRespPtrLoRegVf  = 0x00000128,
+	VRB2_VfDmaFec5GulRespPtrHiRegVf  = 0x0000012C,
+	VRB2_VfDmaFec5GdlDescBaseLoRegVf = 0x00000140,
+	VRB2_VfDmaFec5GdlDescBaseHiRegVf = 0x00000144,
+	VRB2_VfDmaFec5GdlRespPtrLoRegVf  = 0x00000148,
+	VRB2_VfDmaFec5GdlRespPtrHiRegVf  = 0x0000014C,
+	VRB2_VfDmaFec4GulDescBaseLoRegVf = 0x00000160,
+	VRB2_VfDmaFec4GulDescBaseHiRegVf = 0x00000164,
+	VRB2_VfDmaFec4GulRespPtrLoRegVf  = 0x00000168,
+	VRB2_VfDmaFec4GulRespPtrHiRegVf  = 0x0000016C,
+	VRB2_VfDmaFec4GdlDescBaseLoRegVf = 0x00000180,
+	VRB2_VfDmaFec4GdlDescBaseHiRegVf = 0x00000184,
+	VRB2_VfDmaFec4GdlRespPtrLoRegVf  = 0x00000188,
+	VRB2_VfDmaFec4GdlRespPtrHiRegVf  = 0x0000018C,
+	VRB2_VfDmaFftDescBaseLoRegVf     = 0x000001A0,
+	VRB2_VfDmaFftDescBaseHiRegVf     = 0x000001A4,
+	VRB2_VfDmaFftRespPtrLoRegVf      = 0x000001A8,
+	VRB2_VfDmaFftRespPtrHiRegVf      = 0x000001AC,
+	VRB2_VfDmaMldDescBaseLoRegVf     = 0x000001C0,
+	VRB2_VfDmaMldDescBaseHiRegVf     = 0x000001C4,
+	VRB2_VfDmaMldRespPtrLoRegVf      = 0x000001C8,
+	VRB2_VfDmaMldRespPtrHiRegVf      = 0x000001CC,
+	VRB2_VfPmACntrlRegVf             = 0x00000200,
+	VRB2_VfPmACountVf                = 0x00000208,
+	VRB2_VfPmAKCntLoVf               = 0x00000210,
+	VRB2_VfPmAKCntHiVf               = 0x00000214,
+	VRB2_VfPmADeltaCntLoVf           = 0x00000220,
+	VRB2_VfPmADeltaCntHiVf           = 0x00000224,
+	VRB2_VfPmBCntrlRegVf             = 0x00000240,
+	VRB2_VfPmBCountVf                = 0x00000248,
+	VRB2_VfPmBKCntLoVf               = 0x00000250,
+	VRB2_VfPmBKCntHiVf               = 0x00000254,
+	VRB2_VfPmBDeltaCntLoVf           = 0x00000260,
+	VRB2_VfPmBDeltaCntHiVf           = 0x00000264,
+	VRB2_VfPmCCntrlRegVf             = 0x00000280,
+	VRB2_VfPmCCountVf                = 0x00000288,
+	VRB2_VfPmCKCntLoVf               = 0x00000290,
+	VRB2_VfPmCKCntHiVf               = 0x00000294,
+	VRB2_VfPmCDeltaCntLoVf           = 0x000002A0,
+	VRB2_VfPmCDeltaCntHiVf           = 0x000002A4,
+	VRB2_VfPmDCntrlRegVf             = 0x000002C0,
+	VRB2_VfPmDCountVf                = 0x000002C8,
+	VRB2_VfPmDKCntLoVf               = 0x000002D0,
+	VRB2_VfPmDKCntHiVf               = 0x000002D4,
+	VRB2_VfPmDDeltaCntLoVf           = 0x000002E0,
+	VRB2_VfPmDDeltaCntHiVf           = 0x000002E4,
+	VRB2_VfPmECntrlRegVf             = 0x00000300,
+	VRB2_VfPmECountVf                = 0x00000308,
+	VRB2_VfPmEKCntLoVf               = 0x00000310,
+	VRB2_VfPmEKCntHiVf               = 0x00000314,
+	VRB2_VfPmEDeltaCntLoVf           = 0x00000320,
+	VRB2_VfPmEDeltaCntHiVf           = 0x00000324,
+	VRB2_VfPmFCntrlRegVf             = 0x00000340,
+	VRB2_VfPmFCountVf                = 0x00000348,
+	VRB2_VfPmFKCntLoVf               = 0x00000350,
+	VRB2_VfPmFKCntHiVf               = 0x00000354,
+	VRB2_VfPmFDeltaCntLoVf           = 0x00000360,
+	VRB2_VfPmFDeltaCntHiVf           = 0x00000364,
+	VRB2_VfQmgrAqReset0              = 0x00000600,
+	VRB2_VfQmgrAqReset1              = 0x00000604,
+	VRB2_VfQmgrAqReset2              = 0x00000608,
+	VRB2_VfQmgrAqReset3              = 0x0000060C,
+	VRB2_VfQmgrRingSizeVf            = 0x00000610,
+	VRB2_VfQmgrGrpDepthLog20Vf       = 0x00000620,
+	VRB2_VfQmgrGrpDepthLog21Vf       = 0x00000624,
+	VRB2_VfQmgrGrpDepthLog22Vf       = 0x00000628,
+	VRB2_VfQmgrGrpDepthLog23Vf       = 0x0000062C,
+	VRB2_VfQmgrGrpFunction0Vf        = 0x00000630,
+	VRB2_VfQmgrGrpFunction1Vf        = 0x00000634,
+	VRB2_VfQmgrAramUsageN0           = 0x00000640,
+	VRB2_VfQmgrAramUsageN1           = 0x00000644,
+	VRB2_VfQmgrAramUsageN2           = 0x00000648,
+	VRB2_VfQmgrAramUsageN3           = 0x0000064C,
+	VRB2_VfHiMSIXBaseLoRegVf         = 0x00001000,
+	VRB2_VfHiMSIXBaseHiRegVf         = 0x00001004,
+	VRB2_VfHiMSIXBaseDataRegVf       = 0x00001008,
+	VRB2_VfHiMSIXBaseMaskRegVf       = 0x0000100C,
+	VRB2_VfHiMSIXPBABaseLoRegVf      = 0x00003000,
+	VRB2_VfQmgrIngressAq             = 0x00004000,
+};
+
+/* TIP VF Interrupt numbers */
+enum {
+	VRB2_VF_INT_QMGR_AQ_OVERFLOW = 0,
+	VRB2_VF_INT_DOORBELL_PF_2_VF = 1,
+	VRB2_VF_INT_ILLEGAL_FORMAT = 2,
+	VRB2_VF_INT_QMGR_DISABLED_ACCESS = 3,
+	VRB2_VF_INT_QMGR_AQ_OVERTHRESHOLD = 4,
+	VRB2_VF_INT_DMA_DL_DESC_IRQ = 5,
+	VRB2_VF_INT_DMA_UL_DESC_IRQ = 6,
+	VRB2_VF_INT_DMA_FFT_DESC_IRQ = 7,
+	VRB2_VF_INT_DMA_UL5G_DESC_IRQ = 8,
+	VRB2_VF_INT_DMA_DL5G_DESC_IRQ = 9,
+	VRB2_VF_INT_DMA_MLD_DESC_IRQ = 10,
+};
+
+#endif /* VRB2_VF_ENUM_H */
diff --git a/drivers/baseband/acc/vrb_pmd.h b/drivers/baseband/acc/vrb_pmd.h
index 1cabc0b7f4..0371db9972 100644
--- a/drivers/baseband/acc/vrb_pmd.h
+++ b/drivers/baseband/acc/vrb_pmd.h
@@ -8,6 +8,8 @@
 #include "acc_common.h"
 #include "vrb1_pf_enum.h"
 #include "vrb1_vf_enum.h"
+#include "vrb2_pf_enum.h"
+#include "vrb2_vf_enum.h"
 #include "vrb_cfg.h"
 
 /* Helper macro for logging */
@@ -31,12 +33,13 @@
 #define RTE_VRB1_VENDOR_ID           (0x8086)
 #define RTE_VRB1_PF_DEVICE_ID        (0x57C0)
 #define RTE_VRB1_VF_DEVICE_ID        (0x57C1)
-
-#define VRB1_VARIANT               2
+#define RTE_VRB2_VENDOR_ID           (0x8086)
+#define RTE_VRB2_PF_DEVICE_ID        (0x57C2)
+#define RTE_VRB2_VF_DEVICE_ID        (0x57C3)
 
 #define VRB_NUM_ACCS                 6
 #define VRB_MAX_QGRPS                32
-#define VRB_MAX_AQS                  32
+#define VRB_MAX_AQS                  64
 
 #define ACC_STATUS_WAIT      10
 #define ACC_STATUS_TO        100
@@ -46,8 +49,6 @@
 #define VRB1_NUM_VFS                  16
 #define VRB1_NUM_QGRPS                16
 #define VRB1_NUM_AQS                  16
-#define VRB1_GRP_ID_SHIFT    10 /* Queue Index Hierarchy */
-#define VRB1_VF_ID_SHIFT     4  /* Queue Index Hierarchy */
 #define VRB1_WORDS_IN_ARAM_SIZE (256 * 1024 / 4)
 
 /* VRB1 Mapping of signals for the available engines */
@@ -61,7 +62,6 @@
 #define VRB1_SIG_DL_4G_LAST 23
 #define VRB1_SIG_FFT        24
 #define VRB1_SIG_FFT_LAST   24
-
 #define VRB1_NUM_ACCS       5
 
 /* VRB1 Configuration */
@@ -90,6 +90,67 @@
 #define VRB1_MAX_PF_MSIX            (256+32)
 #define VRB1_MAX_VF_MSIX            (256+7)
 
+/* VRB2 specific flags */
+
+#define VRB2_NUM_VFS        64
+#define VRB2_NUM_QGRPS      32
+#define VRB2_NUM_AQS        64
+#define VRB2_WORDS_IN_ARAM_SIZE (512 * 1024 / 4)
+#define VRB2_NUM_ACCS        6
+#define VRB2_AQ_REG_NUM      4
+
+/* VRB2 Mapping of signals for the available engines */
+#define VRB2_SIG_UL_5G       0
+#define VRB2_SIG_UL_5G_LAST  5
+#define VRB2_SIG_DL_5G       9
+#define VRB2_SIG_DL_5G_LAST 11
+#define VRB2_SIG_UL_4G      12
+#define VRB2_SIG_UL_4G_LAST 16
+#define VRB2_SIG_DL_4G      21
+#define VRB2_SIG_DL_4G_LAST 23
+#define VRB2_SIG_FFT        24
+#define VRB2_SIG_FFT_LAST   26
+#define VRB2_SIG_MLD        30
+#define VRB2_SIG_MLD_LAST   31
+#define VRB2_FFT_NUM        3
+
+#define VRB2_FCW_MLDTS_BLEN 32
+#define VRB2_MLD_MIN_LAYER   2
+#define VRB2_MLD_MAX_LAYER   4
+#define VRB2_MLD_MAX_RREP    5
+#define VRB2_MLD_LAY_SIZE    3
+#define VRB2_MLD_RREP_SIZE   6
+#define VRB2_MLD_M2DLEN      3
+
+#define VRB2_MAX_PF_MSIX      (256+32)
+#define VRB2_MAX_VF_MSIX      (64+7)
+#define VRB2_REG_IRQ_EN_ALL   0xFFFFFFFF  /* Enable all interrupts */
+#define VRB2_FABRIC_MODE      0x8000103
+#define VRB2_CFG_DMA_ERROR    0x7DF
+#define VRB2_CFG_AXI_CACHE    0x11
+#define VRB2_CFG_QMGR_HI_P    0x0F0F
+#define VRB2_RESET_HARD       0x1FF
+#define VRB2_ENGINES_MAX      9
+#define VRB2_GPEX_AXIMAP_NUM  17
+#define VRB2_CLOCK_GATING_EN  0x30000
+#define VRB2_FFT_CFG_0        0x2001
+#define VRB2_FFT_ECC          0x60
+#define VRB2_FFT_RAM_EN       0x80008000
+#define VRB2_FFT_RAM_DIS      0x0
+#define VRB2_FFT_RAM_SIZE     512
+#define VRB2_CLK_EN           0x00010A01
+#define VRB2_CLK_DIS          0x01F10A01
+#define VRB2_PG_MASK_0        0x1F
+#define VRB2_PG_MASK_1        0xF
+#define VRB2_PG_MASK_2        0x1
+#define VRB2_PG_MASK_3        0x0
+#define VRB2_PG_MASK_FFT      1
+#define VRB2_PG_MASK_4GUL     4
+#define VRB2_PG_MASK_5GUL     8
+#define VRB2_PF_PM_REG_OFFSET 0x10000
+#define VRB2_VF_PM_REG_OFFSET 0x40
+#define VRB2_PM_START         0x2
+
 struct acc_registry_addr {
 	unsigned int dma_ring_dl5g_hi;
 	unsigned int dma_ring_dl5g_lo;
@@ -218,4 +279,92 @@ static const struct acc_registry_addr vrb1_vf_reg_addr = {
 	.pf2vf_doorbell = VRB1_VfHiPfToVfDbellVf,
 };
 
+
+/* Structure holding register addresses for PF */
+static const struct acc_registry_addr vrb2_pf_reg_addr = {
+	.dma_ring_dl5g_hi =  VRB2_PfDmaFec5GdlDescBaseHiRegVf,
+	.dma_ring_dl5g_lo =  VRB2_PfDmaFec5GdlDescBaseLoRegVf,
+	.dma_ring_ul5g_hi =  VRB2_PfDmaFec5GulDescBaseHiRegVf,
+	.dma_ring_ul5g_lo =  VRB2_PfDmaFec5GulDescBaseLoRegVf,
+	.dma_ring_dl4g_hi =  VRB2_PfDmaFec4GdlDescBaseHiRegVf,
+	.dma_ring_dl4g_lo =  VRB2_PfDmaFec4GdlDescBaseLoRegVf,
+	.dma_ring_ul4g_hi =  VRB2_PfDmaFec4GulDescBaseHiRegVf,
+	.dma_ring_ul4g_lo =  VRB2_PfDmaFec4GulDescBaseLoRegVf,
+	.dma_ring_fft_hi =   VRB2_PfDmaFftDescBaseHiRegVf,
+	.dma_ring_fft_lo =   VRB2_PfDmaFftDescBaseLoRegVf,
+	.dma_ring_mld_hi =   VRB2_PfDmaMldDescBaseHiRegVf,
+	.dma_ring_mld_lo =   VRB2_PfDmaMldDescBaseLoRegVf,
+	.ring_size =         VRB2_PfQmgrRingSizeVf,
+	.info_ring_hi =      VRB2_PfHiInfoRingBaseHiRegPf,
+	.info_ring_lo =      VRB2_PfHiInfoRingBaseLoRegPf,
+	.info_ring_en =      VRB2_PfHiInfoRingIntWrEnRegPf,
+	.info_ring_ptr =     VRB2_PfHiInfoRingPointerRegPf,
+	.tail_ptrs_dl5g_hi = VRB2_PfDmaFec5GdlRespPtrHiRegVf,
+	.tail_ptrs_dl5g_lo = VRB2_PfDmaFec5GdlRespPtrLoRegVf,
+	.tail_ptrs_ul5g_hi = VRB2_PfDmaFec5GulRespPtrHiRegVf,
+	.tail_ptrs_ul5g_lo = VRB2_PfDmaFec5GulRespPtrLoRegVf,
+	.tail_ptrs_dl4g_hi = VRB2_PfDmaFec4GdlRespPtrHiRegVf,
+	.tail_ptrs_dl4g_lo = VRB2_PfDmaFec4GdlRespPtrLoRegVf,
+	.tail_ptrs_ul4g_hi = VRB2_PfDmaFec4GulRespPtrHiRegVf,
+	.tail_ptrs_ul4g_lo = VRB2_PfDmaFec4GulRespPtrLoRegVf,
+	.tail_ptrs_fft_hi =  VRB2_PfDmaFftRespPtrHiRegVf,
+	.tail_ptrs_fft_lo =  VRB2_PfDmaFftRespPtrLoRegVf,
+	.tail_ptrs_mld_hi =  VRB2_PfDmaFftRespPtrHiRegVf,
+	.tail_ptrs_mld_lo =  VRB2_PfDmaFftRespPtrLoRegVf,
+	.depth_log0_offset = VRB2_PfQmgrGrpDepthLog20Vf,
+	.depth_log1_offset = VRB2_PfQmgrGrpDepthLog21Vf,
+	.qman_group_func =   VRB2_PfQmgrGrpFunction0,
+	.hi_mode =           VRB2_PfHiMsixVectorMapperPf,
+	.pf_mode =           VRB2_PfHiPfMode,
+	.pmon_ctrl_a =       VRB2_PfPermonACntrlRegVf,
+	.pmon_ctrl_b =       VRB2_PfPermonBCntrlRegVf,
+	.pmon_ctrl_c =       VRB2_PfPermonCCntrlRegVf,
+	.vf2pf_doorbell =    0,
+	.pf2vf_doorbell =    0,
+};
+
+/* Structure holding register addresses for VF */
+static const struct acc_registry_addr vrb2_vf_reg_addr = {
+	.dma_ring_dl5g_hi =  VRB2_VfDmaFec5GdlDescBaseHiRegVf,
+	.dma_ring_dl5g_lo =  VRB2_VfDmaFec5GdlDescBaseLoRegVf,
+	.dma_ring_ul5g_hi =  VRB2_VfDmaFec5GulDescBaseHiRegVf,
+	.dma_ring_ul5g_lo =  VRB2_VfDmaFec5GulDescBaseLoRegVf,
+	.dma_ring_dl4g_hi =  VRB2_VfDmaFec4GdlDescBaseHiRegVf,
+	.dma_ring_dl4g_lo =  VRB2_VfDmaFec4GdlDescBaseLoRegVf,
+	.dma_ring_ul4g_hi =  VRB2_VfDmaFec4GulDescBaseHiRegVf,
+	.dma_ring_ul4g_lo =  VRB2_VfDmaFec4GulDescBaseLoRegVf,
+	.dma_ring_fft_hi =   VRB2_VfDmaFftDescBaseHiRegVf,
+	.dma_ring_fft_lo =   VRB2_VfDmaFftDescBaseLoRegVf,
+	.dma_ring_mld_hi =   VRB2_VfDmaMldDescBaseHiRegVf,
+	.dma_ring_mld_lo =   VRB2_VfDmaMldDescBaseLoRegVf,
+	.ring_size =         VRB2_VfQmgrRingSizeVf,
+	.info_ring_hi =      VRB2_VfHiInfoRingBaseHiVf,
+	.info_ring_lo =      VRB2_VfHiInfoRingBaseLoVf,
+	.info_ring_en =      VRB2_VfHiInfoRingIntWrEnVf,
+	.info_ring_ptr =     VRB2_VfHiInfoRingPointerVf,
+	.tail_ptrs_dl5g_hi = VRB2_VfDmaFec5GdlRespPtrHiRegVf,
+	.tail_ptrs_dl5g_lo = VRB2_VfDmaFec5GdlRespPtrLoRegVf,
+	.tail_ptrs_ul5g_hi = VRB2_VfDmaFec5GulRespPtrHiRegVf,
+	.tail_ptrs_ul5g_lo = VRB2_VfDmaFec5GulRespPtrLoRegVf,
+	.tail_ptrs_dl4g_hi = VRB2_VfDmaFec4GdlRespPtrHiRegVf,
+	.tail_ptrs_dl4g_lo = VRB2_VfDmaFec4GdlRespPtrLoRegVf,
+	.tail_ptrs_ul4g_hi = VRB2_VfDmaFec4GulRespPtrHiRegVf,
+	.tail_ptrs_ul4g_lo = VRB2_VfDmaFec4GulRespPtrLoRegVf,
+	.tail_ptrs_fft_hi =  VRB2_VfDmaFftRespPtrHiRegVf,
+	.tail_ptrs_fft_lo =  VRB2_VfDmaFftRespPtrLoRegVf,
+	.tail_ptrs_mld_hi =  VRB2_VfDmaMldRespPtrHiRegVf,
+	.tail_ptrs_mld_lo =  VRB2_VfDmaMldRespPtrLoRegVf,
+	.depth_log0_offset = VRB2_VfQmgrGrpDepthLog20Vf,
+	.depth_log1_offset = VRB2_VfQmgrGrpDepthLog21Vf,
+	.qman_group_func =   VRB2_VfQmgrGrpFunction0Vf,
+	.hi_mode =           VRB2_VfHiMsixVectorMapperVf,
+	.pf_mode =           0,
+	.pmon_ctrl_a =       VRB2_VfPmACntrlRegVf,
+	.pmon_ctrl_b =       VRB2_VfPmBCntrlRegVf,
+	.pmon_ctrl_c =       VRB2_VfPmCCntrlRegVf,
+	.vf2pf_doorbell =    VRB2_VfHiVfToPfDbellVf,
+	.pf2vf_doorbell =    VRB2_VfHiPfToVfDbellVf,
+};
+
+
 #endif /* _VRB_PMD_H_ */
-- 
2.34.1
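
Illustrative note, not part of the patch: the queue topology code above
reads aq_depth_log2 from one of four depth registers, each packing one
4-bit field per queue group. A minimal standalone sketch of that lookup
follows. It assumes eight queue groups per 32-bit word (consistent with
the 4-bit shift, but ACC_NUM_QGRPS_PER_WORD is not defined in this
patch); qgroup_depth_log2, QGRPS_PER_WORD and NUM_DEPTH_REGS are
hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: eight 4-bit fields per 32-bit register. */
#define QGRPS_PER_WORD 8u
#define NUM_DEPTH_REGS 4u

/* Return the log2 queue depth stored for the given queue group index. */
static inline uint8_t
qgroup_depth_log2(const uint32_t regs[NUM_DEPTH_REGS], unsigned int qgroup)
{
	assert(qgroup < QGRPS_PER_WORD * NUM_DEPTH_REGS);
	/* Select the register word holding this group, then its nibble. */
	uint32_t word = regs[qgroup / QGRPS_PER_WORD];
	unsigned int shift = (qgroup % QGRPS_PER_WORD) * 4;
	return (uint8_t)((word >> shift) & 0xF);
}
```

Indexing an array of register values replaces the if/else chain over
reg_len0..reg_len3 while computing the same shift and mask.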



* [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (6 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 07/12] baseband/acc: adding VRB2 device variant Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 14:28   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 09/12] baseband/acc: add FFT support to " Nicolas Chautru
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

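Add the FEC capabilities (Turbo and LDPC, encode and decode) for the
VRB2 variant, together with VRB2-specific processing functions such as
the LDPC decoder frame control word and DMA descriptor fill.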

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
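Illustrative note, not part of the patch: the HARQ sizing done in
vrb2_fcw_ld_fill() can be sketched standalone as below. The helper names
(align_ceil, harq_in_size, harq_out_size) are hypothetical, and the
sketch uses 32-bit arithmetic throughout, whereas the driver keeps
uint16_t intermediates. Mode values 1 and 4 mirror the hcin_decomp_mode
values the patch sets for 6-bit and 4-bit HARQ compression.

```c
#include <assert.h>
#include <stdint.h>

#define HARQ_ALIGN 64u

/* Round v up to the next multiple of align (align is a power of two). */
static inline uint32_t
align_ceil(uint32_t v, uint32_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/* HARQ input size: expand compressed length, cap at ncb - filler, align. */
static inline uint32_t
harq_in_size(uint32_t length, unsigned int decomp_mode,
		uint32_t n_cb, uint32_t n_filler)
{
	assert(n_filler <= n_cb);
	uint32_t len = length;
	if (decomp_mode == 1)		/* 6-bit compression: scale by 8/6 */
		len = len * 8 / 6;
	else if (decomp_mode == 4)	/* 4-bit compression: scale by 2 */
		len = len * 2;
	if (len > n_cb - n_filler)
		len = n_cb - n_filler;
	return align_ceil(len, HARQ_ALIGN);
}

/* HARQ output size: max(input size, k0' + E), capped at ncb', aligned. */
static inline uint32_t
harq_out_size(uint32_t hcin_size0, uint32_t k0_p, uint32_t rm_e,
		uint32_t ncb_p)
{
	uint32_t len = k0_p + rm_e;
	if (len < hcin_size0)
		len = hcin_size0;
	if (len > ncb_p)
		len = ncb_p;
	return align_ceil(len, HARQ_ALIGN);
}
```

As in the patch, the 64-byte rounding is applied after the cap, so the
returned size may exceed ncb - filler by up to 63.
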
 drivers/baseband/acc/rte_vrb_pmd.c | 567 ++++++++++++++++++++++++++++-
 1 file changed, 548 insertions(+), 19 deletions(-)
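
Illustrative note, not part of the patch: the length arithmetic at the
top of vrb2_dma_desc_ld_fill() can be checked in isolation with the
sketch below. The function names and boolean parameters are hypothetical
stand-ins for the op_flags tests in the driver; the constants (22 or 10
systematic columns, 24-bit CRC overlap, 3/4 ratio under LLR compression)
are taken from the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Input length for one code block, given rate-matched size E. */
static inline uint32_t
ld_input_length(uint32_t rm_e, bool llr_compression)
{
	/* With LLR compression the patch uses (E * 3 + 3) / 4. */
	return llr_compression ? (rm_e * 3 + 3) / 4 : rm_e;
}

/* Output length: K minus filler bits and any dropped CRC24B. */
static inline uint32_t
ld_output_length(unsigned int basegraph, uint32_t z_c, uint32_t n_filler,
		bool crc24b_drop)
{
	assert(basegraph == 1 || basegraph == 2);
	uint32_t sys_cols = (basegraph == 1) ? 22 : 10;
	uint32_t k = sys_cols * z_c;
	uint32_t crc24_overlap = crc24b_drop ? 24 : 0;
	assert(k >= n_filler + crc24_overlap);
	return k - n_filler - crc24_overlap;
}
```

For example, base graph 1 with Zc = 384, no filler bits and CRC24B drop
enabled gives 22 * 384 - 24 = 8424.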

diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index 48e779ce77..93add82947 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -1235,6 +1235,94 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 	};
 
 	static const struct rte_bbdev_op_cap vrb2_bbdev_capabilities[] = {
+		{
+			.type = RTE_BBDEV_OP_TURBO_DEC,
+			.cap.turbo_dec = {
+				.capability_flags =
+					RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE |
+					RTE_BBDEV_TURBO_CRC_TYPE_24B |
+					RTE_BBDEV_TURBO_DEC_CRC_24B_DROP |
+					RTE_BBDEV_TURBO_EQUALIZER |
+					RTE_BBDEV_TURBO_SOFT_OUT_SATURATE |
+					RTE_BBDEV_TURBO_HALF_ITERATION_EVEN |
+					RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH |
+					RTE_BBDEV_TURBO_SOFT_OUTPUT |
+					RTE_BBDEV_TURBO_EARLY_TERMINATION |
+					RTE_BBDEV_TURBO_DEC_INTERRUPTS |
+					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN |
+					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT |
+					RTE_BBDEV_TURBO_MAP_DEC |
+					RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP |
+					RTE_BBDEV_TURBO_DEC_SCATTER_GATHER,
+				.max_llr_modulus = INT8_MAX,
+				.num_buffers_src =
+						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
+				.num_buffers_hard_out =
+						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
+				.num_buffers_soft_out =
+						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
+			}
+		},
+		{
+			.type = RTE_BBDEV_OP_TURBO_ENC,
+			.cap.turbo_enc = {
+				.capability_flags =
+					RTE_BBDEV_TURBO_CRC_24B_ATTACH |
+					RTE_BBDEV_TURBO_RV_INDEX_BYPASS |
+					RTE_BBDEV_TURBO_RATE_MATCH |
+					RTE_BBDEV_TURBO_ENC_INTERRUPTS |
+					RTE_BBDEV_TURBO_ENC_SCATTER_GATHER,
+				.num_buffers_src =
+						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
+				.num_buffers_dst =
+						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
+			}
+		},
+		{
+			.type   = RTE_BBDEV_OP_LDPC_ENC,
+			.cap.ldpc_enc = {
+				.capability_flags =
+					RTE_BBDEV_LDPC_RATE_MATCH |
+					RTE_BBDEV_LDPC_CRC_24B_ATTACH |
+					RTE_BBDEV_LDPC_INTERLEAVER_BYPASS |
+					RTE_BBDEV_LDPC_ENC_INTERRUPTS |
+					RTE_BBDEV_LDPC_ENC_SCATTER_GATHER |
+					RTE_BBDEV_LDPC_ENC_CONCATENATION,
+				.num_buffers_src =
+						RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
+				.num_buffers_dst =
+						RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
+			}
+		},
+		{
+			.type   = RTE_BBDEV_OP_LDPC_DEC,
+			.cap.ldpc_dec = {
+				.capability_flags =
+					RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK |
+					RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP |
+					RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK |
+					RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK |
+					RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE |
+					RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE |
+					RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE |
+					RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS |
+					RTE_BBDEV_LDPC_DEC_SCATTER_GATHER |
+					RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION |
+					RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION |
+					RTE_BBDEV_LDPC_LLR_COMPRESSION |
+					RTE_BBDEV_LDPC_SOFT_OUT_ENABLE |
+					RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS |
+					RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS |
+					RTE_BBDEV_LDPC_DEC_INTERRUPTS,
+				.llr_size = 8,
+				.llr_decimals = 2,
+				.num_buffers_src =
+						RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
+				.num_buffers_hard_out =
+						RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
+				.num_buffers_soft_out = 0,
+			}
+		},
 		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
 	};
 
@@ -1774,6 +1862,141 @@ vrb1_dma_desc_ld_fill(struct rte_bbdev_dec_op *op,
 	return 0;
 }
 
+/* Fill in a frame control word for LDPC decoding. */
+static inline void
+vrb2_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw,
+		union acc_harq_layout_data *harq_layout)
+{
+	uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset;
+	uint32_t harq_index;
+	uint32_t l;
+
+	fcw->qm = op->ldpc_dec.q_m;
+	fcw->nfiller = op->ldpc_dec.n_filler;
+	fcw->BG = (op->ldpc_dec.basegraph - 1);
+	fcw->Zc = op->ldpc_dec.z_c;
+	fcw->ncb = op->ldpc_dec.n_cb;
+	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_dec.basegraph,
+			op->ldpc_dec.rv_index);
+	if (op->ldpc_dec.code_block_mode == RTE_BBDEV_CODE_BLOCK)
+		fcw->rm_e = op->ldpc_dec.cb_params.e;
+	else
+		fcw->rm_e = (op->ldpc_dec.tb_params.r <
+				op->ldpc_dec.tb_params.cab) ?
+						op->ldpc_dec.tb_params.ea :
+						op->ldpc_dec.tb_params.eb;
+
+	if (unlikely(check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE) &&
+			(op->ldpc_dec.harq_combined_input.length == 0))) {
+		rte_bbdev_log(WARNING, "Null HARQ input size provided");
+		/* Disable HARQ input in that case to carry forward. */
+		op->ldpc_dec.op_flags &= ~RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE;
+	}
+	if (unlikely(fcw->rm_e == 0)) {
+		rte_bbdev_log(WARNING, "Null E input provided");
+		fcw->rm_e = 2;
+	}
+
+	fcw->hcin_en = check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE);
+	fcw->hcout_en = check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE);
+	fcw->crc_select = check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK);
+	fcw->so_en = check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_SOFT_OUT_ENABLE);
+	fcw->so_bypass_intlv = check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS);
+	fcw->so_bypass_rm = check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS);
+	fcw->bypass_dec = 0;
+	fcw->bypass_intlv = check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS);
+	if (op->ldpc_dec.q_m == 1) {
+		fcw->bypass_intlv = 1;
+		fcw->qm = 2;
+	}
+	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION)) {
+		fcw->hcin_decomp_mode = 1;
+		fcw->hcout_comp_mode = 1;
+	} else if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION)) {
+		fcw->hcin_decomp_mode = 4;
+		fcw->hcout_comp_mode = 4;
+	} else {
+		fcw->hcin_decomp_mode = 0;
+		fcw->hcout_comp_mode = 0;
+	}
+
+	fcw->llr_pack_mode = check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_LLR_COMPRESSION);
+	harq_index = hq_index(op->ldpc_dec.harq_combined_output.offset);
+	if (fcw->hcin_en > 0) {
+		harq_in_length = op->ldpc_dec.harq_combined_input.length;
+		if (fcw->hcin_decomp_mode == 1)
+			harq_in_length = harq_in_length * 8 / 6;
+		else if (fcw->hcin_decomp_mode == 4)
+			harq_in_length = harq_in_length * 2;
+		harq_in_length = RTE_MIN(harq_in_length, op->ldpc_dec.n_cb
+				- op->ldpc_dec.n_filler);
+		harq_in_length = RTE_ALIGN_CEIL(harq_in_length, 64);
+		fcw->hcin_size0 = harq_in_length;
+		fcw->hcin_offset = 0;
+		fcw->hcin_size1 = 0;
+	} else {
+		fcw->hcin_size0 = 0;
+		fcw->hcin_offset = 0;
+		fcw->hcin_size1 = 0;
+	}
+
+	fcw->itmax = op->ldpc_dec.iter_max;
+	fcw->so_it = op->ldpc_dec.iter_max;
+	fcw->itstop = check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE);
+	fcw->cnu_algo = ACC_ALGO_MSA;
+	fcw->synd_precoder = fcw->itstop;
+
+	fcw->minsum_offset = 1;
+	fcw->dec_llrclip   = 2;
+
+	/*
+	 * These are all implicitly set
+	 * fcw->synd_post = 0;
+	 * fcw->dec_convllr = 0;
+	 * fcw->hcout_convllr = 0;
+	 * fcw->hcout_size1 = 0;
+	 * fcw->hcout_offset = 0;
+	 * fcw->negstop_th = 0;
+	 * fcw->negstop_it = 0;
+	 * fcw->negstop_en = 0;
+	 * fcw->gain_i = 1;
+	 * fcw->gain_h = 1;
+	 */
+	if (fcw->hcout_en > 0) {
+		parity_offset = (op->ldpc_dec.basegraph == 1 ? 20 : 8)
+			* op->ldpc_dec.z_c - op->ldpc_dec.n_filler;
+		k0_p = (fcw->k0 > parity_offset) ?
+				fcw->k0 - op->ldpc_dec.n_filler : fcw->k0;
+		ncb_p = fcw->ncb - op->ldpc_dec.n_filler;
+		l = k0_p + fcw->rm_e;
+		harq_out_length = (uint16_t) fcw->hcin_size0;
+		harq_out_length = RTE_MIN(RTE_MAX(harq_out_length, l), ncb_p);
+		harq_out_length = RTE_ALIGN_CEIL(harq_out_length, 64);
+		fcw->hcout_size0 = harq_out_length;
+		fcw->hcout_size1 = 0;
+		fcw->hcout_offset = 0;
+		harq_layout[harq_index].offset = fcw->hcout_offset;
+		harq_layout[harq_index].size0 = fcw->hcout_size0;
+	} else {
+		fcw->hcout_size0 = 0;
+		fcw->hcout_size1 = 0;
+		fcw->hcout_offset = 0;
+	}
+
+	fcw->tb_crc_select = 0;
+	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
+		fcw->tb_crc_select = 2;
+	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK))
+		fcw->tb_crc_select = 1;
+}
+
 static inline void
 vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
 		struct acc_dma_req_desc *desc,
@@ -1817,6 +2040,139 @@ vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
 	desc->op_addr = op;
 }
 
+static inline int
+vrb2_dma_desc_ld_fill(struct rte_bbdev_dec_op *op,
+		struct acc_dma_req_desc *desc,
+		struct rte_mbuf **input, struct rte_mbuf *h_output,
+		uint32_t *in_offset, uint32_t *h_out_offset,
+		uint32_t *h_out_length, uint32_t *mbuf_total_left,
+		uint32_t *seg_total_left, struct acc_fcw_ld *fcw)
+{
+	struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec;
+	int next_triplet = 1; /* FCW already done. */
+	uint32_t input_length;
+	uint16_t output_length, crc24_overlap = 0;
+	uint16_t sys_cols, K, h_p_size, h_np_size;
+
+	acc_header_init(desc);
+
+	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP))
+		crc24_overlap = 24;
+
+	/* Compute some LDPC BG lengths. */
+	input_length = fcw->rm_e;
+	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_LLR_COMPRESSION))
+		input_length = (input_length * 3 + 3) / 4;
+	sys_cols = (dec->basegraph == 1) ? 22 : 10;
+	K = sys_cols * dec->z_c;
+	output_length = K - dec->n_filler - crc24_overlap;
+
+	if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < input_length))) {
+		rte_bbdev_log(ERR,
+				"Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u",
+				*mbuf_total_left, input_length);
+		return -1;
+	}
+
+	next_triplet = acc_dma_fill_blk_type_in(desc, input,
+			in_offset, input_length,
+			seg_total_left, next_triplet,
+			check_bit(op->ldpc_dec.op_flags,
+			RTE_BBDEV_LDPC_DEC_SCATTER_GATHER));
+
+	if (unlikely(next_triplet < 0)) {
+		rte_bbdev_log(ERR,
+				"Mismatch between data to process and mbuf data length in bbdev_op: %p",
+				op);
+		return -1;
+	}
+
+	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) {
+		if (op->ldpc_dec.harq_combined_input.data == 0) {
+			rte_bbdev_log(ERR, "HARQ input is not defined");
+			return -1;
+		}
+		h_p_size = fcw->hcin_size0 + fcw->hcin_size1;
+		if (fcw->hcin_decomp_mode == 1)
+			h_p_size = (h_p_size * 3 + 3) / 4;
+		else if (fcw->hcin_decomp_mode == 4)
+			h_p_size = h_p_size / 2;
+		if (op->ldpc_dec.harq_combined_input.data == 0) {
+			rte_bbdev_log(ERR, "HARQ input is not defined");
+			return -1;
+		}
+		acc_dma_fill_blk_type(
+				desc,
+				op->ldpc_dec.harq_combined_input.data,
+				op->ldpc_dec.harq_combined_input.offset,
+				h_p_size,
+				next_triplet,
+				ACC_DMA_BLKID_IN_HARQ);
+		next_triplet++;
+	}
+
+	desc->data_ptrs[next_triplet - 1].last = 1;
+	desc->m2dlen = next_triplet;
+	*mbuf_total_left -= input_length;
+
+	next_triplet = acc_dma_fill_blk_type(desc, h_output,
+			*h_out_offset, output_length >> 3, next_triplet,
+			ACC_DMA_BLKID_OUT_HARD);
+
+	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_SOFT_OUT_ENABLE)) {
+		if (op->ldpc_dec.soft_output.data == 0) {
+			rte_bbdev_log(ERR, "Soft output is not defined");
+			return -1;
+		}
+		dec->soft_output.length = fcw->rm_e;
+		acc_dma_fill_blk_type(desc, dec->soft_output.data, dec->soft_output.offset,
+				fcw->rm_e, next_triplet, ACC_DMA_BLKID_OUT_SOFT);
+		next_triplet++;
+	}
+
+	if (check_bit(op->ldpc_dec.op_flags,
+				RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) {
+		if (op->ldpc_dec.harq_combined_output.data == 0) {
+			rte_bbdev_log(ERR, "HARQ output is not defined");
+			return -1;
+		}
+
+		/* Pruned size of the HARQ */
+		h_p_size = fcw->hcout_size0 + fcw->hcout_size1;
+		/* Non-Pruned size of the HARQ */
+		h_np_size = fcw->hcout_offset > 0 ?
+				fcw->hcout_offset + fcw->hcout_size1 :
+				h_p_size;
+		if (fcw->hcin_decomp_mode == 1) {
+			h_np_size = (h_np_size * 3 + 3) / 4;
+			h_p_size = (h_p_size * 3 + 3) / 4;
+		} else if (fcw->hcin_decomp_mode == 4) {
+			h_np_size = h_np_size / 2;
+			h_p_size = h_p_size / 2;
+		}
+		dec->harq_combined_output.length = h_np_size;
+		acc_dma_fill_blk_type(
+				desc,
+				dec->harq_combined_output.data,
+				dec->harq_combined_output.offset,
+				h_p_size,
+				next_triplet,
+				ACC_DMA_BLKID_OUT_HARQ);
+
+		next_triplet++;
+	}
+
+	*h_out_length = output_length >> 3;
+	dec->hard_output.length += *h_out_length;
+	*h_out_offset += *h_out_length;
+	desc->data_ptrs[next_triplet - 1].last = 1;
+	desc->d2mlen = next_triplet - desc->m2dlen;
+
+	desc->op_addr = op;
+
+	return 0;
+}
+
 /* Enqueue one encode operations for device in CB mode. */
 static inline int
 enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op,
@@ -1877,6 +2233,7 @@ enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ops,
 	/** This could be done at polling. */
 	acc_header_init(&desc->req);
 	desc->req.numCBs = num;
+	desc->req.dltb = 0;
 
 	in_length_in_bytes = ops[0]->ldpc_enc.input.data->data_len;
 	out_length = (enc->cb_params.e + 7) >> 3;
@@ -2102,6 +2459,105 @@ vrb1_enqueue_ldpc_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op
 	return return_descs;
 }
 
+/* Fill in a frame control word for LDPC encoding. */
+static inline void
+vrb2_fcw_letb_fill(const struct rte_bbdev_enc_op *op, struct acc_fcw_le *fcw)
+{
+	fcw->qm = op->ldpc_enc.q_m;
+	fcw->nfiller = op->ldpc_enc.n_filler;
+	fcw->BG = (op->ldpc_enc.basegraph - 1);
+	fcw->Zc = op->ldpc_enc.z_c;
+	fcw->ncb = op->ldpc_enc.n_cb;
+	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_enc.basegraph,
+			op->ldpc_enc.rv_index);
+	fcw->rm_e = op->ldpc_enc.tb_params.ea;
+	fcw->rm_e_b = op->ldpc_enc.tb_params.eb;
+	fcw->crc_select = check_bit(op->ldpc_enc.op_flags,
+			RTE_BBDEV_LDPC_CRC_24B_ATTACH);
+	fcw->bypass_intlv = 0;
+	if (op->ldpc_enc.tb_params.c > 1) {
+		fcw->mcb_count = 0;
+		fcw->C = op->ldpc_enc.tb_params.c;
+		fcw->Cab = op->ldpc_enc.tb_params.cab;
+	} else {
+		fcw->mcb_count = 1;
+		fcw->C = 0;
+	}
+}
+
+/* Enqueue one encode operation for device in TB mode.
+ * Returns the number of descriptors used.
+ */
+static inline int
+vrb2_enqueue_ldpc_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op,
+		uint16_t enq_descs)
+{
+	union acc_dma_desc *desc = NULL;
+	uint32_t in_offset, out_offset, out_length, seg_total_left;
+	struct rte_mbuf *input, *output_head, *output;
+
+	uint16_t desc_idx = ((q->sw_ring_head + enq_descs) & q->sw_ring_wrap_mask);
+	desc = q->ring_addr + desc_idx;
+	vrb2_fcw_letb_fill(op, &desc->req.fcw_le);
+	struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc;
+	int next_triplet = 1; /* FCW already done */
+	uint32_t in_length_in_bytes;
+	uint16_t K, in_length_in_bits;
+
+	input = enc->input.data;
+	output_head = output = enc->output.data;
+	in_offset = enc->input.offset;
+	out_offset = enc->output.offset;
+	seg_total_left = rte_pktmbuf_data_len(enc->input.data) - in_offset;
+
+	acc_header_init(&desc->req);
+	K = (enc->basegraph == 1 ? 22 : 10) * enc->z_c;
+	in_length_in_bits = K - enc->n_filler;
+	if ((enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) ||
+			(enc->op_flags & RTE_BBDEV_LDPC_CRC_24B_ATTACH))
+		in_length_in_bits -= 24;
+	in_length_in_bytes = (in_length_in_bits >> 3) * enc->tb_params.c;
+
+	next_triplet = acc_dma_fill_blk_type_in(&desc->req, &input, &in_offset,
+			in_length_in_bytes, &seg_total_left, next_triplet,
+			check_bit(enc->op_flags, RTE_BBDEV_LDPC_ENC_SCATTER_GATHER));
+	if (unlikely(next_triplet < 0)) {
+		rte_bbdev_log(ERR,
+				"Mismatch between data to process and mbuf data length in bbdev_op: %p",
+				op);
+		return -1;
+	}
+	desc->req.data_ptrs[next_triplet - 1].last = 1;
+	desc->req.m2dlen = next_triplet;
+
+	/* Set output length */
+	/* Integer round up division by 8 */
+	out_length = (enc->tb_params.ea * enc->tb_params.cab +
+			enc->tb_params.eb * (enc->tb_params.c - enc->tb_params.cab)  + 7) >> 3;
+
+	next_triplet = acc_dma_fill_blk_type(&desc->req, output, out_offset,
+			out_length, next_triplet, ACC_DMA_BLKID_OUT_ENC);
+	enc->output.length = out_length;
+	out_offset += out_length;
+	desc->req.data_ptrs[next_triplet - 1].last = 1;
+	desc->req.data_ptrs[next_triplet - 1].dma_ext = 0;
+	desc->req.d2mlen = next_triplet - desc->req.m2dlen;
+	desc->req.numCBs = enc->tb_params.c;
+	if (desc->req.numCBs > 1)
+		desc->req.dltb = 1;
+	desc->req.op_addr = op;
+
+	if (out_length < ACC_MAX_E_MBUF)
+		mbuf_append(output_head, output, out_length);
+
+#ifdef RTE_LIBRTE_BBDEV_DEBUG
+	rte_memdump(stderr, "FCW", &desc->req.fcw_le, sizeof(desc->req.fcw_le));
+	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
+#endif
+	/* One descriptor (one TB op) was successfully prepared to enqueue. */
+	return 1;
+}
+
 /** Enqueue one decode operations for device in CB mode. */
 static inline int
 enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
@@ -2215,10 +2671,16 @@ vrb_enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 		else
 			seg_total_left = fcw->rm_e;
 
-		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input, h_output,
-				&in_offset, &h_out_offset,
-				&h_out_length, &mbuf_total_left,
-				&seg_total_left, fcw);
+		if (q->d->device_variant == VRB1_VARIANT)
+			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input, h_output,
+					&in_offset, &h_out_offset,
+					&h_out_length, &mbuf_total_left,
+					&seg_total_left, fcw);
+		else
+			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input, h_output,
+					&in_offset, &h_out_offset,
+					&h_out_length, &mbuf_total_left,
+					&seg_total_left, fcw);
 		if (unlikely(ret < 0))
 			return ret;
 	}
@@ -2308,11 +2770,18 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld, ACC_FCW_LD_BLEN);
 		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
 
-		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
-				h_output, &in_offset, &h_out_offset,
-				&h_out_length,
-				&mbuf_total_left, &seg_total_left,
-				&desc->req.fcw_ld);
+		if (q->d->device_variant == VRB1_VARIANT)
+			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
+					h_output, &in_offset, &h_out_offset,
+					&h_out_length,
+					&mbuf_total_left, &seg_total_left,
+					&desc->req.fcw_ld);
+		else
+			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input,
+					h_output, &in_offset, &h_out_offset,
+					&h_out_length,
+					&mbuf_total_left, &seg_total_left,
+					&desc->req.fcw_ld);
 
 		if (unlikely(ret < 0))
 			return ret;
@@ -2576,14 +3045,22 @@ vrb_enqueue_ldpc_enc_tb(struct rte_bbdev_queue_data *q_data,
 	int descs_used;
 
 	for (i = 0; i < num; ++i) {
-		cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
-		/* Check if there are available space for further processing. */
-		if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
-			acc_enqueue_ring_full(q_data);
-			break;
+		if (q->d->device_variant == VRB1_VARIANT) {
+			cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
+			/* Check if there is available space for further processing. */
+			if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
+				acc_enqueue_ring_full(q_data);
+				break;
+			}
+			descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q, ops[i],
+					enqueued_descs, cbs_in_tb);
+		} else {
+			if (unlikely(avail < 1)) {
+				acc_enqueue_ring_full(q_data);
+				break;
+			}
+			descs_used = vrb2_enqueue_ldpc_enc_one_op_tb(q, ops[i], enqueued_descs);
 		}
-
-		descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q, ops[i], enqueued_descs, cbs_in_tb);
 		if (descs_used < 0) {
 			acc_enqueue_invalid(q_data);
 			break;
@@ -2865,6 +3342,52 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
 	return desc->req.numCBs;
 }
 
+/* Dequeue one LDPC encode operation from VRB2 device in TB mode. */
+static inline int
+vrb2_dequeue_ldpc_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
+		uint16_t *dequeued_ops, uint32_t *aq_dequeued,
+		uint16_t *dequeued_descs)
+{
+	union acc_dma_desc *desc, atom_desc;
+	union acc_dma_rsp_desc rsp;
+	struct rte_bbdev_enc_op *op;
+	int desc_idx = ((q->sw_ring_tail + *dequeued_descs) & q->sw_ring_wrap_mask);
+
+	desc = q->ring_addr + desc_idx;
+	atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, __ATOMIC_RELAXED);
+
+	/* Check fdone bit. */
+	if (!(atom_desc.rsp.val & ACC_FDONE))
+		return -1;
+
+	rsp.val = atom_desc.rsp.val;
+	rte_bbdev_log_debug("Resp. desc %p: %x", desc, rsp.val);
+
+	/* Dequeue. */
+	op = desc->req.op_addr;
+
+	/* Clearing status, it will be set based on response. */
+	op->status = 0;
+	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
+	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
+	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
+	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
+
+	if (desc->req.last_desc_in_batch) {
+		(*aq_dequeued)++;
+		desc->req.last_desc_in_batch = 0;
+	}
+	desc->rsp.val = ACC_DMA_DESC_TYPE;
+	desc->rsp.add_info_0 = 0; /* Reserved bits. */
+	desc->rsp.add_info_1 = 0; /* Reserved bits. */
+
+	/* One op was successfully dequeued */
+	ref_op[0] = op;
+	(*dequeued_descs)++;
+	(*dequeued_ops)++;
+	return 1;
+}
+
 /* Dequeue one LDPC encode operations from device in TB mode.
  * That operation may cover multiple descriptors.
  */
@@ -3189,9 +3712,14 @@ vrb_dequeue_ldpc_enc(struct rte_bbdev_queue_data *q_data,
 
 	for (i = 0; i < avail; i++) {
 		if (cbm == RTE_BBDEV_TRANSPORT_BLOCK)
-			ret = vrb_dequeue_enc_one_op_tb(q, &ops[dequeued_ops],
-					&dequeued_ops, &aq_dequeued,
-					&dequeued_descs, num);
+			if (q->d->device_variant == VRB1_VARIANT)
+				ret = vrb_dequeue_enc_one_op_tb(q, &ops[dequeued_ops],
+						&dequeued_ops, &aq_dequeued,
+						&dequeued_descs, num);
+			else
+				ret = vrb2_dequeue_ldpc_enc_one_op_tb(q, &ops[dequeued_ops],
+						&dequeued_ops, &aq_dequeued,
+						&dequeued_descs);
 		else
 			ret = vrb_dequeue_enc_one_op_cb(q, &ops[dequeued_ops],
 					&dequeued_ops, &aq_dequeued,
@@ -3536,6 +4064,7 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 	} else {
 		d->device_variant = VRB2_VARIANT;
 		d->queue_offset = vrb2_queue_offset;
+		d->fcw_ld_fill = vrb2_fcw_ld_fill;
 		d->num_qgroups = VRB2_NUM_QGRPS;
 		d->num_aqs = VRB2_NUM_AQS;
 		if (d->pf_device)
-- 
2.34.1
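
Illustrative note on the HARQ output sizing in vrb2_fcw_ld_fill(): the
length is the larger of the previous HARQ input size and k0' + E, capped
at the filler-free circular buffer size and rounded up to a multiple of
64. The standalone sketch below mirrors that arithmetic only. The struct
harq_len_params and both helper names are hypothetical and do not exist
in the driver; field widths are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the op and FCW fields the computation reads. */
struct harq_len_params {
	uint8_t basegraph;    /* LDPC base graph, 1 or 2 */
	uint16_t z_c;         /* lifting size */
	uint16_t n_filler;    /* number of filler bits */
	uint16_t ncb;         /* circular buffer size */
	uint16_t k0;          /* start position for the redundancy version */
	uint32_t rm_e;        /* rate-matched length E */
	uint16_t hcin_size0;  /* HARQ combine input size */
};

/* Round up to the next multiple of 64, as RTE_ALIGN_CEIL(x, 64) would. */
static uint32_t
align_ceil_64(uint32_t x)
{
	return (x + 63u) & ~63u;
}

/* Same sequence of steps as the hcout_en branch of the patch. */
static uint32_t
harq_out_len(const struct harq_len_params *p)
{
	uint32_t parity_offset, k0_p, ncb_p, l, len;

	assert(p->basegraph == 1 || p->basegraph == 2);

	/* (20 or 8) * Zc minus the filler bits, as in the patch. */
	parity_offset = (p->basegraph == 1 ? 20u : 8u) * p->z_c - p->n_filler;
	/* k0 is shifted down by the fillers only when it lies past them. */
	k0_p = (p->k0 > parity_offset) ? (uint32_t)p->k0 - p->n_filler : p->k0;
	/* Circular buffer size without the filler bits. */
	ncb_p = (uint32_t)p->ncb - p->n_filler;
	/* Last position written by this transmission. */
	l = k0_p + p->rm_e;

	len = p->hcin_size0 > l ? p->hcin_size0 : l; /* RTE_MAX */
	len = len < ncb_p ? len : ncb_p;              /* RTE_MIN */
	return align_ceil_64(len);
}
```

For example, with BG1, Zc = 384, no filler bits, ncb = 25344, k0 = 0,
E = 10000 and no previous HARQ data, the sketch gives 10048, which is
10000 rounded up to the next multiple of 64.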



* [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (7 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 14:36   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 10/12] baseband/acc: add MLD support in " Nicolas Chautru
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

Add support for the FFT processing specific to the
VRB2 variant.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
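Illustrative note on vrb2_fcw_fft_fill(): six window indexes are packed
into two selector fields, and three op flags are reduced to a single
bypass code. The sketch below reproduces only that mapping so it can be
checked in isolation. Function names are hypothetical, the 32-bit and
16-bit destination widths are assumed from the shifts used in the patch,
and each index is assumed to fit in 8 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack indexes 0..3 into a 32-bit word and indexes 4..5 into a 16-bit word. */
static void
pack_window_sel(const uint8_t idx[6], uint32_t *sel, uint16_t *sel2)
{
	assert(idx != NULL && sel != NULL && sel2 != NULL);

	/* Casting before shifting keeps the arithmetic unsigned. */
	*sel = (uint32_t)idx[0] |
			((uint32_t)idx[1] << 8) |
			((uint32_t)idx[2] << 16) |
			((uint32_t)idx[3] << 24);
	*sel2 = (uint16_t)((uint32_t)idx[4] | ((uint32_t)idx[5] << 8));
}

/*
 * Bypass code selection in the same priority order as the patch:
 * IDFT bypass wins over DFT bypass, and windowing bypass is only
 * considered together with IDFT bypass.
 */
static uint8_t
fft_bypass_mode(bool idft_bypass, bool win_bypass, bool dft_bypass)
{
	if (idft_bypass)
		return win_bypass ? 2 : 1;
	return dft_bypass ? 3 : 0;
}
```

One consequence of the ordering: a windowing bypass flag without the
IDFT bypass flag has no effect on the bypass code in this mapping.
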
 drivers/baseband/acc/rte_vrb_pmd.c | 132 ++++++++++++++++++++++++++++-
 1 file changed, 128 insertions(+), 4 deletions(-)
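
Illustrative note on vrb2_dma_desc_fft_fill(): the number of cyclic
shifts is the count of bits set in cs_bitmap, and the power measurement
buffer length is num_cs * 2^num_antennas_log2 * 4 bytes. A standalone
sketch of both computations follows. The function names are
hypothetical, and the upper bound on scanned bits is passed in rather
than assuming the value of RTE_BBDEV_MAX_CS.

```c
#include <assert.h>
#include <stdint.h>

/* Count the cyclic shifts enabled in the low max_cs bits of the bitmap. */
static unsigned int
count_cs(uint16_t cs_bitmap, unsigned int max_cs)
{
	unsigned int i, num_cs = 0;

	assert(max_cs <= 16);
	for (i = 0; i < max_cs; i++)
		if (cs_bitmap & (1u << i))
			num_cs++;
	return num_cs;
}

/* Byte length of the power measurement output, 4 bytes per (CS, antenna). */
static uint32_t
power_meas_len(unsigned int num_cs, unsigned int num_antennas_log2)
{
	assert(num_antennas_log2 < 16);
	return num_cs * (1u << num_antennas_log2) * 4u;
}
```

With four cyclic shifts enabled and eight antennas (log2 = 3), the
power measurement buffer in this sketch is 4 * 8 * 4 = 128 bytes.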

diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index 93add82947..ce4b90d8e7 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
 			ACC_FCW_LD_BLEN : (conf->op_type == RTE_BBDEV_OP_FFT ?
 			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
 
+	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type == RTE_BBDEV_OP_FFT))
+		fcw_len = ACC_FCW_FFT_BLEN_3;
+
 	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
 		desc = q->ring_addr + desc_idx;
 		desc->req.word0 = ACC_DMA_DESC_TYPE;
@@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 			.num_buffers_soft_out = 0,
 			}
 		},
+		{
+			.type	= RTE_BBDEV_OP_FFT,
+			.cap.fft = {
+				.capability_flags =
+						RTE_BBDEV_FFT_WINDOWING |
+						RTE_BBDEV_FFT_CS_ADJUSTMENT |
+						RTE_BBDEV_FFT_DFT_BYPASS |
+						RTE_BBDEV_FFT_IDFT_BYPASS |
+						RTE_BBDEV_FFT_FP16_INPUT |
+						RTE_BBDEV_FFT_FP16_OUTPUT |
+						RTE_BBDEV_FFT_POWER_MEAS |
+						RTE_BBDEV_FFT_WINDOWING_BYPASS,
+				.num_buffers_src =
+						1,
+				.num_buffers_dst =
+						1,
+			}
+		},
 		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
 	};
 
@@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft *fcw)
 		fcw->bypass = 0;
 }
 
+/* Fill in a frame control word for FFT processing. */
+static inline void
+vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
+{
+	fcw->in_frame_size = op->fft.input_sequence_size;
+	fcw->leading_pad_size = op->fft.input_leading_padding;
+	fcw->out_frame_size = op->fft.output_sequence_size;
+	fcw->leading_depad_size = op->fft.output_leading_depadding;
+	fcw->cs_window_sel = op->fft.window_index[0] +
+			(op->fft.window_index[1] << 8) +
+			(op->fft.window_index[2] << 16) +
+			(op->fft.window_index[3] << 24);
+	fcw->cs_window_sel2 = op->fft.window_index[4] +
+			(op->fft.window_index[5] << 8);
+	fcw->cs_enable_bmap = op->fft.cs_bitmap;
+	fcw->num_antennas = op->fft.num_antennas_log2;
+	fcw->idft_size = op->fft.idft_log2;
+	fcw->dft_size = op->fft.dft_log2;
+	fcw->cs_offset = op->fft.cs_time_adjustment;
+	fcw->idft_shift = op->fft.idft_shift;
+	fcw->dft_shift = op->fft.dft_shift;
+	fcw->cs_multiplier = op->fft.ncs_reciprocal;
+	fcw->power_shift = op->fft.power_shift;
+	fcw->exp_adj = op->fft.fp16_exp_adjust;
+	fcw->fp16_in = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_INPUT);
+	fcw->fp16_out = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_OUTPUT);
+	fcw->power_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
+	if (check_bit(op->fft.op_flags,
+			RTE_BBDEV_FFT_IDFT_BYPASS)) {
+		if (check_bit(op->fft.op_flags,
+				RTE_BBDEV_FFT_WINDOWING_BYPASS))
+			fcw->bypass = 2;
+		else
+			fcw->bypass = 1;
+	} else if (check_bit(op->fft.op_flags,
+			RTE_BBDEV_FFT_DFT_BYPASS))
+		fcw->bypass = 3;
+	else
+		fcw->bypass = 0;
+}
+
 static inline int
 vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
 		struct acc_dma_req_desc *desc,
@@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
 	return 0;
 }
 
+static inline int
+vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
+		struct acc_dma_req_desc *desc,
+		struct rte_mbuf *input, struct rte_mbuf *output, struct rte_mbuf *win_input,
+		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t *out_offset,
+		uint32_t *win_offset, uint32_t *pwr_offset)
+{
+	bool pwr_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
+	bool win_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_DEWINDOWING);
+	int num_cs = 0, i, bd_idx = 1;
+
+	/* FCW already done */
+	acc_header_init(desc);
+
+	RTE_SET_USED(win_input);
+	RTE_SET_USED(win_offset);
+
+	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input, *in_offset);
+	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size * ACC_IQ_SIZE;
+	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
+	desc->data_ptrs[bd_idx].last = 1;
+	desc->data_ptrs[bd_idx].dma_ext = 0;
+	bd_idx++;
+
+	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output, *out_offset);
+	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size * ACC_IQ_SIZE;
+	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
+	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
+	desc->data_ptrs[bd_idx].dma_ext = 0;
+	desc->m2dlen = win_en ? 3 : 2;
+	desc->d2mlen = pwr_en ? 2 : 1;
+	desc->ib_ant_offset = op->fft.input_sequence_size;
+	desc->num_ant = op->fft.num_antennas_log2 - 3;
+
+	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
+		if (check_bit(op->fft.cs_bitmap, 1 << i))
+			num_cs++;
+	desc->num_cs = num_cs;
+
+	if (pwr_en && pwr) {
+		bd_idx++;
+		desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(pwr, *pwr_offset);
+		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op->fft.num_antennas_log2) * 4;
+		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
+		desc->data_ptrs[bd_idx].last = 1;
+		desc->data_ptrs[bd_idx].dma_ext = 0;
+	}
+	desc->ob_cyc_offset = op->fft.output_sequence_size;
+	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
+	desc->op_addr = op;
+	return 0;
+}
 
 /** Enqueue one FFT operation for device. */
 static inline int
@@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
 		uint16_t total_enqueued_cbs)
 {
 	union acc_dma_desc *desc;
-	struct rte_mbuf *input, *output;
-	uint32_t in_offset, out_offset;
+	struct rte_mbuf *input, *output, *pwr, *win;
+	uint32_t in_offset, out_offset, pwr_offset, win_offset;
 	struct acc_fcw_fft *fcw;
 
 	desc = acc_desc(q, total_enqueued_cbs);
 	input = op->fft.base_input.data;
 	output = op->fft.base_output.data;
+	pwr = op->fft.power_meas_output.data;
+	win = op->fft.dewindowing_input.data;
 	in_offset = op->fft.base_input.offset;
 	out_offset = op->fft.base_output.offset;
+	pwr_offset = op->fft.power_meas_output.offset;
+	win_offset = op->fft.dewindowing_input.offset;
 
 	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
 			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
 			* ACC_MAX_FCW_SIZE);
 
-	vrb1_fcw_fft_fill(op, fcw);
-	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
+	if (q->d->device_variant == VRB1_VARIANT) {
+		vrb1_fcw_fft_fill(op, fcw);
+		vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
+	} else {
+		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
+		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win, pwr,
+				&in_offset, &out_offset, &win_offset, &pwr_offset);
+	}
 #ifdef RTE_LIBRTE_BBDEV_DEBUG
 	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
 			sizeof(desc->req.fcw_fft));
-- 
2.34.1



* [PATCH v3 10/12] baseband/acc: add MLD support in VRB2 variant
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (8 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 09/12] baseband/acc: add FFT support to " Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 15:12   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection Nicolas Chautru
  2023-09-29 16:35 ` [PATCH v3 12/12] baseband/acc: add configure helper for VRB2 Nicolas Chautru
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

Add the capability for the MLD-TS processing specific to
the VRB2 variant.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
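Illustrative note on vrb2_dma_desc_mldts_fill(): the three DMA lengths
are derived from the number of RBs, the layer count, the modulation
orders and the R/C repetition settings using the lookup tables in the
patch. The sketch below repeats that arithmetic outside the driver.
struct mld_sizes and mld_buffer_sizes() are hypothetical names, and
SC_PER_RB = 12 is the assumed value of RTE_BBDEV_SCPERRB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SC_PER_RB 12 /* assumed value of RTE_BBDEV_SCPERRB */

struct mld_sizes {
	uint32_t q_size;   /* QHy input length */
	uint32_t r_size;   /* R input length */
	uint32_t out_size; /* output length */
};

static struct mld_sizes
mld_buffer_sizes(uint16_t num_rbs, uint8_t num_layers, const uint8_t q_m[4],
		uint8_t r_rep, uint8_t c_rep)
{
	/* Tables copied from the patch, indexed by (layers - 2) and r_rep. */
	static const uint16_t qsize_per_re[3] = {8, 12, 16};
	static const uint16_t rsize_per_re[3] = {14, 26, 42};
	static const uint16_t sc_factor_per_rrep[6] = {12, 6, 4, 3, 0, 2};
	uint32_t sc_num, r_num, outsize_per_re = 0;
	struct mld_sizes s;
	unsigned int i;

	assert(q_m != NULL);

	/* Same clamping as the patch, to keep table indexes in range. */
	if (r_rep > 5)
		r_rep = 5;
	if (num_layers < 2)
		num_layers = 2;
	if (num_layers > 4)
		num_layers = 4;

	for (i = 0; i < num_layers; i++)
		outsize_per_re += q_m[i];

	sc_num = (uint32_t)num_rbs * SC_PER_RB * (c_rep + 1u);
	r_num = (uint32_t)num_rbs * sc_factor_per_rrep[r_rep];

	s.q_size = qsize_per_re[num_layers - 2] * sc_num;
	s.r_size = rsize_per_re[num_layers - 2] * r_num;
	s.out_size = sc_num * outsize_per_re;
	return s;
}
```

For 10 RBs, 2 layers with q_m = {4, 4}, r_rep = 0 and c_rep = 0, the
sketch gives q_size = 960, r_size = 1680 and out_size = 960.
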
 drivers/baseband/acc/rte_vrb_pmd.c | 378 +++++++++++++++++++++++++++++
 1 file changed, 378 insertions(+)
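
Illustrative note on the split decision in vrb2_enqueue_mldts(): an op
with c_rep > 0 is kept as one descriptor only when its RB count is
within the per-(layers, r_rep) limit from the max_rb table, otherwise
it is split into c_rep + 1 descriptors. The sketch below reproduces the
decision. Function names are hypothetical, and the index bounds (layers
2 to 4, r_rep up to 5) are assumed from the table dimensions. Unlike
the patch, the sketch also clamps a layer count below 2, which is a
choice made here to keep the lookup in range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the op can be processed with a single descriptor. */
static bool
mld_fits_single_desc(uint16_t num_rbs, uint8_t num_layers, uint8_t r_rep,
		uint8_t c_rep)
{
	/* Table copied from the patch: rows are layers 2..4, columns r_rep 0..5. */
	static const uint16_t max_rb[3][6] = {
		{188, 275, 275, 275, 0, 275},
		{101, 202, 275, 275, 0, 275},
		{62, 124, 186, 248, 0, 275} };
	unsigned int layer_idx, rrep_idx;

	if (c_rep == 0)
		return true;

	/* Clamp both indexes into the table range. */
	layer_idx = num_layers < 2 ? 0u : (num_layers > 4 ? 2u : num_layers - 2u);
	rrep_idx = r_rep > 5 ? 5u : r_rep;
	assert(layer_idx < 3 && rrep_idx < 6);

	return num_rbs <= max_rb[layer_idx][rrep_idx];
}

/* Number of ring descriptors the op consumes. */
static uint16_t
mld_descs_needed(uint16_t num_rbs, uint8_t num_layers, uint8_t r_rep,
		uint8_t c_rep)
{
	if (mld_fits_single_desc(num_rbs, num_layers, r_rep, c_rep))
		return 1;
	return (uint16_t)(c_rep + 1u);
}
```

For instance, 200 RBs with 2 layers, r_rep = 0 and c_rep = 3 exceeds
the 188 RB limit, so the sketch reports 4 descriptors.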

diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index ce4b90d8e7..a9d3db86e6 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -1344,6 +1344,17 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
 						1,
 			}
 		},
+		{
+			.type	= RTE_BBDEV_OP_MLDTS,
+			.cap.mld = {
+				.capability_flags =
+						RTE_BBDEV_MLDTS_REP,
+				.num_buffers_src =
+						1,
+				.num_buffers_dst =
+						1,
+			}
+		},
 		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
 	};
 
@@ -4151,6 +4162,371 @@ vrb_dequeue_fft(struct rte_bbdev_queue_data *q_data,
 	return i;
 }
 
+/* Fill in a frame control word for MLD-TS processing. */
+static inline void
+vrb2_fcw_mldts_fill(struct rte_bbdev_mldts_op *op, struct acc_fcw_mldts *fcw)
+{
+	fcw->nrb = op->mldts.num_rbs;
+	fcw->NLayers = op->mldts.num_layers - 1;
+	fcw->Qmod0 = (op->mldts.q_m[0] >> 1) - 1;
+	fcw->Qmod1 = (op->mldts.q_m[1] >> 1) - 1;
+	fcw->Qmod2 = (op->mldts.q_m[2] >> 1) - 1;
+	fcw->Qmod3 = (op->mldts.q_m[3] >> 1) - 1;
+	/* Mark some layers as disabled */
+	if (op->mldts.num_layers == 2) {
+		fcw->Qmod2 = 3;
+		fcw->Qmod3 = 3;
+	}
+	if (op->mldts.num_layers == 3)
+		fcw->Qmod3 = 3;
+	fcw->Rrep = op->mldts.r_rep;
+	fcw->Crep = op->mldts.c_rep;
+}
+
+/* Fill in descriptor for one MLD-TS processing operation. */
+static inline int
+vrb2_dma_desc_mldts_fill(struct rte_bbdev_mldts_op *op,
+		struct acc_dma_req_desc *desc,
+		struct rte_mbuf *input_q, struct rte_mbuf *input_r,
+		struct rte_mbuf *output,
+		uint32_t *in_offset, uint32_t *out_offset)
+{
+	uint16_t qsize_per_re[VRB2_MLD_LAY_SIZE] = {8, 12, 16}; /* Layer 2 to 4. */
+	uint16_t rsize_per_re[VRB2_MLD_LAY_SIZE] = {14, 26, 42};
+	uint16_t sc_factor_per_rrep[VRB2_MLD_RREP_SIZE] = {12, 6, 4, 3, 0, 2};
+	uint16_t i, outsize_per_re = 0;
+	uint32_t sc_num, r_num, q_size, r_size, out_size;
+
+	/* Prevent out of range access. */
+	if (op->mldts.r_rep > 5)
+		op->mldts.r_rep = 5;
+	if (op->mldts.num_layers < 2)
+		op->mldts.num_layers = 2;
+	if (op->mldts.num_layers > 4)
+		op->mldts.num_layers = 4;
+	for (i = 0; i < op->mldts.num_layers; i++)
+		outsize_per_re += op->mldts.q_m[i];
+	sc_num = op->mldts.num_rbs * RTE_BBDEV_SCPERRB * (op->mldts.c_rep + 1);
+	r_num = op->mldts.num_rbs * sc_factor_per_rrep[op->mldts.r_rep];
+	q_size = qsize_per_re[op->mldts.num_layers - 2] * sc_num;
+	r_size = rsize_per_re[op->mldts.num_layers - 2] * r_num;
+	out_size =  sc_num * outsize_per_re;
+	/* printf("Sc %d R num %d Size %d %d %d\n", sc_num, r_num, q_size, r_size, out_size); */
+
+	/* FCW already done. */
+	acc_header_init(desc);
+	desc->data_ptrs[1].address = rte_pktmbuf_iova_offset(input_q, *in_offset);
+	desc->data_ptrs[1].blen = q_size;
+	desc->data_ptrs[1].blkid = ACC_DMA_BLKID_IN;
+	desc->data_ptrs[1].last = 0;
+	desc->data_ptrs[1].dma_ext = 0;
+	desc->data_ptrs[2].address = rte_pktmbuf_iova_offset(input_r, *in_offset);
+	desc->data_ptrs[2].blen = r_size;
+	desc->data_ptrs[2].blkid = ACC_DMA_BLKID_IN_MLD_R;
+	desc->data_ptrs[2].last = 1;
+	desc->data_ptrs[2].dma_ext = 0;
+	desc->data_ptrs[3].address = rte_pktmbuf_iova_offset(output, *out_offset);
+	desc->data_ptrs[3].blen = out_size;
+	desc->data_ptrs[3].blkid = ACC_DMA_BLKID_OUT_HARD;
+	desc->data_ptrs[3].last = 1;
+	desc->data_ptrs[3].dma_ext = 0;
+	desc->m2dlen = 3;
+	desc->d2mlen = 1;
+	desc->op_addr = op;
+	desc->cbs_in_tb = 1;
+
+	return 0;
+}
+
+/* Check whether the MLD operation can be processed as a single operation. */
+static inline bool
+vrb2_check_mld_r_constraint(struct rte_bbdev_mldts_op *op) {
+	uint8_t layer_idx, rrep_idx;
+	uint16_t max_rb[VRB2_MLD_LAY_SIZE][VRB2_MLD_RREP_SIZE] = {
+			{188, 275, 275, 275, 0, 275},
+			{101, 202, 275, 275, 0, 275},
+			{62, 124, 186, 248, 0, 275} };
+
+	if (op->mldts.c_rep == 0)
+		return true;
+
+	layer_idx = RTE_MIN(op->mldts.num_layers - VRB2_MLD_MIN_LAYER,
+			VRB2_MLD_MAX_LAYER - VRB2_MLD_MIN_LAYER);
+	rrep_idx = RTE_MIN(op->mldts.r_rep, VRB2_MLD_MAX_RREP);
+	rte_bbdev_log_debug("RB %d index %d %d max %d", op->mldts.num_rbs, layer_idx, rrep_idx,
+			max_rb[layer_idx][rrep_idx]);
+
+	return (op->mldts.num_rbs <= max_rb[layer_idx][rrep_idx]);
+}
+
+/** Enqueue MLDTS operation split across symbols. */
+static inline int
+enqueue_mldts_split_op(struct acc_queue *q, struct rte_bbdev_mldts_op *op,
+		uint16_t total_enqueued_descs)
+{
+	uint16_t qsize_per_re[VRB2_MLD_LAY_SIZE] = {8, 12, 16}; /* Layer 2 to 4. */
+	uint16_t rsize_per_re[VRB2_MLD_LAY_SIZE] = {14, 26, 42};
+	uint16_t sc_factor_per_rrep[VRB2_MLD_RREP_SIZE] = {12, 6, 4, 3, 0, 2};
+	uint32_t i, outsize_per_re = 0, sc_num, r_num, q_size, r_size, out_size, num_syms;
+	union acc_dma_desc *desc, *first_desc;
+	uint16_t desc_idx, symb;
+	struct rte_mbuf *input_q, *input_r, *output;
+	uint32_t in_offset, out_offset;
+	struct acc_fcw_mldts *fcw;
+
+	desc_idx = ((q->sw_ring_head + total_enqueued_descs) & q->sw_ring_wrap_mask);
+	first_desc = q->ring_addr + desc_idx;
+	input_q = op->mldts.qhy_input.data;
+	input_r = op->mldts.r_input.data;
+	output = op->mldts.output.data;
+	in_offset = op->mldts.qhy_input.offset;
+	out_offset = op->mldts.output.offset;
+	num_syms = op->mldts.c_rep + 1;
+	fcw = &first_desc->req.fcw_mldts;
+	vrb2_fcw_mldts_fill(op, fcw);
+	fcw->Crep = 0; /* C rep forced to zero. */
+
+	/* Prevent out of range access. */
+	if (op->mldts.r_rep > 5)
+		op->mldts.r_rep = 5;
+	if (op->mldts.num_layers < 2)
+		op->mldts.num_layers = 2;
+	if (op->mldts.num_layers > 4)
+		op->mldts.num_layers = 4;
+
+	for (i = 0; i < op->mldts.num_layers; i++)
+		outsize_per_re += op->mldts.q_m[i];
+	sc_num = op->mldts.num_rbs * RTE_BBDEV_SCPERRB; /* C rep forced to zero. */
+	r_num = op->mldts.num_rbs * sc_factor_per_rrep[op->mldts.r_rep];
+	q_size = qsize_per_re[op->mldts.num_layers - 2] * sc_num;
+	r_size = rsize_per_re[op->mldts.num_layers - 2] * r_num;
+	out_size =  sc_num * outsize_per_re;
+
+	for (symb = 0; symb < num_syms; symb++) {
+		desc_idx = ((q->sw_ring_head + total_enqueued_descs + symb) & q->sw_ring_wrap_mask);
+		desc = q->ring_addr + desc_idx;
+		acc_header_init(&desc->req);
+		if (symb == 0)
+			desc->req.cbs_in_tb = num_syms;
+		else
+			rte_memcpy(&desc->req.fcw_mldts, fcw, ACC_FCW_MLDTS_BLEN);
+		desc->req.data_ptrs[1].address = rte_pktmbuf_iova_offset(input_q, in_offset);
+		desc->req.data_ptrs[1].blen = q_size;
+		in_offset += q_size;
+		desc->req.data_ptrs[1].blkid = ACC_DMA_BLKID_IN;
+		desc->req.data_ptrs[1].last = 0;
+		desc->req.data_ptrs[1].dma_ext = 0;
+		desc->req.data_ptrs[2].address = rte_pktmbuf_iova_offset(input_r, 0);
+		desc->req.data_ptrs[2].blen = r_size;
+		desc->req.data_ptrs[2].blkid = ACC_DMA_BLKID_IN_MLD_R;
+		desc->req.data_ptrs[2].last = 1;
+		desc->req.data_ptrs[2].dma_ext = 0;
+		desc->req.data_ptrs[3].address = rte_pktmbuf_iova_offset(output, out_offset);
+		desc->req.data_ptrs[3].blen = out_size;
+		out_offset += out_size;
+		desc->req.data_ptrs[3].blkid = ACC_DMA_BLKID_OUT_HARD;
+		desc->req.data_ptrs[3].last = 1;
+		desc->req.data_ptrs[3].dma_ext = 0;
+		desc->req.m2dlen = VRB2_MLD_M2DLEN;
+		desc->req.d2mlen = 1;
+		desc->req.op_addr = op;
+
+#ifdef RTE_LIBRTE_BBDEV_DEBUG
+		rte_memdump(stderr, "FCW", &desc->req.fcw_mldts, sizeof(desc->req.fcw_mldts));
+		rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
+#endif
+	}
+	desc->req.sdone_enable = 0;
+
+	return num_syms;
+}
+
+/** Enqueue one MLDTS operation. */
+static inline int
+enqueue_mldts_one_op(struct acc_queue *q, struct rte_bbdev_mldts_op *op,
+		uint16_t total_enqueued_descs)
+{
+	union acc_dma_desc *desc;
+	uint16_t desc_idx;
+	struct rte_mbuf *input_q, *input_r, *output;
+	uint32_t in_offset, out_offset;
+	struct acc_fcw_mldts *fcw;
+
+	desc_idx = ((q->sw_ring_head + total_enqueued_descs) & q->sw_ring_wrap_mask);
+	desc = q->ring_addr + desc_idx;
+	input_q = op->mldts.qhy_input.data;
+	input_r = op->mldts.r_input.data;
+	output = op->mldts.output.data;
+	in_offset = op->mldts.qhy_input.offset;
+	out_offset = op->mldts.output.offset;
+	fcw = &desc->req.fcw_mldts;
+	vrb2_fcw_mldts_fill(op, fcw);
+	vrb2_dma_desc_mldts_fill(op, &desc->req, input_q, input_r, output,
+			&in_offset, &out_offset);
+#ifdef RTE_LIBRTE_BBDEV_DEBUG
+	rte_memdump(stderr, "FCW", &desc->req.fcw_mldts, sizeof(desc->req.fcw_mldts));
+	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
+#endif
+	return 1;
+}
+
+/* Enqueue MLDTS operations. */
+static uint16_t
+vrb2_enqueue_mldts(struct rte_bbdev_queue_data *q_data,
+		struct rte_bbdev_mldts_op **ops, uint16_t num)
+{
+	int32_t aq_avail, avail;
+	struct acc_queue *q = q_data->queue_private;
+	uint16_t i, enqueued_descs = 0, descs_in_op;
+	int ret;
+	bool as_one_op;
+
+	aq_avail = acc_aq_avail(q_data, num);
+	if (unlikely((aq_avail <= 0) || (num == 0)))
+		return 0;
+	avail = acc_ring_avail_enq(q);
+
+	for (i = 0; i < num; ++i) {
+		as_one_op = vrb2_check_mld_r_constraint(ops[i]);
+		descs_in_op = as_one_op ? 1 : ops[i]->mldts.c_rep + 1;
+
+		/* Check if there is available space for further processing. */
+		if (unlikely(avail < descs_in_op)) {
+			acc_enqueue_ring_full(q_data);
+			break;
+		}
+		avail -= descs_in_op;
+
+		if (as_one_op)
+			ret = enqueue_mldts_one_op(q, ops[i], enqueued_descs);
+		else
+			ret = enqueue_mldts_split_op(q, ops[i], enqueued_descs);
+
+		if (ret < 0) {
+			acc_enqueue_invalid(q_data);
+			break;
+		}
+
+		enqueued_descs += ret;
+	}
+
+	if (unlikely(i == 0))
+		return 0; /* Nothing to enqueue. */
+
+	acc_dma_enqueue(q, enqueued_descs, &q_data->queue_stats);
+
+	/* Update stats. */
+	q_data->queue_stats.enqueued_count += i;
+	q_data->queue_stats.enqueue_err_count += num - i;
+	return i;
+}
+
+/*
+ * Dequeue one MLDTS operation.
+ * This may have been split over multiple descriptors.
+ */
+static inline int
+dequeue_mldts_one_op(struct rte_bbdev_queue_data *q_data,
+		struct acc_queue *q, struct rte_bbdev_mldts_op **ref_op,
+		uint16_t dequeued_ops, uint32_t *aq_dequeued)
+{
+	union acc_dma_desc *desc, atom_desc, *last_desc;
+	union acc_dma_rsp_desc rsp;
+	struct rte_bbdev_mldts_op *op;
+	uint8_t descs_in_op, i;
+
+	desc = acc_desc_tail(q, dequeued_ops);
+	atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, __ATOMIC_RELAXED);
+
+	/* Check fdone bit. */
+	if (!(atom_desc.rsp.val & ACC_FDONE))
+		return -1;
+
+	descs_in_op = desc->req.cbs_in_tb;
+	if (descs_in_op > 1) {
+		/* Get last CB. */
+		last_desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops + descs_in_op - 1)
+				& q->sw_ring_wrap_mask);
+		/* Check if last op is ready to dequeue by checking fdone bit. If not exit. */
+		atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, __ATOMIC_RELAXED);
+		if (!(atom_desc.rsp.val & ACC_FDONE))
+			return -1;
+#ifdef RTE_LIBRTE_BBDEV_DEBUG
+		rte_memdump(stderr, "Last Resp", &last_desc->rsp.val, sizeof(desc->rsp.val));
+#endif
+		/* Check each operation iteratively using fdone. */
+		for (i = 1; i < descs_in_op - 1; i++) {
+			last_desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops + i)
+					& q->sw_ring_wrap_mask);
+			atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc,
+					__ATOMIC_RELAXED);
+			if (!(atom_desc.rsp.val & ACC_FDONE))
+				return -1;
+		}
+	}
+#ifdef RTE_LIBRTE_BBDEV_DEBUG
+	rte_memdump(stderr, "Resp", &desc->rsp.val, sizeof(desc->rsp.val));
+#endif
+	/* Dequeue. */
+	op = desc->req.op_addr;
+
+	/* Clearing status, it will be set based on response. */
+	op->status = 0;
+
+	for (i = 0; i < descs_in_op; i++) {
+		desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops + i) & q->sw_ring_wrap_mask);
+		atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, __ATOMIC_RELAXED);
+		rsp.val = atom_desc.rsp.val;
+		op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
+		op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
+		op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
+		op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
+	}
+
+	if (op->status != 0)
+		q_data->queue_stats.dequeue_err_count++;
+	if (op->status & (1 << RTE_BBDEV_DRV_ERROR))
+		vrb_check_ir(q->d);
+
+	/* Check if this is the last desc in batch (Atomic Queue). */
+	if (desc->req.last_desc_in_batch) {
+		(*aq_dequeued)++;
+		desc->req.last_desc_in_batch = 0;
+	}
+	desc->rsp.val = ACC_DMA_DESC_TYPE;
+	desc->rsp.add_info_0 = 0;
+	*ref_op = op;
+
+	return descs_in_op;
+}
+
+/* Dequeue MLDTS operations from VRB2 device. */
+static uint16_t
+vrb2_dequeue_mldts(struct rte_bbdev_queue_data *q_data,
+		struct rte_bbdev_mldts_op **ops, uint16_t num)
+{
+	struct acc_queue *q = q_data->queue_private;
+	uint16_t dequeue_num, i, dequeued_cbs = 0;
+	uint32_t avail = acc_ring_avail_deq(q);
+	uint32_t aq_dequeued = 0;
+	int ret;
+
+	dequeue_num = RTE_MIN(avail, num);
+
+	for (i = 0; i < dequeue_num; ++i) {
+		ret = dequeue_mldts_one_op(q_data, q, &ops[i], dequeued_cbs, &aq_dequeued);
+		if (ret <= 0)
+			break;
+		dequeued_cbs += ret;
+	}
+
+	q->aq_dequeued += aq_dequeued;
+	q->sw_ring_tail += dequeued_cbs;
+	/* Update dequeue stats. */
+	q_data->queue_stats.dequeued_count += i;
+	return i;
+}
+
 /* Initialization Function */
 static void
 vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
@@ -4169,6 +4545,8 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 	dev->dequeue_ldpc_dec_ops = vrb_dequeue_ldpc_dec;
 	dev->enqueue_fft_ops = vrb_enqueue_fft;
 	dev->dequeue_fft_ops = vrb_dequeue_fft;
+	dev->enqueue_mldts_ops = vrb2_enqueue_mldts;
+	dev->dequeue_mldts_ops = vrb2_dequeue_mldts;
 
 	d->pf_device = !strcmp(drv->driver.name, RTE_STR(VRB_PF_DRIVER_NAME));
 	d->mmio_base = pci_dev->mem_resource[0].addr;
-- 
2.34.1



* [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (9 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 10/12] baseband/acc: add MLD support in " Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 15:16   ` Maxime Coquelin
  2023-09-29 16:35 ` [PATCH v3 12/12] baseband/acc: add configure helper for VRB2 Nicolas Chautru
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

Add the missing incremental functionality for the VRB2
variant, notably the detection of engine errors during
dequeue. Also includes minor cosmetic edits.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
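Note, not part of the commit: a minimal sketch of how the dequeue path
folds the response flags into op->status, and how an application could
test for the new engine error bit. All demo_* names are hypothetical,
and the bit positions are placeholders for illustration only; the real
ones come from enum rte_bbdev_op_status in the bbdev library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for illustration only; the real values are
 * defined by enum rte_bbdev_op_status in the bbdev library.
 */
enum demo_op_status {
	DEMO_DRV_ERROR = 0,
	DEMO_DATA_ERROR = 1,
	DEMO_ENGINE_ERROR = 4,
};

/* Hypothetical, simplified view of the response flags read on dequeue. */
struct demo_rsp {
	unsigned int input_err : 1;
	unsigned int dma_err : 1;
	unsigned int fcw_err : 1;
	unsigned int engine_hung : 1;
};

/* Fold the response flags into a status bitmap, one bit per error class. */
static inline uint32_t
demo_status_from_rsp(struct demo_rsp rsp)
{
	uint32_t status = 0;

	status |= (uint32_t)rsp.input_err << DEMO_DATA_ERROR;
	status |= (uint32_t)rsp.dma_err << DEMO_DRV_ERROR;
	status |= (uint32_t)rsp.fcw_err << DEMO_DRV_ERROR;
	status |= (uint32_t)rsp.engine_hung << DEMO_ENGINE_ERROR;
	return status;
}

/* True when the status bitmap reports an engine error. */
static inline bool
demo_is_engine_error(uint32_t status)
{
	return (status & (1u << DEMO_ENGINE_ERROR)) != 0;
}
```

Both DMA and FCW errors map to the same driver error bit, as in the
hunks of this patch, so only the engine hang gets a bit of its own.
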
 drivers/baseband/acc/rte_vrb_pmd.c  | 20 ++++++++++++--------
 drivers/baseband/acc/vrb1_pf_enum.h | 17 ++++++++++++-----
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index a9d3db86e6..3eb1a380fc 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -1504,6 +1504,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc_fcw_td *fcw)
 				fcw->ea = op->turbo_dec.cb_params.e;
 				fcw->eb = op->turbo_dec.cb_params.e;
 			}
+
 			if (op->turbo_dec.rv_index == 0)
 				fcw->k0_start_col = ACC_FCW_TD_RVIDX_0;
 			else if (op->turbo_dec.rv_index == 1)
@@ -2304,7 +2305,7 @@ enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ops,
 	return num;
 }
 
-/* Enqueue one encode operations for device for a partial TB
+/* Enqueue one encode operation for VRB1 device for a partial TB
  * all codes blocks have same configuration multiplexed on the same descriptor.
  */
 static inline void
@@ -2649,7 +2650,7 @@ enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 	return 1;
 }
 
-/** Enqueue one decode operations for device in CB mode */
+/** Enqueue one decode operation for device in CB mode. */
 static inline int
 vrb_enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 		uint16_t total_enqueued_cbs, bool same_op)
@@ -2801,7 +2802,6 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
 		desc->req.data_ptrs[0].blen = ACC_FCW_LD_BLEN;
 		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld, ACC_FCW_LD_BLEN);
 		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
-
 		if (q->d->device_variant == VRB1_VARIANT)
 			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
 					h_output, &in_offset, &h_out_offset,
@@ -3226,7 +3226,6 @@ vrb_enqueue_ldpc_dec_cb(struct rte_bbdev_queue_data *q_data,
 			break;
 		}
 		avail -= 1;
-
 		rte_bbdev_log(INFO, "Op %d %d %d %d %d %d %d %d %d %d %d %d\n",
 			i, ops[i]->ldpc_dec.op_flags, ops[i]->ldpc_dec.rv_index,
 			ops[i]->ldpc_dec.iter_max, ops[i]->ldpc_dec.iter_count,
@@ -3354,6 +3353,7 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
 	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
 	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
 	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
+	op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR) : 0);
 
 	if (desc->req.last_desc_in_batch) {
 		(*aq_dequeued)++;
@@ -3470,6 +3470,7 @@ vrb_dequeue_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
 		op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
 		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
 		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
+		op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR) : 0);
 
 		if (desc->req.last_desc_in_batch) {
 			(*aq_dequeued)++;
@@ -3516,6 +3517,8 @@ vrb_dequeue_dec_one_op_cb(struct rte_bbdev_queue_data *q_data,
 	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
 	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
 	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
+	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
+
 	if (op->status != 0) {
 		/* These errors are not expected. */
 		q_data->queue_stats.dequeue_err_count++;
@@ -3569,6 +3572,7 @@ vrb_dequeue_ldpc_dec_one_op_cb(struct rte_bbdev_queue_data *q_data,
 	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
 	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
 	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
+	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
 	if (op->status != 0)
 		q_data->queue_stats.dequeue_err_count++;
 
@@ -3650,6 +3654,7 @@ vrb_dequeue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op **ref_op,
 		op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
 		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
 		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
+		op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR) : 0);
 
 		if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
 			tb_crc_check ^= desc->rsp.add_info_1;
@@ -3701,7 +3706,6 @@ vrb_dequeue_enc(struct rte_bbdev_queue_data *q_data,
 	if (avail == 0)
 		return 0;
 	op = acc_op_tail(q, 0);
-
 	cbm = op->turbo_enc.code_block_mode;
 
 	for (i = 0; i < avail; i++) {
@@ -4041,9 +4045,8 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
 				&in_offset, &out_offset, &win_offset, &pwr_offset);
 	}
 #ifdef RTE_LIBRTE_BBDEV_DEBUG
-	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
-			sizeof(desc->req.fcw_fft));
-	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
+	rte_memdump(stderr, "FCW", fcw, 128);
+	rte_memdump(stderr, "Req Desc.", desc, 128);
 #endif
 	return 1;
 }
@@ -4116,6 +4119,7 @@ vrb_dequeue_fft_one_op(struct rte_bbdev_queue_data *q_data,
 	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
 	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
 	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
+	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
 	if (op->status != 0)
 		q_data->queue_stats.dequeue_err_count++;
 
diff --git a/drivers/baseband/acc/vrb1_pf_enum.h b/drivers/baseband/acc/vrb1_pf_enum.h
index 82a36685e9..6dc359800f 100644
--- a/drivers/baseband/acc/vrb1_pf_enum.h
+++ b/drivers/baseband/acc/vrb1_pf_enum.h
@@ -98,11 +98,18 @@ enum {
 	ACC_PF_INT_DMA_UL5G_DESC_IRQ = 8,
 	ACC_PF_INT_DMA_DL5G_DESC_IRQ = 9,
 	ACC_PF_INT_DMA_MLD_DESC_IRQ = 10,
-	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 11,
-	ACC_PF_INT_PARITY_ERR = 12,
-	ACC_PF_INT_QMGR_ERR = 13,
-	ACC_PF_INT_INT_REQ_OVERFLOW = 14,
-	ACC_PF_INT_APB_TIMEOUT = 15,
+	ACC_PF_INT_ARAM_ACCESS_ERR = 11,
+	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 12,
+	ACC_PF_INT_PARITY_ERR = 13,
+	ACC_PF_INT_QMGR_OVERFLOW = 14,
+	ACC_PF_INT_QMGR_ERR = 15,
+	ACC_PF_INT_ATS_ERR = 22,
+	ACC_PF_INT_ARAM_FULL = 23,
+	ACC_PF_INT_EXTRA_READ = 24,
+	ACC_PF_INT_COMPLETION_TIMEOUT = 25,
+	ACC_PF_INT_CORE_HANG = 26,
+	ACC_PF_INT_DMA_HANG = 28,
+	ACC_PF_INT_DS_HANG = 27,
 };
 
 #endif /* VRB1_PF_ENUM_H */
-- 
2.34.1



* [PATCH v3 12/12] baseband/acc: add configure helper for VRB2
  2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
                   ` (10 preceding siblings ...)
  2023-09-29 16:35 ` [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection Nicolas Chautru
@ 2023-09-29 16:35 ` Nicolas Chautru
  2023-10-03 15:30   ` Maxime Coquelin
  11 siblings, 1 reply; 42+ messages in thread
From: Nicolas Chautru @ 2023-09-29 16:35 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

This allows configuring the VRB2 device through a
companion configuration function within the DPDK
bbdev-test environment.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
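Note, not part of the commit: the AQ enable loop at the end of
vrb2_configure() derives one 16-bit mask per enable register from the
number of atomic queues in the group. A minimal standalone sketch of
that computation follows; demo_aq_enable_mask and DEMO_AQS_PER_REG are
hypothetical names used only for illustration.

```c
#include <stdint.h>

#define DEMO_AQS_PER_REG 16u /* AQ enable bits carried by each register */

/* Enable mask for register aq_reg given num_aqs atomic queues in the group. */
static inline uint32_t
demo_aq_enable_mask(unsigned int num_aqs, unsigned int aq_reg)
{
	if (num_aqs >= DEMO_AQS_PER_REG * (aq_reg + 1))
		return 0xFFFF; /* This register is fully populated. */
	if (num_aqs <= DEMO_AQS_PER_REG * aq_reg)
		return 0x0; /* All queues fit in earlier registers. */
	/* Partially populated: the low (num_aqs - 16 * aq_reg) bits are set. */
	return (1u << (num_aqs - DEMO_AQS_PER_REG * aq_reg)) - 1u;
}
```

With 20 queues in a group, register 0 is full, register 1 carries the
remaining 4 bits and the later registers stay at zero.
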
 drivers/baseband/acc/acc100_pmd.h     |   2 +
 drivers/baseband/acc/rte_acc100_pmd.c |   6 +-
 drivers/baseband/acc/rte_vrb_pmd.c    | 321 ++++++++++++++++++++++++++
 drivers/baseband/acc/vrb_cfg.h        |  16 ++
 4 files changed, 344 insertions(+), 1 deletion(-)

diff --git a/drivers/baseband/acc/acc100_pmd.h b/drivers/baseband/acc/acc100_pmd.h
index a48298650c..5a8965fa53 100644
--- a/drivers/baseband/acc/acc100_pmd.h
+++ b/drivers/baseband/acc/acc100_pmd.h
@@ -34,6 +34,8 @@
 #define ACC100_VENDOR_ID           (0x8086)
 #define ACC100_PF_DEVICE_ID        (0x0d5c)
 #define ACC100_VF_DEVICE_ID        (0x0d5d)
+#define VRB1_PF_DEVICE_ID          (0x57C0)
+#define VRB2_PF_DEVICE_ID          (0x57C2)
 
 /* Values used in writing to the registers */
 #define ACC100_REG_IRQ_EN_ALL          0x1FF83FF  /* Enable all interrupts */
diff --git a/drivers/baseband/acc/rte_acc100_pmd.c b/drivers/baseband/acc/rte_acc100_pmd.c
index 7f8d05b5a9..699a227d13 100644
--- a/drivers/baseband/acc/rte_acc100_pmd.c
+++ b/drivers/baseband/acc/rte_acc100_pmd.c
@@ -5187,6 +5187,10 @@ rte_acc_configure(const char *dev_name, struct rte_acc_conf *conf)
 		return acc100_configure(dev_name, conf);
 	else if (pci_dev->id.device_id == ACC101_PF_DEVICE_ID)
 		return acc101_configure(dev_name, conf);
-	else
+	else if (pci_dev->id.device_id == VRB1_PF_DEVICE_ID)
 		return vrb1_configure(dev_name, conf);
+	else if (pci_dev->id.device_id == VRB2_PF_DEVICE_ID)
+		return vrb2_configure(dev_name, conf);
+
+	return -ENXIO;
 }
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index 3eb1a380fc..d0bc74b53f 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -5052,3 +5052,324 @@ vrb1_configure(const char *dev_name, struct rte_acc_conf *conf)
 	rte_bbdev_log_debug("PF Tip configuration complete for %s", dev_name);
 	return 0;
 }
+
+/* Initial configuration of a VRB2 device prior to running configure(). */
+int
+vrb2_configure(const char *dev_name, struct rte_acc_conf *conf)
+{
+	rte_bbdev_log(INFO, "vrb2_configure");
+	uint32_t value, address, status;
+	int qg_idx, template_idx, vf_idx, acc, i, aq_reg, static_allocation, numEngines;
+	int numQgs, numQqsAcc, totalQgs;
+	int qman_func_id[8] = {0, 2, 1, 3, 4, 5, 0, 0};
+	struct rte_bbdev *bbdev = rte_bbdev_get_named_dev(dev_name);
+	int rlim, alen, timestamp;
+
+	/* Compile time checks. */
+	RTE_BUILD_BUG_ON(sizeof(struct acc_dma_req_desc) != 256);
+	RTE_BUILD_BUG_ON(sizeof(union acc_dma_desc) != 256);
+	RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_td) != 24);
+	RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_te) != 32);
+
+	if (bbdev == NULL) {
+		rte_bbdev_log(ERR,
+		"Invalid dev_name (%s), or device is not yet initialised",
+		dev_name);
+		return -ENODEV;
+	}
+	struct acc_device *d = bbdev->data->dev_private;
+
+	/* Store configuration. */
+	rte_memcpy(&d->acc_conf, conf, sizeof(d->acc_conf));
+
+	/* Explicitly releasing AXI as this may be stopped after PF FLR/BME. */
+	address = VRB2_PfDmaAxiControl;
+	value = 1;
+	acc_reg_write(d, address, value);
+
+	/* Set the fabric mode. */
+	address = VRB2_PfFabricM2iBufferReg;
+	value = VRB2_FABRIC_MODE;
+	acc_reg_write(d, address, value);
+
+	/* Set default descriptor signature. */
+	address = VRB2_PfDmaDescriptorSignature;
+	value = 0;
+	acc_reg_write(d, address, value);
+
+	/* Enable the Error Detection in DMA. */
+	value = VRB2_CFG_DMA_ERROR;
+	address = VRB2_PfDmaErrorDetectionEn;
+	acc_reg_write(d, address, value);
+
+	/* AXI Cache configuration. */
+	value = VRB2_CFG_AXI_CACHE;
+	address = VRB2_PfDmaAxcacheReg;
+	acc_reg_write(d, address, value);
+
+	/* AXI Response configuration. */
+	acc_reg_write(d, VRB2_PfDmaCfgRrespBresp, 0x0);
+
+	/* Default DMA Configuration (Qmgr Enabled) */
+	acc_reg_write(d, VRB2_PfDmaConfig0Reg, 0);
+	acc_reg_write(d, VRB2_PfDmaQmanenSelect, 0xFFFFFFFF);
+	acc_reg_write(d, VRB2_PfDmaQmanen, 0);
+
+	/* Default RLIM/ALEN configuration. */
+	rlim = 0;
+	alen = 3;
+	timestamp = 0;
+	address = VRB2_PfDmaConfig1Reg;
+	value = (1 << 31) + (rlim << 8) + (timestamp << 6) + alen;
+	acc_reg_write(d, address, value);
+
+	/* Default FFT configuration. */
+	for (template_idx = 0; template_idx < VRB2_FFT_NUM; template_idx++) {
+		acc_reg_write(d, VRB2_PfFftConfig0 + template_idx * 0x1000, VRB2_FFT_CFG_0);
+		acc_reg_write(d, VRB2_PfFftParityMask8 + template_idx * 0x1000, VRB2_FFT_ECC);
+	}
+
+	/* Configure DMA Qmanager addresses. */
+	address = VRB2_PfDmaQmgrAddrReg;
+	value = VRB2_PfQmgrEgressQueuesTemplate;
+	acc_reg_write(d, address, value);
+
+	/* ===== Qmgr Configuration ===== */
+	/* Configuration of the AQueue Depth QMGR_GRP_0_DEPTH_LOG2 for UL. */
+	totalQgs = conf->q_ul_4g.num_qgroups + conf->q_ul_5g.num_qgroups +
+			conf->q_dl_4g.num_qgroups + conf->q_dl_5g.num_qgroups +
+			conf->q_fft.num_qgroups + conf->q_mld.num_qgroups;
+	for (qg_idx = 0; qg_idx < VRB2_NUM_QGRPS; qg_idx++) {
+		address = VRB2_PfQmgrDepthLog2Grp + ACC_BYTES_IN_WORD * qg_idx;
+		value = aqDepth(qg_idx, conf);
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrTholdGrp + ACC_BYTES_IN_WORD * qg_idx;
+		value = (1 << 16) + (1 << (aqDepth(qg_idx, conf) - 1));
+		acc_reg_write(d, address, value);
+	}
+
+	/* Template Priority in incremental order. */
+	for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) {
+		address = VRB2_PfQmgrGrpTmplateReg0Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_0;
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrGrpTmplateReg1Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_1;
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrGrpTmplateReg2Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_2;
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrGrpTmplateReg3Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_3;
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrGrpTmplateReg4Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_4;
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrGrpTmplateReg5Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_5;
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrGrpTmplateReg6Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_6;
+		acc_reg_write(d, address, value);
+		address = VRB2_PfQmgrGrpTmplateReg7Indx + ACC_BYTES_IN_WORD * template_idx;
+		value = ACC_TMPL_PRI_7;
+		acc_reg_write(d, address, value);
+	}
+
+	address = VRB2_PfQmgrGrpPriority;
+	value = VRB2_CFG_QMGR_HI_P;
+	acc_reg_write(d, address, value);
+
+	/* Template Configuration. */
+	for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) {
+		value = 0;
+		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
+		acc_reg_write(d, address, value);
+	}
+	/* 4GUL */
+	numQgs = conf->q_ul_4g.num_qgroups;
+	numQqsAcc = 0;
+	value = 0;
+	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
+		value |= (1 << qg_idx);
+	for (template_idx = VRB2_SIG_UL_4G; template_idx <= VRB2_SIG_UL_4G_LAST;
+			template_idx++) {
+		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
+		acc_reg_write(d, address, value);
+	}
+	/* 5GUL */
+	numQqsAcc += numQgs;
+	numQgs = conf->q_ul_5g.num_qgroups;
+	value = 0;
+	numEngines = 0;
+	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
+		value |= (1 << qg_idx);
+	for (template_idx = VRB2_SIG_UL_5G; template_idx <= VRB2_SIG_UL_5G_LAST;
+			template_idx++) {
+		/* Check engine power-on status. */
+		address = VRB2_PfFecUl5gIbDebug0Reg + ACC_ENGINE_OFFSET * template_idx;
+		status = (acc_reg_read(d, address) >> 4) & 0x7;
+		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
+		if (status == 1) {
+			acc_reg_write(d, address, value);
+			numEngines++;
+		} else
+			acc_reg_write(d, address, 0);
+	}
+	rte_bbdev_log(INFO, "Number of 5GUL engines %d", numEngines);
+	/* 4GDL */
+	numQqsAcc += numQgs;
+	numQgs	= conf->q_dl_4g.num_qgroups;
+	value = 0;
+	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
+		value |= (1 << qg_idx);
+	for (template_idx = VRB2_SIG_DL_4G; template_idx <= VRB2_SIG_DL_4G_LAST;
+			template_idx++) {
+		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
+		acc_reg_write(d, address, value);
+	}
+	/* 5GDL */
+	numQqsAcc += numQgs;
+	numQgs	= conf->q_dl_5g.num_qgroups;
+	value = 0;
+	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
+		value |= (1 << qg_idx);
+	for (template_idx = VRB2_SIG_DL_5G; template_idx <= VRB2_SIG_DL_5G_LAST;
+			template_idx++) {
+		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
+		acc_reg_write(d, address, value);
+	}
+	/* FFT */
+	numQqsAcc += numQgs;
+	numQgs	= conf->q_fft.num_qgroups;
+	value = 0;
+	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
+		value |= (1 << qg_idx);
+	for (template_idx = VRB2_SIG_FFT; template_idx <= VRB2_SIG_FFT_LAST;
+			template_idx++) {
+		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
+		acc_reg_write(d, address, value);
+	}
+	/* MLD */
+	numQqsAcc += numQgs;
+	numQgs	= conf->q_mld.num_qgroups;
+	value = 0;
+	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
+		value |= (1 << qg_idx);
+	for (template_idx = VRB2_SIG_MLD; template_idx <= VRB2_SIG_MLD_LAST;
+			template_idx++) {
+		address = VRB2_PfQmgrGrpTmplateEnRegIndx
+				+ ACC_BYTES_IN_WORD * template_idx;
+		acc_reg_write(d, address, value);
+	}
+
+	/* Queue Group Function mapping. */
+	for (i = 0; i < 4; i++) {
+		value = 0;
+		for (qg_idx = 0; qg_idx < ACC_NUM_QGRPS_PER_WORD; qg_idx++) {
+			acc = accFromQgid(qg_idx + i * ACC_NUM_QGRPS_PER_WORD, conf);
+			value |= qman_func_id[acc] << (qg_idx * 4);
+		}
+		acc_reg_write(d, VRB2_PfQmgrGrpFunction0 + i * ACC_BYTES_IN_WORD, value);
+	}
+
+	/* Configuration of the Arbitration QGroup depth to 1. */
+	for (qg_idx = 0; qg_idx < VRB2_NUM_QGRPS; qg_idx++) {
+		address = VRB2_PfQmgrArbQDepthGrp + ACC_BYTES_IN_WORD * qg_idx;
+		value = 0;
+		acc_reg_write(d, address, value);
+	}
+
+	static_allocation = 1;
+	if (static_allocation == 1) {
+		/* This pointer to ARAM (512kB) is shifted by 2 (4B per register). */
+		uint32_t aram_address = 0;
+		for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) {
+			for (vf_idx = 0; vf_idx < conf->num_vf_bundles; vf_idx++) {
+				address = VRB2_PfQmgrVfBaseAddr + vf_idx
+						* ACC_BYTES_IN_WORD + qg_idx
+						* ACC_BYTES_IN_WORD * 64;
+				value = aram_address;
+				acc_reg_fast_write(d, address, value);
+				/* Offset ARAM Address for next memory bank  - increment of 4B. */
+				aram_address += aqNum(qg_idx, conf) *
+						(1 << aqDepth(qg_idx, conf));
+			}
+		}
+		if (aram_address > VRB2_WORDS_IN_ARAM_SIZE) {
+			rte_bbdev_log(ERR, "ARAM Configuration not fitting %u %d",
+					aram_address, VRB2_WORDS_IN_ARAM_SIZE);
+			return -EINVAL;
+		}
+	} else {
+		/* Dynamic Qmgr allocation. */
+		acc_reg_write(d, VRB2_PfQmgrAramAllocEn, 1);
+		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN0, 0x1000);
+		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN1, 0);
+		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN2, 0);
+		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN3, 0);
+		acc_reg_write(d, VRB2_PfQmgrSoftReset, 1);
+		acc_reg_write(d, VRB2_PfQmgrSoftReset, 0);
+	}
+
+	/* ==== HI Configuration ==== */
+
+	/* No Info Ring/MSI by default. */
+	address = VRB2_PfHiInfoRingIntWrEnRegPf;
+	value = 0;
+	acc_reg_write(d, address, value);
+	address = VRB2_PfHiCfgMsiIntWrEnRegPf;
+	value = 0xFFFFFFFF;
+	acc_reg_write(d, address, value);
+	/* Prevent Block on Transmit Error. */
+	address = VRB2_PfHiBlockTransmitOnErrorEn;
+	value = 0;
+	acc_reg_write(d, address, value);
+	/* Prevent MSI from being dropped. */
+	address = VRB2_PfHiMsiDropEnableReg;
+	value = 0;
+	acc_reg_write(d, address, value);
+	/* Set the PF Mode register */
+	address = VRB2_PfHiPfMode;
+	value = ((conf->pf_mode_en) ? ACC_PF_VAL : 0) | 0x1F07F0;
+	acc_reg_write(d, address, value);
+	/* Explicitly releasing AXI after PF Mode. */
+	acc_reg_write(d, VRB2_PfDmaAxiControl, 1);
+
+	/* QoS overflow init. */
+	value = 1;
+	address = VRB2_PfQosmonAEvalOverflow0;
+	acc_reg_write(d, address, value);
+	address = VRB2_PfQosmonBEvalOverflow0;
+	acc_reg_write(d, address, value);
+
+	/* Enabling AQueues through the Queue hierarchy. */
+	unsigned int  en_bitmask[VRB2_AQ_REG_NUM];
+	for (vf_idx = 0; vf_idx < VRB2_NUM_VFS; vf_idx++) {
+		for (qg_idx = 0; qg_idx < VRB2_NUM_QGRPS; qg_idx++) {
+			for (aq_reg = 0;  aq_reg < VRB2_AQ_REG_NUM; aq_reg++)
+				en_bitmask[aq_reg] = 0;
+			if (vf_idx < conf->num_vf_bundles && qg_idx < totalQgs) {
+				for (aq_reg = 0;  aq_reg < VRB2_AQ_REG_NUM; aq_reg++) {
+					if (aqNum(qg_idx, conf) >= 16 * (aq_reg + 1))
+						en_bitmask[aq_reg] = 0xFFFF;
+					else if (aqNum(qg_idx, conf) <= 16 * aq_reg)
+						en_bitmask[aq_reg] = 0x0;
+					else
+						en_bitmask[aq_reg] = (1 << (aqNum(qg_idx,
+								conf) - aq_reg * 16)) - 1;
+				}
+			}
+			for (aq_reg = 0; aq_reg < VRB2_AQ_REG_NUM; aq_reg++) {
+				address = VRB2_PfQmgrAqEnableVf + vf_idx * 16 + aq_reg * 4;
+				value = (qg_idx << 16) + en_bitmask[aq_reg];
+				acc_reg_fast_write(d, address, value);
+			}
+		}
+	}
+
+	rte_bbdev_log(INFO,
+			"VRB2 basic config complete for %s - pf_bb_config should ideally be used instead",
+			dev_name);
+	return 0;
+}
diff --git a/drivers/baseband/acc/vrb_cfg.h b/drivers/baseband/acc/vrb_cfg.h
index e3c8902b46..79487c4e04 100644
--- a/drivers/baseband/acc/vrb_cfg.h
+++ b/drivers/baseband/acc/vrb_cfg.h
@@ -29,4 +29,20 @@
 int
 vrb1_configure(const char *dev_name, struct rte_acc_conf *conf);
 
+/**
+ * Configure a VRB2 device.
+ *
+ * @param dev_name
+ *   The name of the device. This is the short form of PCI BDF, e.g. 00:01.0.
+ *   It can also be retrieved for a bbdev device from the dev_name field in the
+ *   rte_bbdev_info structure returned by rte_bbdev_info_get().
+ * @param conf
+ *   Configuration to apply to VRB2 HW.
+ *
+ * @return
+ *   Zero on success, negative value on failure.
+ */
+int
+vrb2_configure(const char *dev_name, struct rte_acc_conf *conf);
+
 #endif /* _VRB_CFG_H_ */
-- 
2.34.1



* Re: [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD
  2023-09-29 16:35 ` [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD Nicolas Chautru
@ 2023-10-03 11:52   ` Maxime Coquelin
  2023-10-03 19:06     ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 11:52 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> This allows to expose the FFT window width being introduced in
> previous commit based on what is configured dynamically on the
> device platform.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/acc_common.h  |  3 +++
>   drivers/baseband/acc/rte_vrb_pmd.c | 29 +++++++++++++++++++++++++++++
>   2 files changed, 32 insertions(+)
> 
> diff --git a/drivers/baseband/acc/acc_common.h b/drivers/baseband/acc/acc_common.h
> index 5bb00746c3..7d24c644c0 100644
> --- a/drivers/baseband/acc/acc_common.h
> +++ b/drivers/baseband/acc/acc_common.h
> @@ -512,6 +512,8 @@ struct acc_deq_intr_details {
>   enum {
>   	ACC_VF2PF_STATUS_REQUEST = 1,
>   	ACC_VF2PF_USING_VF = 2,
> +	ACC_VF2PF_LUT_VER_REQUEST = 3,
> +	ACC_VF2PF_FFT_WIN_REQUEST = 4,
>   };
>   
>   
> @@ -558,6 +560,7 @@ struct acc_device {
>   	queue_offset_fun_t queue_offset;  /* Device specific queue offset */
>   	uint16_t num_qgroups;
>   	uint16_t num_aqs;
> +	uint16_t fft_window_width[RTE_BBDEV_MAX_FFT_WIN]; /* FFT windowing width. */
>   };
>   
>   /* Structure associated with each queue. */
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index 9e5a73c9c7..c5a74bae11 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -298,6 +298,34 @@ vrb_device_status(struct rte_bbdev *dev)
>   	return reg;
>   }
>   
> +/* Request device FFT windowing information. */
> +static inline void
> +vrb_device_fft_win(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
> +{
> +	struct acc_device *d = dev->data->dev_private;
> +	uint32_t reg, time_out = 0, win;
> +
> +	if (d->pf_device)
> +		return;
> +
> +	/* Check from the device the first time. */
> +	if (d->fft_window_width[0] == 0) {

Is 0 not a possible value? It might not be a big deal, as it would just
mean reading 16 x 2 registers every time .info_get() is called;
just wondering.
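
To make the question concrete, here is a minimal sketch of caching
behind an explicit flag rather than treating a width of 0 as "not read
yet". All demo_* names are hypothetical, and demo_read_width() only
stands in for the VF2PF doorbell exchange; this is not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_FFT_WIN 16

struct demo_dev {
	uint16_t fft_window_width[DEMO_MAX_FFT_WIN];
	bool fft_window_cached; /* Set once the widths have been fetched. */
	unsigned int hw_reads;  /* Counts the simulated device reads. */
};

/* Stand-in for the mailbox read; returns 0 for window 0 on purpose. */
static uint16_t
demo_read_width(struct demo_dev *d, unsigned int win)
{
	d->hw_reads++;
	return (uint16_t)(win * 8);
}

/* Fetch the widths from the device once, then serve them from the cache. */
static void
demo_get_fft_win(struct demo_dev *d, uint16_t *out)
{
	unsigned int win;

	if (!d->fft_window_cached) {
		for (win = 0; win < DEMO_MAX_FFT_WIN; win++)
			d->fft_window_width[win] = demo_read_width(d, win);
		d->fft_window_cached = true;
	}
	for (win = 0; win < DEMO_MAX_FFT_WIN; win++)
		out[win] = d->fft_window_width[win];
}
```

With the flag, a legitimate width of 0 in the first entry no longer
triggers a new round of reads on every call.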

> +		for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++) {
> +			vrb_vf2pf(d, ACC_VF2PF_FFT_WIN_REQUEST | win);

That looks broken, as extending RTE_BBDEV_MAX_FFT_WIN to support other
devices may break this implementation.

To me, it tends to show how this fft_window_width array is more device
specific than generic.
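
As an illustration of how fragile the encoding can be, assuming the
request word reaches the PF unchanged and is decoded by value: with
ACC_VF2PF_FFT_WIN_REQUEST = 4, OR-ing in an index that has bit 2 set
gives the same word as a lower index (4 | 4 == 4 | 0). Below is a
sketch, with hypothetical demo_* names and an arbitrary 8-bit opcode
field, that keeps the opcode and the index in separate fields.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FFT_WIN_REQUEST 4u /* Same value as ACC_VF2PF_FFT_WIN_REQUEST. */
#define DEMO_OPCODE_BITS     8u /* Hypothetical width of the opcode field. */
#define DEMO_OPCODE_MASK     ((1u << DEMO_OPCODE_BITS) - 1u)

static_assert(DEMO_FFT_WIN_REQUEST <= DEMO_OPCODE_MASK,
		"opcode must fit in its field");

/* Encoding as in the patch: opcode and index share the same bits. */
static inline uint32_t
demo_encode_or(uint32_t win)
{
	return DEMO_FFT_WIN_REQUEST | win;
}

/* Alternative: index stored above the opcode field. */
static inline uint32_t
demo_encode_fields(uint32_t win)
{
	return (win << DEMO_OPCODE_BITS) | DEMO_FFT_WIN_REQUEST;
}

static inline uint32_t
demo_decode_win(uint32_t word)
{
	return word >> DEMO_OPCODE_BITS;
}

static inline uint32_t
demo_decode_opcode(uint32_t word)
{
	return word & DEMO_OPCODE_MASK;
}
```

With separate fields, growing the number of windows only widens the
index field and cannot alias another request value.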

> +			reg = acc_reg_read(d, d->reg_addr->pf2vf_doorbell);
> +			while ((time_out < ACC_STATUS_TO) && (reg == RTE_BBDEV_DEV_NOSTATUS)) {
> +				usleep(ACC_STATUS_WAIT); /*< Wait or VF->PF->VF Comms */
> +				reg = acc_reg_read(d, d->reg_addr->pf2vf_doorbell);
> +				time_out++;
> +			}
> +			d->fft_window_width[win] = reg;
> +		}
> +	}
> +
> +	for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++)
> +		dev_info->fft_window_width[win] = d->fft_window_width[win];
> +}
> +
>   /* Checks PF Info Ring to find the interrupt cause and handles it accordingly. */
>   static inline void
>   vrb_check_ir(struct acc_device *acc_dev)
> @@ -1100,6 +1128,7 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>   	fetch_acc_config(dev);
>   	/* Check the status of device. */
>   	dev_info->device_status = vrb_device_status(dev);
> +	vrb_device_fft_win(dev, dev_info);
>   
>   	/* Exposed number of queues. */
>   	dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;



* Re: [PATCH v3 03/12] baseband/acc: remove the 4G SO capability for VRB1
  2023-09-29 16:35 ` [PATCH v3 03/12] baseband/acc: remove the 4G SO capability for VRB1 Nicolas Chautru
@ 2023-10-03 12:04   ` Maxime Coquelin
  0 siblings, 0 replies; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 12:04 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> This removes the specific capability and support of LTE Decoder
> Soft Output option on the VRB1 PMD.
> 
> This is triggered as a vendor decision to defeature the related optional
> capability so that to avoid theoretical risk of race conditions
> impacting the device reliability. That optional APP LLR output is
> not impacting the actual decoder hard output.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   doc/guides/bbdevs/vrb1.rst         |  4 ----
>   drivers/baseband/acc/rte_vrb_pmd.c | 10 ++++++----
>   2 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/doc/guides/bbdevs/vrb1.rst b/doc/guides/bbdevs/vrb1.rst
> index 9c48d30964..fdefb20651 100644
> --- a/doc/guides/bbdevs/vrb1.rst
> +++ b/doc/guides/bbdevs/vrb1.rst
> @@ -71,11 +71,7 @@ The Intel vRAN Boost v1.0 PMD supports the following bbdev capabilities:
>      - ``RTE_BBDEV_TURBO_EARLY_TERMINATION``: set early termination feature.
>      - ``RTE_BBDEV_TURBO_DEC_SCATTER_GATHER``: supports scatter-gather for input/output data.
>      - ``RTE_BBDEV_TURBO_HALF_ITERATION_EVEN``: set half iteration granularity.
> -   - ``RTE_BBDEV_TURBO_SOFT_OUTPUT``: set the APP LLR soft output.
> -   - ``RTE_BBDEV_TURBO_EQUALIZER``: set the turbo equalizer feature.
> -   - ``RTE_BBDEV_TURBO_SOFT_OUT_SATURATE``: set the soft output saturation.
>      - ``RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH``: set to run an extra odd iteration after CRC match.
> -   - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT``: set if negative APP LLR output supported.
>      - ``RTE_BBDEV_TURBO_MAP_DEC``: supports flexible parallel MAP engine decoding.
>   
>   * For the FFT operation:
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index c5a74bae11..f11882f90e 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -1025,15 +1025,11 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>   					RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE |
>   					RTE_BBDEV_TURBO_CRC_TYPE_24B |
>   					RTE_BBDEV_TURBO_DEC_CRC_24B_DROP |
> -					RTE_BBDEV_TURBO_EQUALIZER |
> -					RTE_BBDEV_TURBO_SOFT_OUT_SATURATE |
>   					RTE_BBDEV_TURBO_HALF_ITERATION_EVEN |
>   					RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH |
> -					RTE_BBDEV_TURBO_SOFT_OUTPUT |
>   					RTE_BBDEV_TURBO_EARLY_TERMINATION |
>   					RTE_BBDEV_TURBO_DEC_INTERRUPTS |
>   					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN |
> -					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT |
>   					RTE_BBDEV_TURBO_MAP_DEC |
>   					RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP |
>   					RTE_BBDEV_TURBO_DEC_SCATTER_GATHER,
> @@ -1982,6 +1978,12 @@ enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   	struct rte_mbuf *input, *h_output_head, *h_output,
>   		*s_output_head, *s_output;
>   
> +	if ((q->d->device_variant == VRB1_VARIANT) &&
> +			(check_bit(op->turbo_dec.op_flags, RTE_BBDEV_TURBO_SOFT_OUTPUT))) {
> +		/* SO not supported for VRB1. */
> +		return -EPERM;
> +	}
> +

A better option would be to keep a pointer to the device capabilities in
the acc_device struct; that would be more future-proof. Maybe it
could be considered?
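
A minimal sketch of that suggestion, for illustration only. Every name
prefixed with sketch_ or SKETCH_ is hypothetical and not part of the
driver; the flag values are placeholders rather than the real
RTE_BBDEV_TURBO_* values, and the per-variant tables are examples, not
the actual capability lists.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values; the real ones are defined by the bbdev API. */
#define SKETCH_TURBO_SOFT_OUTPUT   (1u << 0)
#define SKETCH_TURBO_CRC_TYPE_24B  (1u << 1)

/* Hypothetical per-variant capability record. */
struct sketch_dev_caps {
	uint32_t turbo_dec_flags; /* Turbo decoder op_flags the variant accepts. */
};

/* Simplified stand-in for struct acc_device. */
struct sketch_device {
	uint16_t device_variant;
	const struct sketch_dev_caps *caps; /* Assumed to be set once at probe time. */
};

/* Example tables only: soft output is left out of the first one. */
static const struct sketch_dev_caps sketch_vrb1_caps = {
	.turbo_dec_flags = SKETCH_TURBO_CRC_TYPE_24B,
};

static const struct sketch_dev_caps sketch_vrb2_caps = {
	.turbo_dec_flags = SKETCH_TURBO_CRC_TYPE_24B | SKETCH_TURBO_SOFT_OUTPUT,
};

/* Return true when every requested flag is advertised by the device. */
static inline bool
sketch_op_flags_supported(const struct sketch_device *d, uint32_t op_flags)
{
	return (op_flags & ~d->caps->turbo_dec_flags) == 0;
}
```

With something along these lines, the enqueue path would test the
requested op_flags against d->caps instead of comparing device_variant,
so a new variant would only need a new table.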

But in the meantime, it addresses this specific issue:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

>   	desc = acc_desc(q, total_enqueued_cbs);
>   	vrb_fcw_td_fill(op, &desc->req.fcw_td);
>   



* Re: [PATCH v3 04/12] baseband/acc: allocate FCW memory separately
  2023-09-29 16:35 ` [PATCH v3 04/12] baseband/acc: allocate FCW memory separately Nicolas Chautru
@ 2023-10-03 12:51   ` Maxime Coquelin
  0 siblings, 0 replies; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 12:51 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> This allows more flexibility to the FCW size for the
> unified driver. No actual functional change.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/acc_common.h  |  4 +++-
>   drivers/baseband/acc/rte_vrb_pmd.c | 25 ++++++++++++++++++++++++-
>   2 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/baseband/acc/acc_common.h b/drivers/baseband/acc/acc_common.h
> index 7d24c644c0..2c7425e524 100644
> --- a/drivers/baseband/acc/acc_common.h
> +++ b/drivers/baseband/acc/acc_common.h
> @@ -101,6 +101,7 @@
>   #define ACC_NUM_QGRPS_PER_WORD         8
>   #define ACC_MAX_NUM_QGRPS              32
>   #define ACC_RING_SIZE_GRANULARITY      64
> +#define ACC_MAX_FCW_SIZE              128
>   
>   /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
>   #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */
> @@ -584,13 +585,14 @@ struct __rte_cache_aligned acc_queue {
>   	uint32_t aq_enqueued;  /* Count how many "batches" have been enqueued */
>   	uint32_t aq_dequeued;  /* Count how many "batches" have been dequeued */
>   	uint32_t irq_enable;  /* Enable ops dequeue interrupts if set to 1 */
> -	struct rte_mempool *fcw_mempool;  /* FCW mempool */
>   	enum rte_bbdev_op_type op_type;  /* Type of this Queue: TE or TD */
>   	/* Internal Buffers for loopback input */
>   	uint8_t *lb_in;
>   	uint8_t *lb_out;
> +	uint8_t *fcw_ring;
>   	rte_iova_t lb_in_addr_iova;
>   	rte_iova_t lb_out_addr_iova;
> +	rte_iova_t fcw_ring_addr_iova;
>   	int8_t *derm_buffer; /* interim buffer for de-rm in SDK */
>   	struct acc_device *d;
>   };
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index f11882f90e..cf0551c0c7 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -890,6 +890,25 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
>   		goto free_companion_ring_addr;
>   	}
>   
> +	q->fcw_ring = rte_zmalloc_socket(dev->device->driver->name,
> +			ACC_MAX_FCW_SIZE * d->sw_ring_max_depth,
> +			RTE_CACHE_LINE_SIZE, conf->socket);
> +	if (q->fcw_ring == NULL) {
> +		rte_bbdev_log(ERR, "Failed to allocate fcw_ring memory");
> +		ret = -ENOMEM;
> +		goto free_companion_ring_addr;
> +	}
> +	q->fcw_ring_addr_iova = rte_malloc_virt2iova(q->fcw_ring);
> +
> +	/* For FFT we need to store the FCW separately */
> +	if (conf->op_type == RTE_BBDEV_OP_FFT) {
> +		for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
> +			desc = q->ring_addr + desc_idx;
> +			desc->req.data_ptrs[0].address = q->fcw_ring_addr_iova +
> +					desc_idx * ACC_MAX_FCW_SIZE;
> +		}
> +	}
> +
>   	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
>   	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
>   	q->aq_id = q_idx & 0xF;
> @@ -1001,6 +1020,7 @@ vrb_queue_release(struct rte_bbdev *dev, uint16_t q_id)
>   	if (q != NULL) {
>   		/* Mark the Queue as un-assigned. */
>   		d->q_assigned_bit_map[q->qgrp_id] &= (~0ULL - (1 << (uint64_t) q->aq_id));
> +		rte_free(q->fcw_ring);
>   		rte_free(q->companion_ring_addr);
>   		rte_free(q->lb_in);
>   		rte_free(q->lb_out);
> @@ -3234,7 +3254,10 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
>   	output = op->fft.base_output.data;
>   	in_offset = op->fft.base_input.offset;
>   	out_offset = op->fft.base_output.offset;
> -	fcw = &desc->req.fcw_fft;
> +
> +	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
> +			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
> +			* ACC_MAX_FCW_SIZE);
>   
>   	vrb1_fcw_fft_fill(op, fcw);
>   	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
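
For readers following the indexing in the last hunk, the slot lookup can
be sketched as below. This is an illustrative stand-alone version, not
driver code: struct sketch_queue and the sketch_ helpers are
hypothetical, and it assumes sw_ring_wrap_mask is the ring depth minus
one with a power-of-two depth. ACC_MAX_FCW_SIZE uses the value from the
patch.

```c
#include <stddef.h>
#include <stdint.h>

#define ACC_MAX_FCW_SIZE 128 /* Value taken from the patch. */

/* Simplified stand-in for the acc_queue fields used by the lookup. */
struct sketch_queue {
	uint8_t *fcw_ring;          /* Base of the separately allocated FCW area. */
	uint32_t sw_ring_head;      /* Next descriptor index to enqueue. */
	uint32_t sw_ring_wrap_mask; /* Assumed: depth - 1, depth a power of two. */
};

/* Byte offset of the FCW slot paired with the descriptor being filled. */
static inline size_t
sketch_fcw_offset(const struct sketch_queue *q, uint16_t total_enqueued_cbs)
{
	uint32_t slot = (q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask;

	return (size_t)slot * ACC_MAX_FCW_SIZE;
}

/* Pointer to that slot inside the FCW ring. */
static inline void *
sketch_fcw_ptr(const struct sketch_queue *q, uint16_t total_enqueued_cbs)
{
	return q->fcw_ring + sketch_fcw_offset(q, total_enqueued_cbs);
}
```

Each descriptor index maps to one fixed 128-byte slot, which matches the
per-descriptor addresses programmed in vrb_queue_setup() for FFT queues.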

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension
  2023-09-29 16:35 ` [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension Nicolas Chautru
@ 2023-10-03 13:14   ` Maxime Coquelin
  2023-10-03 18:54     ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 13:14 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas

Thanks for doing the split; that will ease review.

On 9/29/23 18:35, Nicolas Chautru wrote:
> Adding a few functions and common code prior to
> extending the VRB driver.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/acc_common.h     | 164 +++++++++++++++++++++++---
>   drivers/baseband/acc/rte_acc100_pmd.c |   4 +-
>   drivers/baseband/acc/rte_vrb_pmd.c    |  62 +++++-----
>   3 files changed, 184 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/baseband/acc/acc_common.h b/drivers/baseband/acc/acc_common.h
> index 788abf1a3c..89893eae43 100644
> --- a/drivers/baseband/acc/acc_common.h
> +++ b/drivers/baseband/acc/acc_common.h
> @@ -18,6 +18,7 @@
>   #define ACC_DMA_BLKID_OUT_HARQ      3
>   #define ACC_DMA_BLKID_IN_HARQ       3
>   #define ACC_DMA_BLKID_IN_MLD_R      3
> +#define ACC_DMA_BLKID_DEWIN_IN      3
>   
>   /* Values used in filling in decode FCWs */
>   #define ACC_FCW_TD_VER              1
> @@ -103,6 +104,9 @@
>   #define ACC_MAX_NUM_QGRPS              32
>   #define ACC_RING_SIZE_GRANULARITY      64
>   #define ACC_MAX_FCW_SIZE              128
> +#define ACC_IQ_SIZE                    4
> +
> +#define ACC_FCW_FFT_BLEN_3             28
>   
>   /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
>   #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */
> @@ -132,6 +136,17 @@
>   #define ACC_LIM_21 14 /* 0.21 */
>   #define ACC_LIM_31 20 /* 0.31 */
>   #define ACC_MAX_E (128 * 1024 - 2)
> +#define ACC_MAX_CS 12
> +
> +#define ACC100_VARIANT          0
> +#define VRB1_VARIANT		2
> +#define VRB2_VARIANT		3
> +
> +/* Queue Index Hierarchy */
> +#define VRB1_GRP_ID_SHIFT    10
> +#define VRB1_VF_ID_SHIFT     4
> +#define VRB2_GRP_ID_SHIFT    12
> +#define VRB2_VF_ID_SHIFT     6
>   
>   /* Helper macro for logging */
>   #define rte_acc_log(level, fmt, ...) \
> @@ -332,6 +347,37 @@ struct __rte_packed acc_fcw_fft {
>   		res:19;
>   };
>   
> +/* FFT Frame Control Word. */
> +struct __rte_packed acc_fcw_fft_3 {
> +	uint32_t in_frame_size:16,
> +		leading_pad_size:16;
> +	uint32_t out_frame_size:16,
> +		leading_depad_size:16;
> +	uint32_t cs_window_sel;
> +	uint32_t cs_window_sel2:16,
> +		cs_enable_bmap:16;
> +	uint32_t num_antennas:8,
> +		idft_size:8,
> +		dft_size:8,
> +		cs_offset:8;
> +	uint32_t idft_shift:8,
> +		dft_shift:8,
> +		cs_multiplier:16;
> +	uint32_t bypass:2,
> +		fp16_in:1,
> +		fp16_out:1,
> +		exp_adj:4,
> +		power_shift:4,
> +		power_en:1,
> +		enable_dewin:1,
> +		freq_resample_mode:2,
> +		depad_output_size:16;
> +	uint16_t cs_theta_0[ACC_MAX_CS];
> +	uint32_t cs_theta_d[ACC_MAX_CS];
> +	int8_t cs_time_offset[ACC_MAX_CS];
> +};
> +
> +
>   /* MLD-TS Frame Control Word */
>   struct __rte_packed acc_fcw_mldts {
>   	uint32_t fcw_version:4,
> @@ -473,14 +519,14 @@ union acc_info_ring_data {
>   		uint16_t valid: 1;
>   	};
>   	struct {
> -		uint32_t aq_id_3: 6;
> -		uint32_t qg_id_3: 5;
> -		uint32_t vf_id_3: 6;
> -		uint32_t int_nb_3: 6;
> -		uint32_t msi_0_3: 1;
> -		uint32_t vf2pf_3: 6;
> -		uint32_t loop_3: 1;
> -		uint32_t valid_3: 1;
> +		uint32_t aq_id_vrb2: 6;
> +		uint32_t qg_id_vrb2: 5;
> +		uint32_t vf_id_vrb2: 6;
> +		uint32_t int_nb_vrb2: 6;
> +		uint32_t msi_0_vrb2: 1;
> +		uint32_t vf2pf_vrb2: 6;
> +		uint32_t loop_vrb2: 1;
> +		uint32_t valid_vrb2: 1;
>   	};
>   } __rte_packed;
>   
> @@ -761,22 +807,105 @@ alloc_sw_rings_min_mem(struct rte_bbdev *dev, struct acc_device *d,
>   	free_base_addresses(base_addrs, i);
>   }
>   
> +/* Wrapper to provide VF index from ring data. */
> +static inline uint16_t
> +vf_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {

The opening curly brace should go on a new line.

> +	if (device_variant == VRB2_VARIANT)
> +		return ring_data.vf_id_vrb2;
> +	else
> +		return ring_data.vf_id;
> +}
> +
> +/* Wrapper to provide QG index from ring data. */
> +static inline uint16_t
> +qg_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {
> +	if (device_variant == VRB2_VARIANT)
> +		return ring_data.qg_id_vrb2;
> +	else
> +		return ring_data.qg_id;
> +}
> +
> +/* Wrapper to provide AQ index from ring data. */
> +static inline uint16_t
> +aq_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {
> +	if (device_variant == VRB2_VARIANT)
> +		return ring_data.aq_id_vrb2;
> +	else
> +		return ring_data.aq_id;
> +}
> +
> +/* Wrapper to provide int index from ring data. */
> +static inline uint16_t
> +int_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant) {
> +	if (device_variant == VRB2_VARIANT)
> +		return ring_data.int_nb_vrb2;
> +	else
> +		return ring_data.int_nb;
> +}
> +
> +/* Wrapper to provide queue index from group and aq index. */
> +static inline int
> +queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t device_variant) {
> +	if (device_variant == VRB2_VARIANT)
> +		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
> +	else
> +		return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
> +}
> +
> +/* Wrapper to provide queue group from queue index. */
> +static inline int
> +qg_from_q(uint32_t q_idx, uint16_t device_variant) {
> +	if (device_variant == VRB2_VARIANT)
> +		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
> +	else
> +		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
> +}
> +
> +/* Wrapper to provide vf from queue index. */
> +static inline int32_t
> +vf_from_q(uint32_t q_idx, uint16_t device_variant) {
> +	if (device_variant == VRB2_VARIANT)
> +		return (q_idx >> VRB2_VF_ID_SHIFT)  & 0x3F;
> +	else
> +		return (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
> +}
> +
> +/* Wrapper to provide aq index from queue index. */
> +static inline int32_t
> +aq_from_q(uint32_t q_idx, uint16_t device_variant) {
> +	if (device_variant == VRB2_VARIANT)
> +		return q_idx & 0x3F;
> +	else
> +		return q_idx & 0xF;
> +}
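
As an aside, one possible way to keep the shifts and masks of the
queue-index wrappers together is a small per-variant table. The sketch
below is only an illustration of that idea: struct sketch_q_layout and
all sketch_ functions are hypothetical, while the shift and mask values
are the ones used in the patch.

```c
#include <stdint.h>

#define VRB1_VARIANT      2
#define VRB2_VARIANT      3
#define VRB1_GRP_ID_SHIFT 10
#define VRB1_VF_ID_SHIFT  4
#define VRB2_GRP_ID_SHIFT 12
#define VRB2_VF_ID_SHIFT  6

/* Hypothetical description of how a queue index is packed. */
struct sketch_q_layout {
	uint8_t grp_shift;
	uint8_t vf_shift;
	uint32_t grp_mask;
	uint32_t vf_mask;
	uint32_t aq_mask;
};

/* Pick the layout for a variant; anything but VRB2 uses the VRB1 layout. */
static inline struct sketch_q_layout
sketch_layout(uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return (struct sketch_q_layout){
			VRB2_GRP_ID_SHIFT, VRB2_VF_ID_SHIFT, 0x1F, 0x3F, 0x3F };
	return (struct sketch_q_layout){
		VRB1_GRP_ID_SHIFT, VRB1_VF_ID_SHIFT, 0xF, 0x3F, 0xF };
}

/* Pack group and AQ index into a queue index. */
static inline uint32_t
sketch_queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t device_variant)
{
	return ((uint32_t)group_idx << sketch_layout(device_variant).grp_shift) + aq_idx;
}

/* Extract the queue group from a queue index. */
static inline uint32_t
sketch_qg_from_q(uint32_t q_idx, uint16_t device_variant)
{
	struct sketch_q_layout l = sketch_layout(device_variant);

	return (q_idx >> l.grp_shift) & l.grp_mask;
}

/* Extract the VF index from a queue index. */
static inline uint32_t
sketch_vf_from_q(uint32_t q_idx, uint16_t device_variant)
{
	struct sketch_q_layout l = sketch_layout(device_variant);

	return (q_idx >> l.vf_shift) & l.vf_mask;
}

/* Extract the AQ index from a queue index. */
static inline uint32_t
sketch_aq_from_q(uint32_t q_idx, uint16_t device_variant)
{
	return q_idx & sketch_layout(device_variant).aq_mask;
}
```

For example, group 5 and AQ 9 on VRB2 pack to (5 << 12) + 9, and the
three extractors return 5, 0 and 9 for group, VF and AQ respectively.
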
> +
> +/* Wrapper to set VF index in ring data. */
> +static inline int32_t
> +set_vf_in_ring(volatile union acc_info_ring_data *ring_data,
> +		uint16_t device_variant, uint16_t value) {
> +	if (device_variant == VRB2_VARIANT)
> +		return ring_data->vf_id_vrb2 = value;
> +	else
> +		return ring_data->vf_id = value;
> +}
> +
>   /*
>    * Find queue_id of a device queue based on details from the Info Ring.
>    * If a queue isn't found UINT16_MAX is returned.
>    */
>   static inline uint16_t
>   get_queue_id_from_ring_info(struct rte_bbdev_data *data,
> -		const union acc_info_ring_data ring_data)
> +		const union acc_info_ring_data ring_data, uint16_t device_variant)

As I suggested on v2:

get_queue_id_from_ring_info(struct rte_bbdev_data *data,
	const union acc_info_ring_data ring_data)
{
	struct acc_device *d = data->dev_private;

	...

	if (acc_q != NULL && acc_q->aq_id == aq_from_ring(d, ring_data) &&
...

}

with

/* Wrapper to provide AQ index from ring data. */
static inline uint16_t
aq_from_ring(struct acc_device *d, const union acc_info_ring_data ring_data)
{
	if (d->device_variant == VRB2_VARIANT)
		return ring_data.aq_id_vrb2;
	else
		return ring_data.aq_id;
}
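
A self-contained version of the same idea, for illustration only. The
sketch_ types are hypothetical simplifications: the VRB2 field widths
are copied from the patch, whereas the legacy layout is reduced to
placeholder fields whose widths merely match the 0xF and 0x3F masks used
elsewhere in the patch.

```c
#include <stdint.h>

#define VRB1_VARIANT 2
#define VRB2_VARIANT 3

/* Simplified stand-in for struct acc_device. */
struct sketch_device {
	uint16_t device_variant;
};

/* Simplified stand-in for union acc_info_ring_data. */
union sketch_info_ring_data {
	uint32_t val;
	struct { /* Legacy layout: placeholder positions, assumed for the sketch. */
		uint32_t aq_id: 4;
		uint32_t qg_id: 4;
		uint32_t vf_id: 6;
		uint32_t rsvd: 18;
	};
	struct { /* VRB2 layout: field widths as in the patch. */
		uint32_t aq_id_vrb2: 6;
		uint32_t qg_id_vrb2: 5;
		uint32_t vf_id_vrb2: 6;
		uint32_t int_nb_vrb2: 6;
		uint32_t msi_0_vrb2: 1;
		uint32_t vf2pf_vrb2: 6;
		uint32_t loop_vrb2: 1;
		uint32_t valid_vrb2: 1;
	};
};

/* Wrapper to provide AQ index from ring data, keyed on the device. */
static inline uint16_t
sketch_aq_from_ring(const struct sketch_device *d,
		const union sketch_info_ring_data ring_data)
{
	if (d->device_variant == VRB2_VARIANT)
		return ring_data.aq_id_vrb2;
	else
		return ring_data.aq_id;
}

/* Wrapper to provide VF index from ring data, keyed on the device. */
static inline uint16_t
sketch_vf_from_ring(const struct sketch_device *d,
		const union sketch_info_ring_data ring_data)
{
	if (d->device_variant == VRB2_VARIANT)
		return ring_data.vf_id_vrb2;
	else
		return ring_data.vf_id;
}
```

Passing the device keeps the variant out of every call site, so callers
such as get_queue_id_from_ring_info() would not need an extra argument.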

>   {
>   	uint16_t queue_id;
> +	struct acc_queue *acc_q;
>   
>   	for (queue_id = 0; queue_id < data->num_queues; ++queue_id) {
> -		struct acc_queue *acc_q =
> -				data->queues[queue_id].queue_private;
> -		if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id &&
> -				acc_q->qgrp_id == ring_data.qg_id &&
> -				acc_q->vf_id == ring_data.vf_id)
> +		acc_q = data->queues[queue_id].queue_private;
> +
> +		if (acc_q != NULL && acc_q->aq_id == aq_from_ring(ring_data, device_variant) &&
> +				acc_q->qgrp_id == qg_from_ring(ring_data, device_variant) &&
> +				acc_q->vf_id == vf_from_ring(ring_data, device_variant))
>   			return queue_id;
>   	}
>   
> @@ -1438,4 +1567,11 @@ get_num_cbs_in_tb_ldpc_enc(struct rte_bbdev_op_ldpc_enc *ldpc_enc)
>   	return cbs_in_tb;
>   }
>   
> +static inline void
> +acc_reg_fast_write(struct acc_device *d, uint32_t offset, uint32_t value)
> +{
> +	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
> +	mmio_write(reg_addr, value);
> +}
> +
>   #endif /* _ACC_COMMON_H_ */
> diff --git a/drivers/baseband/acc/rte_acc100_pmd.c b/drivers/baseband/acc/rte_acc100_pmd.c
> index 5362d39c30..7f8d05b5a9 100644
> --- a/drivers/baseband/acc/rte_acc100_pmd.c
> +++ b/drivers/baseband/acc/rte_acc100_pmd.c
> @@ -294,7 +294,7 @@ acc100_pf_interrupt_handler(struct rte_bbdev *dev)
>   		case ACC100_PF_INT_DMA_UL5G_DESC_IRQ:
>   		case ACC100_PF_INT_DMA_DL5G_DESC_IRQ:
>   			deq_intr_det.queue_id = get_queue_id_from_ring_info(
> -					dev->data, *ring_data);
> +					dev->data, *ring_data, acc100_dev->device_variant);
>   			if (deq_intr_det.queue_id == UINT16_MAX) {
>   				rte_bbdev_log(ERR,
>   						"Couldn't find queue: aq_id: %u, qg_id: %u, vf_id: %u",
> @@ -348,7 +348,7 @@ acc100_vf_interrupt_handler(struct rte_bbdev *dev)
>   			 */
>   			ring_data->vf_id = 0;
>   			deq_intr_det.queue_id = get_queue_id_from_ring_info(
> -					dev->data, *ring_data);
> +					dev->data, *ring_data, acc100_dev->device_variant);
>   			if (deq_intr_det.queue_id == UINT16_MAX) {
>   				rte_bbdev_log(ERR,
>   						"Couldn't find queue: aq_id: %u, qg_id: %u",
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index a1de012b40..c89c26c59a 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -341,17 +341,18 @@ static inline void
>   vrb_check_ir(struct acc_device *acc_dev)
>   {
>   	volatile union acc_info_ring_data *ring_data;
> -	uint16_t info_ring_head = acc_dev->info_ring_head;
> +	uint16_t info_ring_head = acc_dev->info_ring_head, int_nb;
>   	if (unlikely(acc_dev->info_ring == NULL))
>   		return;
>   
>   	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head & ACC_INFO_RING_MASK);
>   
>   	while (ring_data->valid) {
> -		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> -				ring_data->int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
> +		if ((int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> +				int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
>   			rte_bbdev_log(WARNING, "InfoRing: ITR:%d Info:0x%x",
> -					ring_data->int_nb, ring_data->detailed_info);
> +					int_nb, ring_data->detailed_info);
>   			/* Initialize Info Ring entry and move forward. */
>   			ring_data->val = 0;
>   		}
> @@ -368,16 +369,21 @@ vrb_dev_interrupt_handler(void *cb_arg)
>   	struct acc_device *acc_dev = dev->data->dev_private;
>   	volatile union acc_info_ring_data *ring_data;
>   	struct acc_deq_intr_details deq_intr_det;
> +	uint16_t vf_id, aq_id, qg_id, int_nb;
>   
>   	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head & ACC_INFO_RING_MASK);
>   
>   	while (ring_data->valid) {
> +		vf_id = vf_from_ring(*ring_data, acc_dev->device_variant);
> +		aq_id = aq_from_ring(*ring_data, acc_dev->device_variant);
> +		qg_id = qg_from_ring(*ring_data, acc_dev->device_variant);
> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
>   		if (acc_dev->pf_device) {
>   			rte_bbdev_log_debug(
> -					"VRB1 PF Interrupt received, Info Ring data: 0x%x -> %d",
> -					ring_data->val, ring_data->int_nb);
> +					"PF Interrupt received, Info Ring data: 0x%x -> %d",
> +					ring_data->val, int_nb);
>   
> -			switch (ring_data->int_nb) {
> +			switch (int_nb) {
>   			case ACC_PF_INT_DMA_DL_DESC_IRQ:
>   			case ACC_PF_INT_DMA_UL_DESC_IRQ:
>   			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
> @@ -385,13 +391,11 @@ vrb_dev_interrupt_handler(void *cb_arg)
>   			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
>   			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
>   				deq_intr_det.queue_id = get_queue_id_from_ring_info(
> -						dev->data, *ring_data);
> +						dev->data, *ring_data, acc_dev->device_variant);
>   				if (deq_intr_det.queue_id == UINT16_MAX) {
>   					rte_bbdev_log(ERR,
>   							"Couldn't find queue: aq_id: %u, qg_id: %u, vf_id: %u",
> -							ring_data->aq_id,
> -							ring_data->qg_id,
> -							ring_data->vf_id);
> +							aq_id, qg_id, vf_id);
>   					return;
>   				}
>   				rte_bbdev_pmd_callback_process(dev,
> @@ -403,9 +407,9 @@ vrb_dev_interrupt_handler(void *cb_arg)
>   			}
>   		} else {
>   			rte_bbdev_log_debug(
> -					"VRB1 VF Interrupt received, Info Ring data: 0x%x\n",
> +					"VRB VF Interrupt received, Info Ring data: 0x%x\n",
>   					ring_data->val);
> -			switch (ring_data->int_nb) {
> +			switch (int_nb) {
>   			case ACC_VF_INT_DMA_DL_DESC_IRQ:
>   			case ACC_VF_INT_DMA_UL_DESC_IRQ:
>   			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
> @@ -413,14 +417,13 @@ vrb_dev_interrupt_handler(void *cb_arg)
>   			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
>   			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
>   				/* VFs are not aware of their vf_id - it's set to 0.  */
> -				ring_data->vf_id = 0;
> +				set_vf_in_ring(ring_data, acc_dev->device_variant, 0);
>   				deq_intr_det.queue_id = get_queue_id_from_ring_info(
> -						dev->data, *ring_data);
> +						dev->data, *ring_data, acc_dev->device_variant);
>   				if (deq_intr_det.queue_id == UINT16_MAX) {
>   					rte_bbdev_log(ERR,
>   							"Couldn't find queue: aq_id: %u, qg_id: %u",
> -							ring_data->aq_id,
> -							ring_data->qg_id);
> +							aq_id, qg_id);
>   					return;
>   				}
>   				rte_bbdev_pmd_callback_process(dev,
> @@ -435,8 +438,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
>   		/* Initialize Info Ring entry and move forward. */
>   		ring_data->val = 0;
>   		++acc_dev->info_ring_head;
> -		ring_data = acc_dev->info_ring +
> -				(acc_dev->info_ring_head & ACC_INFO_RING_MASK);
> +		ring_data = acc_dev->info_ring + (acc_dev->info_ring_head & ACC_INFO_RING_MASK);
>   	}
>   }
>   
> @@ -556,8 +558,7 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
>   
>   	/* Configure tail pointer for use when SDONE enabled. */
>   	if (d->tail_ptrs == NULL)
> -		d->tail_ptrs = rte_zmalloc_socket(
> -				dev->device->driver->name,
> +		d->tail_ptrs = rte_zmalloc_socket(dev->device->driver->name,
>   				VRB_MAX_QGRPS * VRB_MAX_AQS * sizeof(uint32_t),
>   				RTE_CACHE_LINE_SIZE, socket_id);
>   	if (d->tail_ptrs == NULL) {
> @@ -783,7 +784,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
>   			/* Mark the Queue as assigned. */
>   			d->q_assigned_bit_map[group_idx] |= (1ULL << aq_idx);
>   			/* Report the AQ Index. */
> -			return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
> +			return queue_index(group_idx, aq_idx, d->device_variant);
>   		}
>   	}
>   	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u",
> @@ -922,9 +923,10 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
>   		}
>   	}
>   
> -	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
> -	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
> -	q->aq_id = q_idx & 0xF;
> +	q->qgrp_id = qg_from_q(q_idx, d->device_variant);
> +	q->vf_id = vf_from_q(q_idx, d->device_variant);
> +	q->aq_id = aq_from_q(q_idx, d->device_variant);
> +
>   	q->aq_depth = 0;
>   	if (conf->op_type ==  RTE_BBDEV_OP_TURBO_DEC)
>   		q->aq_depth = (1 << d->acc_conf.q_ul_4g.aq_depth_log2);
> @@ -1311,7 +1313,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc_fcw_td *fcw)
>   		fcw->bypass_teq = 0;
>   	}
>   
> -	fcw->code_block_mode = 1; /* FIXME */
> +	fcw->code_block_mode = 1;

Could you remind me what the issue was?

>   	fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags,
>   			RTE_BBDEV_TURBO_CRC_TYPE_24B);
>   
> @@ -1471,8 +1473,8 @@ vrb_dma_desc_td_fill(struct rte_bbdev_dec_op *op,
>   	if (op->turbo_dec.code_block_mode == RTE_BBDEV_TRANSPORT_BLOCK) {
>   		k = op->turbo_dec.tb_params.k_pos;
>   		e = (r < op->turbo_dec.tb_params.cab)
> -			? op->turbo_dec.tb_params.ea
> -			: op->turbo_dec.tb_params.eb;
> +				? op->turbo_dec.tb_params.ea
> +				: op->turbo_dec.tb_params.eb;
>   	} else {
>   		k = op->turbo_dec.cb_params.k;
>   		e = op->turbo_dec.cb_params.e;
> @@ -1726,7 +1728,7 @@ vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
>   	desc->op_addr = op;
>   }
>   
> -/* Enqueue one encode operations for device in CB mode */
> +/* Enqueue one encode operations for device in CB mode. */
>   static inline int
>   enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op,
>   		uint16_t total_enqueued_cbs)
> @@ -2263,7 +2265,7 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   	return current_enqueued_cbs;
>   }
>   
> -/* Enqueue one decode operations for device in TB mode */
> +/* Enqueue one decode operations for device in TB mode. */
>   static inline int
>   enqueue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   		uint16_t total_enqueued_cbs, uint8_t cbs_in_tb)



* Re: [PATCH v3 07/12] baseband/acc: adding VRB2 device variant
  2023-09-29 16:35 ` [PATCH v3 07/12] baseband/acc: adding VRB2 device variant Nicolas Chautru
@ 2023-10-03 13:41   ` Maxime Coquelin
  0 siblings, 0 replies; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 13:41 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> No functionality exposed only device enumeration and
> configuration.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   doc/guides/bbdevs/features/vrb2.ini    |  14 ++
>   doc/guides/bbdevs/index.rst            |   1 +
>   doc/guides/bbdevs/vrb2.rst             | 206 +++++++++++++++++++++++++
>   doc/guides/rel_notes/release_23_11.rst |   3 +
>   drivers/baseband/acc/rte_vrb_pmd.c     | 156 +++++++++++++++----
>   drivers/baseband/acc/vrb2_pf_enum.h    | 124 +++++++++++++++
>   drivers/baseband/acc/vrb2_vf_enum.h    | 121 +++++++++++++++
>   drivers/baseband/acc/vrb_pmd.h         | 161 ++++++++++++++++++-
>   8 files changed, 751 insertions(+), 35 deletions(-)
>   create mode 100644 doc/guides/bbdevs/features/vrb2.ini
>   create mode 100644 doc/guides/bbdevs/vrb2.rst
>   create mode 100644 drivers/baseband/acc/vrb2_pf_enum.h
>   create mode 100644 drivers/baseband/acc/vrb2_vf_enum.h
> 
> diff --git a/doc/guides/bbdevs/features/vrb2.ini b/doc/guides/bbdevs/features/vrb2.ini
> new file mode 100644
> index 0000000000..23ca6990b7
> --- /dev/null
> +++ b/doc/guides/bbdevs/features/vrb2.ini
> @@ -0,0 +1,14 @@
> +;
> +; Supported features of the 'Intel vRAN Boost v2' baseband driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Turbo Decoder (4G)     = Y
> +Turbo Encoder (4G)     = Y
> +LDPC Decoder (5G)      = Y
> +LDPC Encoder (5G)      = Y
> +LLR/HARQ Compression   = Y
> +FFT/SRS                = Y
> +External DDR Access    = N
> +HW Accelerated         = Y
> diff --git a/doc/guides/bbdevs/index.rst b/doc/guides/bbdevs/index.rst
> index 77d4c54664..269157d77f 100644
> --- a/doc/guides/bbdevs/index.rst
> +++ b/doc/guides/bbdevs/index.rst
> @@ -15,4 +15,5 @@ Baseband Device Drivers
>       fpga_5gnr_fec
>       acc100
>       vrb1
> +    vrb2
>       la12xx
> diff --git a/doc/guides/bbdevs/vrb2.rst b/doc/guides/bbdevs/vrb2.rst
> new file mode 100644
> index 0000000000..2a30002e05
> --- /dev/null
> +++ b/doc/guides/bbdevs/vrb2.rst
> @@ -0,0 +1,206 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2023 Intel Corporation
> +
> +.. include:: <isonum.txt>
> +
> +Intel\ |reg| vRAN Boost v2 Poll Mode Driver (PMD)
> +=================================================
> +
> +The Intel\ |reg| vRAN Boost integrated accelerator enables
> +cost-effective 4G and 5G next-generation virtualized Radio Access Network (vRAN)
> +solutions.
> +The Intel vRAN Boost v2.0 (VRB2 in the code) is specifically integrated on the
> +Intel\ |reg| Xeon\ |reg| Granite Rapids-D Process (GNR-D).
> +
> +Features
> +--------
> +
> +Intel vRAN Boost v2.0 includes a 5G Low Density Parity Check (LDPC) encoder/decoder,
> +rate match/dematch, Hybrid Automatic Repeat Request (HARQ) with access to DDR
> +memory for buffer management, a 4G Turbo encoder/decoder,
> +a Fast Fourier Transform (FFT) block providing DFT/iDFT processing offload
> +for the 5G Sounding Reference Signal (SRS), a MLD-TS accelerator, a Queue Manager (QMGR),
> +and a DMA subsystem.
> +There is no dedicated on-card memory for HARQ, the coherent memory on the CPU side is being used.
> +
> +These hardware blocks provide the following features exposed by the PMD:
> +
> +- LDPC Encode in the Downlink (5GNR)
> +- LDPC Decode in the Uplink (5GNR)
> +- Turbo Encode in the Downlink (4G)
> +- Turbo Decode in the Uplink (4G)
> +- FFT processing
> +- MLD-TS processing
> +- Single Root I/O Virtualization (SR-IOV) with 16 Virtual Functions (VFs) per Physical Function (PF)
> +- Maximum of 2048 queues per VF
> +- Message Signaled Interrupts (MSIs)
> +
> +The Intel vRAN Boost v2.0 PMD supports the following bbdev capabilities:
> +
> +* For the LDPC encode operation:
> +   - ``RTE_BBDEV_LDPC_CRC_24B_ATTACH``: set to attach CRC24B to CB(s).
> +   - ``RTE_BBDEV_LDPC_RATE_MATCH``: if set then do not do Rate Match bypass.
> +   - ``RTE_BBDEV_LDPC_INTERLEAVER_BYPASS``: if set then bypass interleaver.
> +   - ``RTE_BBDEV_LDPC_ENC_SCATTER_GATHER``: supports scatter-gather for input/output data.
> +   - ``RTE_BBDEV_LDPC_ENC_CONCATENATION``: concatenate code blocks with bit granularity.
> +
> +* For the LDPC decode operation:
> +   - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK``: check CRC24B from CB(s).
> +   - ``RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP``: drops CRC24B bits appended while decoding.
> +   - ``RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK``: check CRC24A from CB(s).
> +   - ``RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK``: check CRC16 from CB(s).
> +   - ``RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE``: provides an input for HARQ combining.
> +   - ``RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE``: provides an input for HARQ combining.
> +   - ``RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE``: disable early termination.
> +   - ``RTE_BBDEV_LDPC_DEC_SCATTER_GATHER``: supports scatter-gather for input/output data.
> +   - ``RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION``: supports compression of the HARQ input/output.
> +   - ``RTE_BBDEV_LDPC_LLR_COMPRESSION``: supports LLR input compression.
> +   - ``RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION``: supports compression of the HARQ input/output.
> +   - ``RTE_BBDEV_LDPC_SOFT_OUT_ENABLE``: set the APP LLR soft output.
> +   - ``RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS``: set the APP LLR soft output after rate-matching.
> +   - ``RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS``: disables the de-interleaver.
> +
> +* For the turbo encode operation:
> +   - ``RTE_BBDEV_TURBO_CRC_24B_ATTACH``: set to attach CRC24B to CB(s).
> +   - ``RTE_BBDEV_TURBO_RATE_MATCH``: if set then do not do Rate Match bypass.
> +   - ``RTE_BBDEV_TURBO_ENC_INTERRUPTS``: set for encoder dequeue interrupts.
> +   - ``RTE_BBDEV_TURBO_RV_INDEX_BYPASS``: set to bypass RV index.
> +   - ``RTE_BBDEV_TURBO_ENC_SCATTER_GATHER``: supports scatter-gather for input/output data.
> +
> +* For the turbo decode operation:
> +   - ``RTE_BBDEV_TURBO_CRC_TYPE_24B``: check CRC24B from CB(s).
> +   - ``RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE``: perform subblock de-interleave.
> +   - ``RTE_BBDEV_TURBO_DEC_INTERRUPTS``: set for decoder dequeue interrupts.
> +   - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN``: set if negative LLR input is supported.
> +   - ``RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP``: keep CRC24B bits appended while decoding.
> +   - ``RTE_BBDEV_TURBO_DEC_CRC_24B_DROP``: option to drop the code block CRC after decoding.
> +   - ``RTE_BBDEV_TURBO_EARLY_TERMINATION``: set early termination feature.
> +   - ``RTE_BBDEV_TURBO_DEC_SCATTER_GATHER``: supports scatter-gather for input/output data.
> +   - ``RTE_BBDEV_TURBO_HALF_ITERATION_EVEN``: set half iteration granularity.
> +   - ``RTE_BBDEV_TURBO_SOFT_OUTPUT``: set the APP LLR soft output.
> +   - ``RTE_BBDEV_TURBO_EQUALIZER``: set the turbo equalizer feature.
> +   - ``RTE_BBDEV_TURBO_SOFT_OUT_SATURATE``: set the soft output saturation.
> +   - ``RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH``: set to run an extra odd iteration after CRC match.
> +   - ``RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT``: set if negative APP LLR output supported.
> +   - ``RTE_BBDEV_TURBO_MAP_DEC``: supports flexible parallel MAP engine decoding.
> +
> +* For the FFT operation:
> +   - ``RTE_BBDEV_FFT_WINDOWING``: flexible windowing capability.
> +   - ``RTE_BBDEV_FFT_CS_ADJUSTMENT``: flexible adjustment of Cyclic Shift time offset.
> +   - ``RTE_BBDEV_FFT_DFT_BYPASS``: set for bypass the DFT and get directly into iDFT input.
> +   - ``RTE_BBDEV_FFT_IDFT_BYPASS``: set for bypass the IDFT and get directly the DFT output.
> +   - ``RTE_BBDEV_FFT_WINDOWING_BYPASS``: set for bypass time domain windowing.
> +
> +* For the MLD-TS operation:
> +   - ``RTE_BBDEV_MLDTS_REP``: set to repeat and reuse channel across operations.
> +
> +Installation
> +------------
> +
> +Section 3 of the DPDK manual provides instructions on installing and compiling DPDK.
> +
> +DPDK requires hugepages to be configured as detailed in section 2 of the DPDK manual.
> +The bbdev test application has been tested with a configuration 40 x 1GB hugepages.
> +The hugepage configuration of a server may be examined using:
> +
> +.. code-block:: console
> +
> +   grep Huge* /proc/meminfo
> +
> +
> +Initialization
> +--------------
> +
> +When the device first powers up, its PCI Physical Functions (PF)
> +can be listed through these commands for Intel vRAN Boost v2:
> +
> +.. code-block:: console
> +
> +   sudo lspci -vd8086:57c2
> +
> +The physical and virtual functions are compatible with Linux UIO drivers:
> +``vfio`` (preferred) and ``igb_uio`` (legacy).
> +However, before it can be used, the 5G/4G FEC device must first be bound
> +to one of these Linux drivers through DPDK.
> +
> +
> +Configure the VFs through PF
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The PCI virtual functions must be configured before they can be used or assigned
> +to VMs/containers.
> +The configuration involves allocating the number of hardware queues, priorities,
> +load balance, bandwidth and other settings necessary for the device
> +to perform FEC functions.
> +
> +This configuration needs to be executed at least once after reboot or PCI FLR
> +and can be achieved by using the function ``rte_acc_configure()``,
> +which sets up the parameters defined in the compatible ``rte_acc_conf`` structure.
> +
> +
> +Test Application
> +----------------
> +
> +The bbdev class is provided with a test application, ``test-bbdev.py``,
> +and a range of test data for testing the functionality of the device,
> +depending on the device's capabilities.
> +The test application is located under the ``app/test-bbdev`` folder
> +and has the following options:
> +
> +.. code-block:: console
> +
> +   "-p", "--testapp-path": specifies path to the bbdev test app.
> +   "-e", "--eal-params": EAL arguments which are passed to the test app.
> +   "-t", "--timeout": Timeout in seconds (default=300).
> +   "-c", "--test-cases": Defines test cases to run. Run all if not specified.
> +   "-v", "--test-vector": Test vector path.
> +   "-n", "--num-ops": Number of operations to process on device (default=32).
> +   "-b", "--burst-size": Operations enqueue/dequeue burst size (default=32).
> +   "-s", "--snr": SNR in dB used when generating LLRs for bler tests.
> +   "-s", "--iter_max": Number of iterations for LDPC decoder.
> +   "-l", "--num-lcores": Number of lcores to run (default=16).
> +   "-i", "--init-device": Initialise PF device with default values.
> +
> +
> +To execute the test application tool using simple decode or encode data,
> +type one of the following:
> +
> +.. code-block:: console
> +
> +  ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_dec_default.data
> +  ./test-bbdev.py -c validation -n 64 -b 1 -v ./ldpc_enc_default.data
> +
> +
> +The test application ``test-bbdev.py`` can configure the
> +PF device with a default set of values if the ``-i`` or ``--init-device`` option
> +is included. The default values are defined in ``test_bbdev_perf.c``.
> +
> +
> +Test Vectors
> +~~~~~~~~~~~~
> +
> +In addition to the simple LDPC decoder and LDPC encoder tests,
> +bbdev also provides a range of additional tests under the test_vectors folder,
> +which may be useful.
> +The results of these tests will depend on the device capabilities which may
> +cause some test cases to be skipped, but no failure should be reported.
> +
> +
> +Alternate Baseband Device configuration tool
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +In addition to the embedded configuration feature supported in test-bbdev through
> +the ``--init-device`` option mentioned above, a companion application is also
> +available to perform the device configuration.
> +Notably, the ``pf_bb_config`` application makes it possible to run test-bbdev
> +from the VF, rather than only from the PF as described above.
> +
> +For more details, see: https://github.com/intel/pf-bb-config
> +
> +Specifically for the bbdev Intel vRAN Boost v2 PMD, the command below can be used
> +(note that ACC200 was used previously to refer to VRB2):
> +
> +.. code-block:: console
> +
> +   pf_bb_config VRB2 -c ./vrb2/vrb2_config_vf_5g.cfg
> +   test-bbdev.py -e="-c 0xff0 -a${VF_PCI_ADDR}" -c validation -n 64 -b 64 -l 1 -v ./ldpc_dec_default.data
> diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
> index 333e1d95a2..668dd58ee3 100644
> --- a/doc/guides/rel_notes/release_23_11.rst
> +++ b/doc/guides/rel_notes/release_23_11.rst
> @@ -78,6 +78,9 @@ New Features
>   * build: Optional libraries can now be selected with the new ``enable_libs``
>     build option similarly to the existing ``enable_drivers`` build option.
>   
> +* **Updated Intel vRAN Boost bbdev PMD.**
> +
> +  Added support for the new Intel vRAN Boost v2 device variant (GNR-D) within the unified driver.
>   
>   Removed Items
>   -------------
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index c89c26c59a..48e779ce77 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -37,6 +37,15 @@ vrb1_queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id
>   		return ((qgrp_id << 7) + (aq_id << 3) + VRB1_VfQmgrIngressAq);
>   }
>   
> +static inline uint32_t
> +vrb2_queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id, uint16_t aq_id)
> +{
> +	if (pf_device)
> +		return ((vf_id << 14) + (qgrp_id << 9) + (aq_id << 3) + VRB2_PfQmgrIngressAq);
> +	else
> +		return ((qgrp_id << 9) + (aq_id << 3) + VRB2_VfQmgrIngressAq);
> +}
> +
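
For readers following the address arithmetic in vrb2_queue_offset(): with VRB2_NUM_AQS = 64 and VRB2_NUM_QGRPS = 32 as defined later in this patch, the three shifted fields do not overlap, and on the PF the whole range fits in a 1 MB window above VRB2_PfQmgrIngressAq. A minimal standalone sketch of the same computation follows. The SKETCH_* names and sketch_vrb2_queue_offset() are hypothetical; only the shift amounts and base addresses are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base addresses copied from the vrb2_pf_enum.h / vrb2_vf_enum.h hunks. */
#define SKETCH_PF_INGRESS_AQ 0x00100000u
#define SKETCH_VF_INGRESS_AQ 0x00004000u

/*
 * Hypothetical helper mirroring vrb2_queue_offset(); not part of the patch.
 * Field layout implied by the shifts:
 *   bits  3..8  : atomic queue index (aq_id << 3, up to 64 AQs)
 *   bits  9..13 : queue group index  (qgrp_id << 9, up to 32 groups)
 *   bits 14..19 : VF index, PF only  (vf_id << 14, up to 64 VFs)
 */
static inline uint32_t
sketch_vrb2_queue_offset(bool pf_device, uint8_t vf_id, uint8_t qgrp_id,
		uint16_t aq_id)
{
	uint32_t off = ((uint32_t)qgrp_id << 9) + ((uint32_t)aq_id << 3);

	if (pf_device)
		return ((uint32_t)vf_id << 14) + off + SKETCH_PF_INGRESS_AQ;
	/* The VF view only sees its own queues, so vf_id is ignored. */
	return off + SKETCH_VF_INGRESS_AQ;
}
```

Under these assumptions, VF 1, queue group 2, AQ 3 maps to 0x104418 from the PF and to 0x4418 from the VF itself.
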
>   enum {UL_4G = 0, UL_5G, DL_4G, DL_5G, FFT, MLD, NUM_ACC};
>   
>   /* Return the accelerator enum for a Queue Group Index. */
> @@ -197,7 +206,7 @@ fetch_acc_config(struct rte_bbdev *dev)
>   	struct acc_device *d = dev->data->dev_private;
>   	struct rte_acc_conf *acc_conf = &d->acc_conf;
>   	uint8_t acc, qg;
> -	uint32_t reg_aq, reg_len0, reg_len1, reg0, reg1;
> +	uint32_t reg_aq, reg_len0, reg_len1, reg_len2, reg_len3, reg0, reg1, reg2, reg3;
>   	uint32_t reg_mode, idx;
>   	struct rte_acc_queue_topology *q_top = NULL;
>   	int qman_func_id[VRB_NUM_ACCS] = {ACC_ACCMAP_0, ACC_ACCMAP_1,
> @@ -219,32 +228,81 @@ fetch_acc_config(struct rte_bbdev *dev)
>   	acc_conf->num_vf_bundles = 1;
>   	initQTop(acc_conf);
>   
> -	reg0 = acc_reg_read(d, d->reg_addr->qman_group_func);
> -	reg1 = acc_reg_read(d, d->reg_addr->qman_group_func + 4);
> -	for (qg = 0; qg < d->num_qgroups; qg++) {
> -		reg_aq = acc_reg_read(d, d->queue_offset(d->pf_device, 0, qg, 0));
> -		if (reg_aq & ACC_QUEUE_ENABLE) {
> -			if (qg < ACC_NUM_QGRPS_PER_WORD)
> -				idx = (reg0 >> (qg * 4)) & 0x7;
> +	if (d->device_variant == VRB1_VARIANT) {
> +		reg0 = acc_reg_read(d, d->reg_addr->qman_group_func);
> +		reg1 = acc_reg_read(d, d->reg_addr->qman_group_func + 4);
> +		for (qg = 0; qg < d->num_qgroups; qg++) {
> +			reg_aq = acc_reg_read(d, d->queue_offset(d->pf_device, 0, qg, 0));
> +			if (reg_aq & ACC_QUEUE_ENABLE) {
> +				if (qg < ACC_NUM_QGRPS_PER_WORD)
> +					idx = (reg0 >> (qg * 4)) & 0x7;
> +				else
> +					idx = (reg1 >> ((qg - ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
> +				if (idx < VRB1_NUM_ACCS) {
> +					acc = qman_func_id[idx];
> +					updateQtop(acc, qg, acc_conf, d);
> +				}
> +			}
> +		}
> +
> +		/* Check the depth of the AQs. */
> +		reg_len0 = acc_reg_read(d, d->reg_addr->depth_log0_offset);
> +		reg_len1 = acc_reg_read(d, d->reg_addr->depth_log1_offset);
> +		for (acc = 0; acc < NUM_ACC; acc++) {
> +			qtopFromAcc(&q_top, acc, acc_conf);
> +			if (q_top->first_qgroup_index < ACC_NUM_QGRPS_PER_WORD)
> +				q_top->aq_depth_log2 =
> +						(reg_len0 >> (q_top->first_qgroup_index * 4)) & 0xF;
>   			else
> -				idx = (reg1 >> ((qg - ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
> -			if (idx < VRB1_NUM_ACCS) {
> -				acc = qman_func_id[idx];
> -				updateQtop(acc, qg, acc_conf, d);
> +				q_top->aq_depth_log2 = (reg_len1 >> ((q_top->first_qgroup_index -
> +						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
> +		}
> +	} else {
> +		reg0 = acc_reg_read(d, d->reg_addr->qman_group_func);
> +		reg1 = acc_reg_read(d, d->reg_addr->qman_group_func + 4);
> +		reg2 = acc_reg_read(d, d->reg_addr->qman_group_func + 8);
> +		reg3 = acc_reg_read(d, d->reg_addr->qman_group_func + 12);
> +		/* printf("Debug Function %08x %08x %08x %08x\n", reg0, reg1, reg2, reg3);*/
> +		for (qg = 0; qg < VRB2_NUM_QGRPS; qg++) {
> +			reg_aq = acc_reg_read(d, vrb2_queue_offset(d->pf_device, 0, qg, 0));
> +			if (reg_aq & ACC_QUEUE_ENABLE) {
> +				/* printf("Qg enabled %d %x\n", qg, reg_aq);*/
> +				if (qg / ACC_NUM_QGRPS_PER_WORD == 0)
> +					idx = (reg0 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
> +				else if (qg / ACC_NUM_QGRPS_PER_WORD == 1)
> +					idx = (reg1 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
> +				else if (qg / ACC_NUM_QGRPS_PER_WORD == 2)
> +					idx = (reg2 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
> +				else
> +					idx = (reg3 >> ((qg % ACC_NUM_QGRPS_PER_WORD) * 4)) & 0x7;
> +				if (idx < VRB_NUM_ACCS) {
> +					acc = qman_func_id[idx];
> +					updateQtop(acc, qg, acc_conf, d);
> +				}
>   			}
>   		}
> -	}
>   
> -	/* Check the depth of the AQs. */
> -	reg_len0 = acc_reg_read(d, d->reg_addr->depth_log0_offset);
> -	reg_len1 = acc_reg_read(d, d->reg_addr->depth_log1_offset);
> -	for (acc = 0; acc < NUM_ACC; acc++) {
> -		qtopFromAcc(&q_top, acc, acc_conf);
> -		if (q_top->first_qgroup_index < ACC_NUM_QGRPS_PER_WORD)
> -			q_top->aq_depth_log2 = (reg_len0 >> (q_top->first_qgroup_index * 4)) & 0xF;
> -		else
> -			q_top->aq_depth_log2 = (reg_len1 >> ((q_top->first_qgroup_index -
> -					ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
> +		/* Check the depth of the AQs. */
> +		reg_len0 = acc_reg_read(d, d->reg_addr->depth_log0_offset);
> +		reg_len1 = acc_reg_read(d, d->reg_addr->depth_log0_offset + 4);
> +		reg_len2 = acc_reg_read(d, d->reg_addr->depth_log0_offset + 8);
> +		reg_len3 = acc_reg_read(d, d->reg_addr->depth_log0_offset + 12);
> +
> +		for (acc = 0; acc < NUM_ACC; acc++) {
> +			qtopFromAcc(&q_top, acc, acc_conf);
> +			if (q_top->first_qgroup_index / ACC_NUM_QGRPS_PER_WORD == 0)
> +				q_top->aq_depth_log2 = (reg_len0 >> ((q_top->first_qgroup_index %
> +						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
> +			else if (q_top->first_qgroup_index / ACC_NUM_QGRPS_PER_WORD == 1)
> +				q_top->aq_depth_log2 = (reg_len1 >> ((q_top->first_qgroup_index %
> +						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
> +			else if (q_top->first_qgroup_index / ACC_NUM_QGRPS_PER_WORD == 2)
> +				q_top->aq_depth_log2 = (reg_len2 >> ((q_top->first_qgroup_index %
> +						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
> +			else
> +				q_top->aq_depth_log2 = (reg_len3 >> ((q_top->first_qgroup_index %
> +						ACC_NUM_QGRPS_PER_WORD) * 4)) & 0xF;
> +		}
>   	}
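
The VRB2 branch repeats the same word/shift arithmetic four times for the function index and four more times for the queue depth. A minimal sketch of how that lookup could be folded into one helper is shown here, assuming ACC_NUM_QGRPS_PER_WORD is 8 (its value is not visible in this hunk). The helper name sketch_qgrp_nibble() and the SKETCH_ macro are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: eight 4-bit fields are packed into each 32-bit register. */
#define SKETCH_QGRPS_PER_WORD 8u

/*
 * Hypothetical helper: return the 4-bit field belonging to queue group qg
 * from an array of consecutive 32-bit registers.
 */
static inline uint32_t
sketch_qgrp_nibble(const uint32_t *regs, uint32_t num_regs, uint32_t qg)
{
	uint32_t word = qg / SKETCH_QGRPS_PER_WORD;
	uint32_t shift = (qg % SKETCH_QGRPS_PER_WORD) * 4;

	if (word >= num_regs)
		return 0; /* Out of range: reported as 0 in this sketch. */
	return (regs[word] >> shift) & 0xF;
}
```

With the four registers read into an array, the function index would be `sketch_qgrp_nibble(regs, 4, qg) & 0x7` and the queue depth would be `sketch_qgrp_nibble(len_regs, 4, q_top->first_qgroup_index)`, which would replace both if/else chains.
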
>   
>   	/* Read PF mode. */
> @@ -470,7 +528,10 @@ allocate_info_ring(struct rte_bbdev *dev)
>   	phys_low  = (uint32_t)(info_ring_iova);
>   	acc_reg_write(d, d->reg_addr->info_ring_hi, phys_high);
>   	acc_reg_write(d, d->reg_addr->info_ring_lo, phys_low);
> -	acc_reg_write(d, d->reg_addr->info_ring_en, VRB1_REG_IRQ_EN_ALL);
> +	if (d->device_variant == VRB1_VARIANT)
> +		acc_reg_write(d, d->reg_addr->info_ring_en, VRB1_REG_IRQ_EN_ALL);
> +	else
> +		acc_reg_write(d, d->reg_addr->info_ring_en, VRB2_REG_IRQ_EN_ALL);
>   	d->info_ring_head = (acc_reg_read(d, d->reg_addr->info_ring_ptr) &
>   			0xFFF) / sizeof(union acc_info_ring_data);
>   	return 0;
> @@ -549,6 +610,10 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
>   	acc_reg_write(d, d->reg_addr->dma_ring_dl4g_lo, phys_low);
>   	acc_reg_write(d, d->reg_addr->dma_ring_fft_hi, phys_high);
>   	acc_reg_write(d, d->reg_addr->dma_ring_fft_lo, phys_low);
> +	if (d->device_variant == VRB2_VARIANT) {
> +		acc_reg_write(d, d->reg_addr->dma_ring_mld_hi, phys_high);
> +		acc_reg_write(d, d->reg_addr->dma_ring_mld_lo, phys_low);
> +	}
>   	/*
>   	 * Configure Ring Size to the max queue ring size
>   	 * (used for wrapping purpose).
> @@ -582,6 +647,10 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t num_queues, int socket_id)
>   	acc_reg_write(d, d->reg_addr->tail_ptrs_dl4g_lo, phys_low);
>   	acc_reg_write(d, d->reg_addr->tail_ptrs_fft_hi, phys_high);
>   	acc_reg_write(d, d->reg_addr->tail_ptrs_fft_lo, phys_low);
> +	if (d->device_variant == VRB2_VARIANT) {
> +		acc_reg_write(d, d->reg_addr->tail_ptrs_mld_hi, phys_high);
> +		acc_reg_write(d, d->reg_addr->tail_ptrs_mld_lo, phys_low);
> +	}
>   
>   	ret = allocate_info_ring(dev);
>   	if (ret < 0) {
> @@ -679,10 +748,17 @@ vrb_intr_enable(struct rte_bbdev *dev)
>   			return ret;
>   		}
>   
> -		if (acc_dev->pf_device)
> -			max_queues = VRB1_MAX_PF_MSIX;
> -		else
> -			max_queues = VRB1_MAX_VF_MSIX;
> +		if (d->device_variant == VRB1_VARIANT) {
> +			if (acc_dev->pf_device)
> +				max_queues = VRB1_MAX_PF_MSIX;
> +			else
> +				max_queues = VRB1_MAX_VF_MSIX;
> +		} else {
> +			if (acc_dev->pf_device)
> +				max_queues = VRB2_MAX_PF_MSIX;
> +			else
> +				max_queues = VRB2_MAX_VF_MSIX;
> +		}
>   
>   		if (rte_intr_efd_enable(dev->intr_handle, max_queues)) {
>   			rte_bbdev_log(ERR, "Failed to create fds for %u queues",
> @@ -1158,6 +1234,10 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>   		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>   	};
>   
> +	static const struct rte_bbdev_op_cap vrb2_bbdev_capabilities[] = {
> +		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
> +	};
> +
>   	static struct rte_bbdev_queue_conf default_queue_conf;
>   	default_queue_conf.socket = dev->data->socket_id;
>   	default_queue_conf.queue_size = ACC_MAX_QUEUE_DEPTH;
> @@ -1202,7 +1282,10 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>   	dev_info->default_queue_conf = default_queue_conf;
>   	dev_info->cpu_flag_reqs = NULL;
>   	dev_info->min_alignment = 1;
> -	dev_info->capabilities = vrb1_bbdev_capabilities;
> +	if (d->device_variant == VRB1_VARIANT)
> +		dev_info->capabilities = vrb1_bbdev_capabilities;
> +	else
> +		dev_info->capabilities = vrb2_bbdev_capabilities;
>   	dev_info->harq_buffer_size = 0;
>   
>   	vrb_check_ir(d);
> @@ -1251,6 +1334,9 @@ static struct rte_pci_id pci_id_vrb_pf_map[] = {
>   	{
>   		RTE_PCI_DEVICE(RTE_VRB1_VENDOR_ID, RTE_VRB1_PF_DEVICE_ID)
>   	},
> +	{
> +		RTE_PCI_DEVICE(RTE_VRB2_VENDOR_ID, RTE_VRB2_PF_DEVICE_ID)
> +	},
>   	{.device_id = 0},
>   };
>   
> @@ -1259,6 +1345,9 @@ static struct rte_pci_id pci_id_vrb_vf_map[] = {
>   	{
>   		RTE_PCI_DEVICE(RTE_VRB1_VENDOR_ID, RTE_VRB1_VF_DEVICE_ID)
>   	},
> +	{
> +		RTE_PCI_DEVICE(RTE_VRB2_VENDOR_ID, RTE_VRB2_VF_DEVICE_ID)
> +	},
>   	{.device_id = 0},
>   };
>   
> @@ -3444,6 +3533,15 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
>   			d->reg_addr = &vrb1_pf_reg_addr;
>   		else
>   			d->reg_addr = &vrb1_vf_reg_addr;
> +	} else {
> +		d->device_variant = VRB2_VARIANT;
> +		d->queue_offset = vrb2_queue_offset;
> +		d->num_qgroups = VRB2_NUM_QGRPS;
> +		d->num_aqs = VRB2_NUM_AQS;
> +		if (d->pf_device)
> +			d->reg_addr = &vrb2_pf_reg_addr;
> +		else
> +			d->reg_addr = &vrb2_vf_reg_addr;
>   	}
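
The else branch treats every device that is not VRB1 as VRB2. As an illustration only, a table-driven lookup keyed on the PCI device IDs listed in this patch could make the mapping explicit and leave room for rejecting unknown IDs. The enum, struct and function names below are hypothetical and not part of the driver; only the device ID values come from vrb_pmd.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant tags; the driver uses its own VRB1_VARIANT/VRB2_VARIANT. */
enum sketch_variant { SKETCH_UNKNOWN = 0, SKETCH_VRB1, SKETCH_VRB2 };

struct sketch_id {
	uint16_t device_id;
	enum sketch_variant variant;
	bool pf_device;
};

/* Device IDs as defined in vrb_pmd.h by this patch (vendor 0x8086). */
static const struct sketch_id sketch_ids[] = {
	{0x57C0, SKETCH_VRB1, true},
	{0x57C1, SKETCH_VRB1, false},
	{0x57C2, SKETCH_VRB2, true},
	{0x57C3, SKETCH_VRB2, false},
};

/* Return the matching table entry, or NULL for an unrecognised device ID. */
static inline const struct sketch_id *
sketch_lookup(uint16_t device_id)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_ids) / sizeof(sketch_ids[0]); i++) {
		if (sketch_ids[i].device_id == device_id)
			return &sketch_ids[i];
	}
	return NULL;
}
```

Whether an explicit check is worth it here depends on how the preceding condition is written, which is not visible in this hunk.
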
>   
>   	rte_bbdev_log_debug("Init device %s [%s] @ vaddr %p paddr %#"PRIx64"",
> diff --git a/drivers/baseband/acc/vrb2_pf_enum.h b/drivers/baseband/acc/vrb2_pf_enum.h
> new file mode 100644
> index 0000000000..28f10dc35b
> --- /dev/null
> +++ b/drivers/baseband/acc/vrb2_pf_enum.h
> @@ -0,0 +1,124 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#ifndef VRB2_PF_ENUM_H
> +#define VRB2_PF_ENUM_H
> +
> +/*
> + * VRB2 Register mapping on PF BAR0
> + * This is automatically generated from RDL, format may change with new RDL
> + * Release.
> + * Variable names are as is
> + */
> +enum {
> +	VRB2_PfQmgrEgressQueuesTemplate             = 0x0007FC00,
> +	VRB2_PfQmgrIngressAq                        = 0x00100000,
> +	VRB2_PfQmgrSoftReset                        = 0x00A00034,
> +	VRB2_PfQmgrAramAllocEn	                    = 0x00A000a0,
> +	VRB2_PfQmgrAramAllocSetupN0                 = 0x00A000b0,
> +	VRB2_PfQmgrAramAllocSetupN1                 = 0x00A000b4,
> +	VRB2_PfQmgrAramAllocSetupN2                 = 0x00A000b8,
> +	VRB2_PfQmgrAramAllocSetupN3                 = 0x00A000bc,
> +	VRB2_PfQmgrDepthLog2Grp                     = 0x00A00200,
> +	VRB2_PfQmgrTholdGrp                         = 0x00A00300,
> +	VRB2_PfQmgrGrpTmplateReg0Indx               = 0x00A00600,
> +	VRB2_PfQmgrGrpTmplateReg1Indx               = 0x00A00700,
> +	VRB2_PfQmgrGrpTmplateReg2Indx               = 0x00A00800,
> +	VRB2_PfQmgrGrpTmplateReg3Indx               = 0x00A00900,
> +	VRB2_PfQmgrGrpTmplateReg4Indx               = 0x00A00A00,
> +	VRB2_PfQmgrGrpTmplateReg5Indx               = 0x00A00B00,
> +	VRB2_PfQmgrGrpTmplateReg6Indx               = 0x00A00C00,
> +	VRB2_PfQmgrGrpTmplateReg7Indx               = 0x00A00D00,
> +	VRB2_PfQmgrGrpTmplateEnRegIndx              = 0x00A00E00,
> +	VRB2_PfQmgrArbQDepthGrp                     = 0x00A02F00,
> +	VRB2_PfQmgrGrpFunction0                     = 0x00A02F80,
> +	VRB2_PfQmgrGrpPriority                      = 0x00A02FC0,
> +	VRB2_PfQmgrVfBaseAddr                       = 0x00A08000,
> +	VRB2_PfQmgrAqEnableVf                       = 0x00A10000,
> +	VRB2_PfQmgrRingSizeVf                       = 0x00A20010,
> +	VRB2_PfQmgrGrpDepthLog20Vf                  = 0x00A20020,
> +	VRB2_PfQmgrGrpDepthLog21Vf                  = 0x00A20024,
> +	VRB2_PfFabricM2iBufferReg                   = 0x00B30000,
> +	VRB2_PfFecUl5gIbDebug0Reg                   = 0x00B401FC,
> +	VRB2_PfFftConfig0                           = 0x00B58004,
> +	VRB2_PfFftParityMask8                       = 0x00B5803C,
> +	VRB2_PfDmaConfig0Reg                        = 0x00B80000,
> +	VRB2_PfDmaConfig1Reg                        = 0x00B80004,
> +	VRB2_PfDmaQmgrAddrReg                       = 0x00B80008,
> +	VRB2_PfDmaAxcacheReg                        = 0x00B80010,
> +	VRB2_PfDmaAxiControl                        = 0x00B8002C,
> +	VRB2_PfDmaQmanen                            = 0x00B80040,
> +	VRB2_PfDmaQmanenSelect                      = 0x00B80044,
> +	VRB2_PfDmaCfgRrespBresp                     = 0x00B80814,
> +	VRB2_PfDmaDescriptorSignature               = 0x00B80868,
> +	VRB2_PfDmaErrorDetectionEn                  = 0x00B80870,
> +	VRB2_PfDmaFec5GulDescBaseLoRegVf            = 0x00B88020,
> +	VRB2_PfDmaFec5GulDescBaseHiRegVf            = 0x00B88024,
> +	VRB2_PfDmaFec5GulRespPtrLoRegVf             = 0x00B88028,
> +	VRB2_PfDmaFec5GulRespPtrHiRegVf             = 0x00B8802C,
> +	VRB2_PfDmaFec5GdlDescBaseLoRegVf            = 0x00B88040,
> +	VRB2_PfDmaFec5GdlDescBaseHiRegVf            = 0x00B88044,
> +	VRB2_PfDmaFec5GdlRespPtrLoRegVf             = 0x00B88048,
> +	VRB2_PfDmaFec5GdlRespPtrHiRegVf             = 0x00B8804C,
> +	VRB2_PfDmaFec4GulDescBaseLoRegVf            = 0x00B88060,
> +	VRB2_PfDmaFec4GulDescBaseHiRegVf            = 0x00B88064,
> +	VRB2_PfDmaFec4GulRespPtrLoRegVf             = 0x00B88068,
> +	VRB2_PfDmaFec4GulRespPtrHiRegVf             = 0x00B8806C,
> +	VRB2_PfDmaFec4GdlDescBaseLoRegVf            = 0x00B88080,
> +	VRB2_PfDmaFec4GdlDescBaseHiRegVf            = 0x00B88084,
> +	VRB2_PfDmaFec4GdlRespPtrLoRegVf             = 0x00B88088,
> +	VRB2_PfDmaFec4GdlRespPtrHiRegVf             = 0x00B8808C,
> +	VRB2_PfDmaFftDescBaseLoRegVf                = 0x00B880A0,
> +	VRB2_PfDmaFftDescBaseHiRegVf                = 0x00B880A4,
> +	VRB2_PfDmaFftRespPtrLoRegVf                 = 0x00B880A8,
> +	VRB2_PfDmaFftRespPtrHiRegVf                 = 0x00B880AC,
> +	VRB2_PfDmaMldDescBaseLoRegVf                = 0x00B880C0,
> +	VRB2_PfDmaMldDescBaseHiRegVf                = 0x00B880C4,
> +	VRB2_PfQosmonAEvalOverflow0                 = 0x00B90008,
> +	VRB2_PfPermonACntrlRegVf                    = 0x00B98000,
> +	VRB2_PfQosmonBEvalOverflow0                 = 0x00BA0008,
> +	VRB2_PfPermonBCntrlRegVf                    = 0x00BA8000,
> +	VRB2_PfPermonCCntrlRegVf                    = 0x00BB8000,
> +	VRB2_PfHiInfoRingBaseLoRegPf                = 0x00C84014,
> +	VRB2_PfHiInfoRingBaseHiRegPf                = 0x00C84018,
> +	VRB2_PfHiInfoRingPointerRegPf               = 0x00C8401C,
> +	VRB2_PfHiInfoRingIntWrEnRegPf               = 0x00C84020,
> +	VRB2_PfHiBlockTransmitOnErrorEn             = 0x00C84038,
> +	VRB2_PfHiCfgMsiIntWrEnRegPf                 = 0x00C84040,
> +	VRB2_PfHiMsixVectorMapperPf                 = 0x00C84060,
> +	VRB2_PfHiPfMode                             = 0x00C84108,
> +	VRB2_PfHiClkGateHystReg                     = 0x00C8410C,
> +	VRB2_PfHiMsiDropEnableReg                   = 0x00C84114,
> +	VRB2_PfHiSectionPowerGatingReq              = 0x00C84128,
> +	VRB2_PfHiSectionPowerGatingAck              = 0x00C8412C,
> +};
> +
> +/* TIP PF Interrupt numbers */
> +enum {
> +	VRB2_PF_INT_QMGR_AQ_OVERFLOW = 0,
> +	VRB2_PF_INT_DOORBELL_VF_2_PF = 1,
> +	VRB2_PF_INT_ILLEGAL_FORMAT = 2,
> +	VRB2_PF_INT_QMGR_DISABLED_ACCESS = 3,
> +	VRB2_PF_INT_QMGR_AQ_OVERTHRESHOLD = 4,
> +	VRB2_PF_INT_DMA_DL_DESC_IRQ = 5,
> +	VRB2_PF_INT_DMA_UL_DESC_IRQ = 6,
> +	VRB2_PF_INT_DMA_FFT_DESC_IRQ = 7,
> +	VRB2_PF_INT_DMA_UL5G_DESC_IRQ = 8,
> +	VRB2_PF_INT_DMA_DL5G_DESC_IRQ = 9,
> +	VRB2_PF_INT_DMA_MLD_DESC_IRQ = 10,
> +	VRB2_PF_INT_ARAM_ACCESS_ERR = 11,
> +	VRB2_PF_INT_ARAM_ECC_1BIT_ERR = 12,
> +	VRB2_PF_INT_PARITY_ERR = 13,
> +	VRB2_PF_INT_QMGR_OVERFLOW = 14,
> +	VRB2_PF_INT_QMGR_ERR = 15,
> +	VRB2_PF_INT_ATS_ERR = 22,
> +	VRB2_PF_INT_ARAM_FUUL = 23,
> +	VRB2_PF_INT_EXTRA_READ = 24,
> +	VRB2_PF_INT_COMPLETION_TIMEOUT = 25,
> +	VRB2_PF_INT_CORE_HANG = 26,
> +	VRB2_PF_INT_DMA_HANG = 28,
> +	VRB2_PF_INT_DS_HANG = 27,
> +};
> +
> +#endif /* VRB2_PF_ENUM_H */
> diff --git a/drivers/baseband/acc/vrb2_vf_enum.h b/drivers/baseband/acc/vrb2_vf_enum.h
> new file mode 100644
> index 0000000000..9c6e451010
> --- /dev/null
> +++ b/drivers/baseband/acc/vrb2_vf_enum.h
> @@ -0,0 +1,121 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#ifndef VRB2_VF_ENUM_H
> +#define VRB2_VF_ENUM_H
> +
> +/*
> + * VRB2 Register mapping on VF BAR0
> + * This is automatically generated from RDL, format may change with new RDL
> + */
> +enum {
> +	VRB2_VfHiVfToPfDbellVf           = 0x00000000,
> +	VRB2_VfHiPfToVfDbellVf           = 0x00000008,
> +	VRB2_VfHiInfoRingBaseLoVf        = 0x00000010,
> +	VRB2_VfHiInfoRingBaseHiVf        = 0x00000014,
> +	VRB2_VfHiInfoRingPointerVf       = 0x00000018,
> +	VRB2_VfHiInfoRingIntWrEnVf       = 0x00000020,
> +	VRB2_VfHiInfoRingPf2VfWrEnVf     = 0x00000024,
> +	VRB2_VfHiMsixVectorMapperVf      = 0x00000060,
> +	VRB2_VfHiDeviceStatus            = 0x00000068,
> +	VRB2_VfHiInterruptSrc            = 0x00000070,
> +	VRB2_VfDmaFec5GulDescBaseLoRegVf = 0x00000120,
> +	VRB2_VfDmaFec5GulDescBaseHiRegVf = 0x00000124,
> +	VRB2_VfDmaFec5GulRespPtrLoRegVf  = 0x00000128,
> +	VRB2_VfDmaFec5GulRespPtrHiRegVf  = 0x0000012C,
> +	VRB2_VfDmaFec5GdlDescBaseLoRegVf = 0x00000140,
> +	VRB2_VfDmaFec5GdlDescBaseHiRegVf = 0x00000144,
> +	VRB2_VfDmaFec5GdlRespPtrLoRegVf  = 0x00000148,
> +	VRB2_VfDmaFec5GdlRespPtrHiRegVf  = 0x0000014C,
> +	VRB2_VfDmaFec4GulDescBaseLoRegVf = 0x00000160,
> +	VRB2_VfDmaFec4GulDescBaseHiRegVf = 0x00000164,
> +	VRB2_VfDmaFec4GulRespPtrLoRegVf  = 0x00000168,
> +	VRB2_VfDmaFec4GulRespPtrHiRegVf  = 0x0000016C,
> +	VRB2_VfDmaFec4GdlDescBaseLoRegVf = 0x00000180,
> +	VRB2_VfDmaFec4GdlDescBaseHiRegVf = 0x00000184,
> +	VRB2_VfDmaFec4GdlRespPtrLoRegVf  = 0x00000188,
> +	VRB2_VfDmaFec4GdlRespPtrHiRegVf  = 0x0000018C,
> +	VRB2_VfDmaFftDescBaseLoRegVf     = 0x000001A0,
> +	VRB2_VfDmaFftDescBaseHiRegVf     = 0x000001A4,
> +	VRB2_VfDmaFftRespPtrLoRegVf      = 0x000001A8,
> +	VRB2_VfDmaFftRespPtrHiRegVf      = 0x000001AC,
> +	VRB2_VfDmaMldDescBaseLoRegVf     = 0x000001C0,
> +	VRB2_VfDmaMldDescBaseHiRegVf     = 0x000001C4,
> +	VRB2_VfDmaMldRespPtrLoRegVf      = 0x000001C8,
> +	VRB2_VfDmaMldRespPtrHiRegVf      = 0x000001CC,
> +	VRB2_VfPmACntrlRegVf             = 0x00000200,
> +	VRB2_VfPmACountVf                = 0x00000208,
> +	VRB2_VfPmAKCntLoVf               = 0x00000210,
> +	VRB2_VfPmAKCntHiVf               = 0x00000214,
> +	VRB2_VfPmADeltaCntLoVf           = 0x00000220,
> +	VRB2_VfPmADeltaCntHiVf           = 0x00000224,
> +	VRB2_VfPmBCntrlRegVf             = 0x00000240,
> +	VRB2_VfPmBCountVf                = 0x00000248,
> +	VRB2_VfPmBKCntLoVf               = 0x00000250,
> +	VRB2_VfPmBKCntHiVf               = 0x00000254,
> +	VRB2_VfPmBDeltaCntLoVf           = 0x00000260,
> +	VRB2_VfPmBDeltaCntHiVf           = 0x00000264,
> +	VRB2_VfPmCCntrlRegVf             = 0x00000280,
> +	VRB2_VfPmCCountVf                = 0x00000288,
> +	VRB2_VfPmCKCntLoVf               = 0x00000290,
> +	VRB2_VfPmCKCntHiVf               = 0x00000294,
> +	VRB2_VfPmCDeltaCntLoVf           = 0x000002A0,
> +	VRB2_VfPmCDeltaCntHiVf           = 0x000002A4,
> +	VRB2_VfPmDCntrlRegVf             = 0x000002C0,
> +	VRB2_VfPmDCountVf                = 0x000002C8,
> +	VRB2_VfPmDKCntLoVf               = 0x000002D0,
> +	VRB2_VfPmDKCntHiVf               = 0x000002D4,
> +	VRB2_VfPmDDeltaCntLoVf           = 0x000002E0,
> +	VRB2_VfPmDDeltaCntHiVf           = 0x000002E4,
> +	VRB2_VfPmECntrlRegVf             = 0x00000300,
> +	VRB2_VfPmECountVf                = 0x00000308,
> +	VRB2_VfPmEKCntLoVf               = 0x00000310,
> +	VRB2_VfPmEKCntHiVf               = 0x00000314,
> +	VRB2_VfPmEDeltaCntLoVf           = 0x00000320,
> +	VRB2_VfPmEDeltaCntHiVf           = 0x00000324,
> +	VRB2_VfPmFCntrlRegVf             = 0x00000340,
> +	VRB2_VfPmFCountVf                = 0x00000348,
> +	VRB2_VfPmFKCntLoVf               = 0x00000350,
> +	VRB2_VfPmFKCntHiVf               = 0x00000354,
> +	VRB2_VfPmFDeltaCntLoVf           = 0x00000360,
> +	VRB2_VfPmFDeltaCntHiVf           = 0x00000364,
> +	VRB2_VfQmgrAqReset0              = 0x00000600,
> +	VRB2_VfQmgrAqReset1              = 0x00000604,
> +	VRB2_VfQmgrAqReset2              = 0x00000608,
> +	VRB2_VfQmgrAqReset3              = 0x0000060C,
> +	VRB2_VfQmgrRingSizeVf            = 0x00000610,
> +	VRB2_VfQmgrGrpDepthLog20Vf       = 0x00000620,
> +	VRB2_VfQmgrGrpDepthLog21Vf       = 0x00000624,
> +	VRB2_VfQmgrGrpDepthLog22Vf       = 0x00000628,
> +	VRB2_VfQmgrGrpDepthLog23Vf       = 0x0000062C,
> +	VRB2_VfQmgrGrpFunction0Vf        = 0x00000630,
> +	VRB2_VfQmgrGrpFunction1Vf        = 0x00000634,
> +	VRB2_VfQmgrAramUsageN0           = 0x00000640,
> +	VRB2_VfQmgrAramUsageN1           = 0x00000644,
> +	VRB2_VfQmgrAramUsageN2           = 0x00000648,
> +	VRB2_VfQmgrAramUsageN3           = 0x0000064C,
> +	VRB2_VfHiMSIXBaseLoRegVf         = 0x00001000,
> +	VRB2_VfHiMSIXBaseHiRegVf         = 0x00001004,
> +	VRB2_VfHiMSIXBaseDataRegVf       = 0x00001008,
> +	VRB2_VfHiMSIXBaseMaskRegVf       = 0x0000100C,
> +	VRB2_VfHiMSIXPBABaseLoRegVf      = 0x00003000,
> +	VRB2_VfQmgrIngressAq             = 0x00004000,
> +};
> +
> +/* TIP VF Interrupt numbers */
> +enum {
> +	VRB2_VF_INT_QMGR_AQ_OVERFLOW = 0,
> +	VRB2_VF_INT_DOORBELL_PF_2_VF = 1,
> +	VRB2_VF_INT_ILLEGAL_FORMAT = 2,
> +	VRB2_VF_INT_QMGR_DISABLED_ACCESS = 3,
> +	VRB2_VF_INT_QMGR_AQ_OVERTHRESHOLD = 4,
> +	VRB2_VF_INT_DMA_DL_DESC_IRQ = 5,
> +	VRB2_VF_INT_DMA_UL_DESC_IRQ = 6,
> +	VRB2_VF_INT_DMA_FFT_DESC_IRQ = 7,
> +	VRB2_VF_INT_DMA_UL5G_DESC_IRQ = 8,
> +	VRB2_VF_INT_DMA_DL5G_DESC_IRQ = 9,
> +	VRB2_VF_INT_DMA_MLD_DESC_IRQ = 10,
> +};
> +
> +#endif /* VRB2_VF_ENUM_H */
> diff --git a/drivers/baseband/acc/vrb_pmd.h b/drivers/baseband/acc/vrb_pmd.h
> index 1cabc0b7f4..0371db9972 100644
> --- a/drivers/baseband/acc/vrb_pmd.h
> +++ b/drivers/baseband/acc/vrb_pmd.h
> @@ -8,6 +8,8 @@
>   #include "acc_common.h"
>   #include "vrb1_pf_enum.h"
>   #include "vrb1_vf_enum.h"
> +#include "vrb2_pf_enum.h"
> +#include "vrb2_vf_enum.h"
>   #include "vrb_cfg.h"
>   
>   /* Helper macro for logging */
> @@ -31,12 +33,13 @@
>   #define RTE_VRB1_VENDOR_ID           (0x8086)
>   #define RTE_VRB1_PF_DEVICE_ID        (0x57C0)
>   #define RTE_VRB1_VF_DEVICE_ID        (0x57C1)
> -
> -#define VRB1_VARIANT               2
> +#define RTE_VRB2_VENDOR_ID           (0x8086)
> +#define RTE_VRB2_PF_DEVICE_ID        (0x57C2)
> +#define RTE_VRB2_VF_DEVICE_ID        (0x57C3)
>   
>   #define VRB_NUM_ACCS                 6
>   #define VRB_MAX_QGRPS                32
> -#define VRB_MAX_AQS                  32
> +#define VRB_MAX_AQS                  64
>   
>   #define ACC_STATUS_WAIT      10
>   #define ACC_STATUS_TO        100
> @@ -46,8 +49,6 @@
>   #define VRB1_NUM_VFS                  16
>   #define VRB1_NUM_QGRPS                16
>   #define VRB1_NUM_AQS                  16
> -#define VRB1_GRP_ID_SHIFT    10 /* Queue Index Hierarchy */
> -#define VRB1_VF_ID_SHIFT     4  /* Queue Index Hierarchy */
>   #define VRB1_WORDS_IN_ARAM_SIZE (256 * 1024 / 4)
>   
>   /* VRB1 Mapping of signals for the available engines */
> @@ -61,7 +62,6 @@
>   #define VRB1_SIG_DL_4G_LAST 23
>   #define VRB1_SIG_FFT        24
>   #define VRB1_SIG_FFT_LAST   24
> -
>   #define VRB1_NUM_ACCS       5
>   
>   /* VRB1 Configuration */
> @@ -90,6 +90,67 @@
>   #define VRB1_MAX_PF_MSIX            (256+32)
>   #define VRB1_MAX_VF_MSIX            (256+7)
>   
> +/* VRB2 specific flags */
> +
> +#define VRB2_NUM_VFS        64
> +#define VRB2_NUM_QGRPS      32
> +#define VRB2_NUM_AQS        64
> +#define VRB2_WORDS_IN_ARAM_SIZE (512 * 1024 / 4)
> +#define VRB2_NUM_ACCS        6
> +#define VRB2_AQ_REG_NUM      4
> +
> +/* VRB2 Mapping of signals for the available engines */
> +#define VRB2_SIG_UL_5G       0
> +#define VRB2_SIG_UL_5G_LAST  5
> +#define VRB2_SIG_DL_5G       9
> +#define VRB2_SIG_DL_5G_LAST 11
> +#define VRB2_SIG_UL_4G      12
> +#define VRB2_SIG_UL_4G_LAST 16
> +#define VRB2_SIG_DL_4G      21
> +#define VRB2_SIG_DL_4G_LAST 23
> +#define VRB2_SIG_FFT        24
> +#define VRB2_SIG_FFT_LAST   26
> +#define VRB2_SIG_MLD        30
> +#define VRB2_SIG_MLD_LAST   31
> +#define VRB2_FFT_NUM        3
> +
> +#define VRB2_FCW_MLDTS_BLEN 32
> +#define VRB2_MLD_MIN_LAYER   2
> +#define VRB2_MLD_MAX_LAYER   4
> +#define VRB2_MLD_MAX_RREP    5
> +#define VRB2_MLD_LAY_SIZE    3
> +#define VRB2_MLD_RREP_SIZE   6
> +#define VRB2_MLD_M2DLEN      3
> +
> +#define VRB2_MAX_PF_MSIX      (256+32)
> +#define VRB2_MAX_VF_MSIX      (64+7)
> +#define VRB2_REG_IRQ_EN_ALL   0xFFFFFFFF  /* Enable all interrupts */
> +#define VRB2_FABRIC_MODE      0x8000103
> +#define VRB2_CFG_DMA_ERROR    0x7DF
> +#define VRB2_CFG_AXI_CACHE    0x11
> +#define VRB2_CFG_QMGR_HI_P    0x0F0F
> +#define VRB2_RESET_HARD       0x1FF
> +#define VRB2_ENGINES_MAX      9
> +#define VRB2_GPEX_AXIMAP_NUM  17
> +#define VRB2_CLOCK_GATING_EN  0x30000
> +#define VRB2_FFT_CFG_0        0x2001
> +#define VRB2_FFT_ECC          0x60
> +#define VRB2_FFT_RAM_EN       0x80008000
> +#define VRB2_FFT_RAM_DIS      0x0
> +#define VRB2_FFT_RAM_SIZE     512
> +#define VRB2_CLK_EN           0x00010A01
> +#define VRB2_CLK_DIS          0x01F10A01
> +#define VRB2_PG_MASK_0        0x1F
> +#define VRB2_PG_MASK_1        0xF
> +#define VRB2_PG_MASK_2        0x1
> +#define VRB2_PG_MASK_3        0x0
> +#define VRB2_PG_MASK_FFT      1
> +#define VRB2_PG_MASK_4GUL     4
> +#define VRB2_PG_MASK_5GUL     8
> +#define VRB2_PF_PM_REG_OFFSET 0x10000
> +#define VRB2_VF_PM_REG_OFFSET 0x40
> +#define VRB2_PM_START         0x2
> +
>   struct acc_registry_addr {
>   	unsigned int dma_ring_dl5g_hi;
>   	unsigned int dma_ring_dl5g_lo;
> @@ -218,4 +279,92 @@ static const struct acc_registry_addr vrb1_vf_reg_addr = {
>   	.pf2vf_doorbell = VRB1_VfHiPfToVfDbellVf,
>   };
>   
> +
> +/* Structure holding registry addresses for PF */
> +static const struct acc_registry_addr vrb2_pf_reg_addr = {
> +	.dma_ring_dl5g_hi =  VRB2_PfDmaFec5GdlDescBaseHiRegVf,
> +	.dma_ring_dl5g_lo =  VRB2_PfDmaFec5GdlDescBaseLoRegVf,
> +	.dma_ring_ul5g_hi =  VRB2_PfDmaFec5GulDescBaseHiRegVf,
> +	.dma_ring_ul5g_lo =  VRB2_PfDmaFec5GulDescBaseLoRegVf,
> +	.dma_ring_dl4g_hi =  VRB2_PfDmaFec4GdlDescBaseHiRegVf,
> +	.dma_ring_dl4g_lo =  VRB2_PfDmaFec4GdlDescBaseLoRegVf,
> +	.dma_ring_ul4g_hi =  VRB2_PfDmaFec4GulDescBaseHiRegVf,
> +	.dma_ring_ul4g_lo =  VRB2_PfDmaFec4GulDescBaseLoRegVf,
> +	.dma_ring_fft_hi =   VRB2_PfDmaFftDescBaseHiRegVf,
> +	.dma_ring_fft_lo =   VRB2_PfDmaFftDescBaseLoRegVf,
> +	.dma_ring_mld_hi =   VRB2_PfDmaMldDescBaseHiRegVf,
> +	.dma_ring_mld_lo =   VRB2_PfDmaMldDescBaseLoRegVf,
> +	.ring_size =         VRB2_PfQmgrRingSizeVf,
> +	.info_ring_hi =      VRB2_PfHiInfoRingBaseHiRegPf,
> +	.info_ring_lo =      VRB2_PfHiInfoRingBaseLoRegPf,
> +	.info_ring_en =      VRB2_PfHiInfoRingIntWrEnRegPf,
> +	.info_ring_ptr =     VRB2_PfHiInfoRingPointerRegPf,
> +	.tail_ptrs_dl5g_hi = VRB2_PfDmaFec5GdlRespPtrHiRegVf,
> +	.tail_ptrs_dl5g_lo = VRB2_PfDmaFec5GdlRespPtrLoRegVf,
> +	.tail_ptrs_ul5g_hi = VRB2_PfDmaFec5GulRespPtrHiRegVf,
> +	.tail_ptrs_ul5g_lo = VRB2_PfDmaFec5GulRespPtrLoRegVf,
> +	.tail_ptrs_dl4g_hi = VRB2_PfDmaFec4GdlRespPtrHiRegVf,
> +	.tail_ptrs_dl4g_lo = VRB2_PfDmaFec4GdlRespPtrLoRegVf,
> +	.tail_ptrs_ul4g_hi = VRB2_PfDmaFec4GulRespPtrHiRegVf,
> +	.tail_ptrs_ul4g_lo = VRB2_PfDmaFec4GulRespPtrLoRegVf,
> +	.tail_ptrs_fft_hi =  VRB2_PfDmaFftRespPtrHiRegVf,
> +	.tail_ptrs_fft_lo =  VRB2_PfDmaFftRespPtrLoRegVf,
> +	.tail_ptrs_mld_hi =  VRB2_PfDmaFftRespPtrHiRegVf,
> +	.tail_ptrs_mld_lo =  VRB2_PfDmaFftRespPtrLoRegVf,
> +	.depth_log0_offset = VRB2_PfQmgrGrpDepthLog20Vf,
> +	.depth_log1_offset = VRB2_PfQmgrGrpDepthLog21Vf,
> +	.qman_group_func =   VRB2_PfQmgrGrpFunction0,
> +	.hi_mode =           VRB2_PfHiMsixVectorMapperPf,
> +	.pf_mode =           VRB2_PfHiPfMode,
> +	.pmon_ctrl_a =       VRB2_PfPermonACntrlRegVf,
> +	.pmon_ctrl_b =       VRB2_PfPermonBCntrlRegVf,
> +	.pmon_ctrl_c =       VRB2_PfPermonCCntrlRegVf,
> +	.vf2pf_doorbell =    0,
> +	.pf2vf_doorbell =    0,
> +};
> +
> +/* Structure holding registry addresses for VF */
> +static const struct acc_registry_addr vrb2_vf_reg_addr = {
> +	.dma_ring_dl5g_hi =  VRB2_VfDmaFec5GdlDescBaseHiRegVf,
> +	.dma_ring_dl5g_lo =  VRB2_VfDmaFec5GdlDescBaseLoRegVf,
> +	.dma_ring_ul5g_hi =  VRB2_VfDmaFec5GulDescBaseHiRegVf,
> +	.dma_ring_ul5g_lo =  VRB2_VfDmaFec5GulDescBaseLoRegVf,
> +	.dma_ring_dl4g_hi =  VRB2_VfDmaFec4GdlDescBaseHiRegVf,
> +	.dma_ring_dl4g_lo =  VRB2_VfDmaFec4GdlDescBaseLoRegVf,
> +	.dma_ring_ul4g_hi =  VRB2_VfDmaFec4GulDescBaseHiRegVf,
> +	.dma_ring_ul4g_lo =  VRB2_VfDmaFec4GulDescBaseLoRegVf,
> +	.dma_ring_fft_hi =   VRB2_VfDmaFftDescBaseHiRegVf,
> +	.dma_ring_fft_lo =   VRB2_VfDmaFftDescBaseLoRegVf,
> +	.dma_ring_mld_hi =   VRB2_VfDmaMldDescBaseHiRegVf,
> +	.dma_ring_mld_lo =   VRB2_VfDmaMldDescBaseLoRegVf,
> +	.ring_size =         VRB2_VfQmgrRingSizeVf,
> +	.info_ring_hi =      VRB2_VfHiInfoRingBaseHiVf,
> +	.info_ring_lo =      VRB2_VfHiInfoRingBaseLoVf,
> +	.info_ring_en =      VRB2_VfHiInfoRingIntWrEnVf,
> +	.info_ring_ptr =     VRB2_VfHiInfoRingPointerVf,
> +	.tail_ptrs_dl5g_hi = VRB2_VfDmaFec5GdlRespPtrHiRegVf,
> +	.tail_ptrs_dl5g_lo = VRB2_VfDmaFec5GdlRespPtrLoRegVf,
> +	.tail_ptrs_ul5g_hi = VRB2_VfDmaFec5GulRespPtrHiRegVf,
> +	.tail_ptrs_ul5g_lo = VRB2_VfDmaFec5GulRespPtrLoRegVf,
> +	.tail_ptrs_dl4g_hi = VRB2_VfDmaFec4GdlRespPtrHiRegVf,
> +	.tail_ptrs_dl4g_lo = VRB2_VfDmaFec4GdlRespPtrLoRegVf,
> +	.tail_ptrs_ul4g_hi = VRB2_VfDmaFec4GulRespPtrHiRegVf,
> +	.tail_ptrs_ul4g_lo = VRB2_VfDmaFec4GulRespPtrLoRegVf,
> +	.tail_ptrs_fft_hi =  VRB2_VfDmaFftRespPtrHiRegVf,
> +	.tail_ptrs_fft_lo =  VRB2_VfDmaFftRespPtrLoRegVf,
> +	.tail_ptrs_mld_hi =  VRB2_VfDmaMldRespPtrHiRegVf,
> +	.tail_ptrs_mld_lo =  VRB2_VfDmaMldRespPtrLoRegVf,
> +	.depth_log0_offset = VRB2_VfQmgrGrpDepthLog20Vf,
> +	.depth_log1_offset = VRB2_VfQmgrGrpDepthLog21Vf,
> +	.qman_group_func =   VRB2_VfQmgrGrpFunction0Vf,
> +	.hi_mode =           VRB2_VfHiMsixVectorMapperVf,
> +	.pf_mode =           0,
> +	.pmon_ctrl_a =       VRB2_VfPmACntrlRegVf,
> +	.pmon_ctrl_b =       VRB2_VfPmBCntrlRegVf,
> +	.pmon_ctrl_c =       VRB2_VfPmCCntrlRegVf,
> +	.vf2pf_doorbell =    VRB2_VfHiVfToPfDbellVf,
> +	.pf2vf_doorbell =    VRB2_VfHiPfToVfDbellVf,
> +};
> +
> +
>   #endif /* _VRB_PMD_H_ */

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant
  2023-09-29 16:35 ` [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant Nicolas Chautru
@ 2023-10-03 14:28   ` Maxime Coquelin
  2023-10-04 21:11     ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 14:28 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> New implementation for some of the FEC features
> specific to the VRB2 variant.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/rte_vrb_pmd.c | 567 ++++++++++++++++++++++++++++-
>   1 file changed, 548 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index 48e779ce77..93add82947 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -1235,6 +1235,94 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>   	};
>   
>   	static const struct rte_bbdev_op_cap vrb2_bbdev_capabilities[] = {
> +		{
> +			.type = RTE_BBDEV_OP_TURBO_DEC,
> +			.cap.turbo_dec = {
> +				.capability_flags =
> +					RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE |
> +					RTE_BBDEV_TURBO_CRC_TYPE_24B |
> +					RTE_BBDEV_TURBO_DEC_CRC_24B_DROP |
> +					RTE_BBDEV_TURBO_EQUALIZER |
> +					RTE_BBDEV_TURBO_SOFT_OUT_SATURATE |
> +					RTE_BBDEV_TURBO_HALF_ITERATION_EVEN |
> +					RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH |
> +					RTE_BBDEV_TURBO_SOFT_OUTPUT |
> +					RTE_BBDEV_TURBO_EARLY_TERMINATION |
> +					RTE_BBDEV_TURBO_DEC_INTERRUPTS |
> +					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN |
> +					RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT |
> +					RTE_BBDEV_TURBO_MAP_DEC |
> +					RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP |
> +					RTE_BBDEV_TURBO_DEC_SCATTER_GATHER,
> +				.max_llr_modulus = INT8_MAX,
> +				.num_buffers_src =
> +						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> +				.num_buffers_hard_out =
> +						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> +				.num_buffers_soft_out =
> +						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> +			}
> +		},
> +		{
> +			.type = RTE_BBDEV_OP_TURBO_ENC,
> +			.cap.turbo_enc = {
> +				.capability_flags =
> +					RTE_BBDEV_TURBO_CRC_24B_ATTACH |
> +					RTE_BBDEV_TURBO_RV_INDEX_BYPASS |
> +					RTE_BBDEV_TURBO_RATE_MATCH |
> +					RTE_BBDEV_TURBO_ENC_INTERRUPTS |
> +					RTE_BBDEV_TURBO_ENC_SCATTER_GATHER,
> +				.num_buffers_src =
> +						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> +				.num_buffers_dst =
> +						RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> +			}
> +		},
> +		{
> +			.type   = RTE_BBDEV_OP_LDPC_ENC,
> +			.cap.ldpc_enc = {
> +				.capability_flags =
> +					RTE_BBDEV_LDPC_RATE_MATCH |
> +					RTE_BBDEV_LDPC_CRC_24B_ATTACH |
> +					RTE_BBDEV_LDPC_INTERLEAVER_BYPASS |
> +					RTE_BBDEV_LDPC_ENC_INTERRUPTS |
> +					RTE_BBDEV_LDPC_ENC_SCATTER_GATHER |
> +					RTE_BBDEV_LDPC_ENC_CONCATENATION,
> +				.num_buffers_src =
> +						RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> +				.num_buffers_dst =
> +						RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> +			}
> +		},
> +		{
> +			.type   = RTE_BBDEV_OP_LDPC_DEC,
> +			.cap.ldpc_dec = {
> +			.capability_flags =
> +				RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK |
> +				RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP |
> +				RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK |
> +				RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK |
> +				RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE |
> +				RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE |
> +				RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE |
> +				RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS |
> +				RTE_BBDEV_LDPC_DEC_SCATTER_GATHER |
> +				RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION |
> +				RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION |
> +				RTE_BBDEV_LDPC_LLR_COMPRESSION |
> +				RTE_BBDEV_LDPC_SOFT_OUT_ENABLE |
> +				RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS |
> +				RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS |
> +				RTE_BBDEV_LDPC_DEC_INTERRUPTS,
> +			.llr_size = 8,
> +			.llr_decimals = 2,
> +			.num_buffers_src =
> +					RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> +			.num_buffers_hard_out =
> +					RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> +			.num_buffers_soft_out = 0,
> +			}
> +		},
>   		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>   	};
>   
> @@ -1774,6 +1862,141 @@ vrb1_dma_desc_ld_fill(struct rte_bbdev_dec_op *op,
>   	return 0;
>   }
>   
> +/* Fill in a frame control word for LDPC decoding. */
> +static inline void
> +vrb2_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw,
> +		union acc_harq_layout_data *harq_layout)
> +{
> +	uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset;
> +	uint32_t harq_index;
> +	uint32_t l;


This is so similar to vrb1_fcw_ld_fill() that it does not make sense
to duplicate so much code.

Can you confirm there is no difference other than the SOFT_OUT handling,
and that reusing vrb2_fcw_ld_fill on VRB1 would just work, since the
op_flags are checked (and they should not be set if the capability is not
advertised)?
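
To illustrate the question, here is a minimal sketch, not code from the
patch. The ex_* names, the flag values and the cut-down FCW struct are all
hypothetical stand-ins. It shows one fill routine whose soft-output fields
depend only on the op flags, so a device that never advertises the
capability ends up with all of them cleared:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the RTE_BBDEV_LDPC_* op flags. */
#define EX_SOFT_OUT_ENABLE               (1u << 0)
#define EX_SOFT_OUT_RM_BYPASS            (1u << 1)
#define EX_SOFT_OUT_DEINTERLEAVER_BYPASS (1u << 2)

/* Hypothetical cut-down FCW holding only the soft-output fields. */
struct ex_fcw_ld {
	uint8_t so_en;
	uint8_t so_bypass_rm;
	uint8_t so_bypass_intlv;
};

/* Same idea as the driver's check_bit(): true when the flag is set. */
static inline bool
ex_check_bit(uint32_t bitmap, uint32_t bitmask)
{
	return (bitmap & bitmask) != 0;
}

/* One fill routine for both variants: the fields follow the op flags only. */
static inline void
ex_fcw_so_fill(uint32_t op_flags, struct ex_fcw_ld *fcw)
{
	fcw->so_en = ex_check_bit(op_flags, EX_SOFT_OUT_ENABLE);
	fcw->so_bypass_rm = ex_check_bit(op_flags, EX_SOFT_OUT_RM_BYPASS);
	fcw->so_bypass_intlv = ex_check_bit(op_flags,
			EX_SOFT_OUT_DEINTERLEAVER_BYPASS);
}
```

With op_flags == 0, which is what an application should pass when the
capability is not advertised, all three fields come out as 0.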

> +	fcw->qm = op->ldpc_dec.q_m;
> +	fcw->nfiller = op->ldpc_dec.n_filler;
> +	fcw->BG = (op->ldpc_dec.basegraph - 1);
> +	fcw->Zc = op->ldpc_dec.z_c;
> +	fcw->ncb = op->ldpc_dec.n_cb;
> +	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_dec.basegraph,
> +			op->ldpc_dec.rv_index);
> +	if (op->ldpc_dec.code_block_mode == RTE_BBDEV_CODE_BLOCK)
> +		fcw->rm_e = op->ldpc_dec.cb_params.e;
> +	else
> +		fcw->rm_e = (op->ldpc_dec.tb_params.r <
> +				op->ldpc_dec.tb_params.cab) ?
> +						op->ldpc_dec.tb_params.ea :
> +						op->ldpc_dec.tb_params.eb;
> +
> +	if (unlikely(check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE) &&
> +			(op->ldpc_dec.harq_combined_input.length == 0))) {
> +		rte_bbdev_log(WARNING, "Null HARQ input size provided");
> +		/* Disable HARQ input in that case to carry forward. */
> +		op->ldpc_dec.op_flags ^= RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE;
> +	}
> +	if (unlikely(fcw->rm_e == 0)) {
> +		rte_bbdev_log(WARNING, "Null E input provided");
> +		fcw->rm_e = 2;
> +	}
> +
> +	fcw->hcin_en = check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE);
> +	fcw->hcout_en = check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE);
> +	fcw->crc_select = check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK);
> +	fcw->so_en = check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_SOFT_OUT_ENABLE);
> +	fcw->so_bypass_intlv = check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS);
> +	fcw->so_bypass_rm = check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS);
> +	fcw->bypass_dec = 0;
> +	fcw->bypass_intlv = check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS);
> +	if (op->ldpc_dec.q_m == 1) {
> +		fcw->bypass_intlv = 1;
> +		fcw->qm = 2;
> +	}
> +	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION)) {
> +		fcw->hcin_decomp_mode = 1;
> +		fcw->hcout_comp_mode = 1;
> +	} else if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION)) {
> +		fcw->hcin_decomp_mode = 4;
> +		fcw->hcout_comp_mode = 4;
> +	} else {
> +		fcw->hcin_decomp_mode = 0;
> +		fcw->hcout_comp_mode = 0;
> +	}
> +
> +	fcw->llr_pack_mode = check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_LLR_COMPRESSION);
> +	harq_index = hq_index(op->ldpc_dec.harq_combined_output.offset);
> +	if (fcw->hcin_en > 0) {
> +		harq_in_length = op->ldpc_dec.harq_combined_input.length;
> +		if (fcw->hcin_decomp_mode == 1)
> +			harq_in_length = harq_in_length * 8 / 6;
> +		else if (fcw->hcin_decomp_mode == 4)
> +			harq_in_length = harq_in_length * 2;
> +		harq_in_length = RTE_MIN(harq_in_length, op->ldpc_dec.n_cb
> +				- op->ldpc_dec.n_filler);
> +		harq_in_length = RTE_ALIGN_CEIL(harq_in_length, 64);
> +		fcw->hcin_size0 = harq_in_length;
> +		fcw->hcin_offset = 0;
> +		fcw->hcin_size1 = 0;
> +	} else {
> +		fcw->hcin_size0 = 0;
> +		fcw->hcin_offset = 0;
> +		fcw->hcin_size1 = 0;
> +	}
> +
> +	fcw->itmax = op->ldpc_dec.iter_max;
> +	fcw->so_it = op->ldpc_dec.iter_max;
> +	fcw->itstop = check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE);
> +	fcw->cnu_algo = ACC_ALGO_MSA;
> +	fcw->synd_precoder = fcw->itstop;
> +
> +	fcw->minsum_offset = 1;
> +	fcw->dec_llrclip   = 2;
> +
> +	/*
> +	 * These are all implicitly set
> +	 * fcw->synd_post = 0;
> +	 * fcw->dec_convllr = 0;
> +	 * fcw->hcout_convllr = 0;
> +	 * fcw->hcout_size1 = 0;
> +	 * fcw->hcout_offset = 0;
> +	 * fcw->negstop_th = 0;
> +	 * fcw->negstop_it = 0;
> +	 * fcw->negstop_en = 0;
> +	 * fcw->gain_i = 1;
> +	 * fcw->gain_h = 1;
> +	 */
> +	if (fcw->hcout_en > 0) {
> +		parity_offset = (op->ldpc_dec.basegraph == 1 ? 20 : 8)
> +			* op->ldpc_dec.z_c - op->ldpc_dec.n_filler;
> +		k0_p = (fcw->k0 > parity_offset) ?
> +				fcw->k0 - op->ldpc_dec.n_filler : fcw->k0;
> +		ncb_p = fcw->ncb - op->ldpc_dec.n_filler;
> +		l = k0_p + fcw->rm_e;
> +		harq_out_length = (uint16_t) fcw->hcin_size0;
> +		harq_out_length = RTE_MIN(RTE_MAX(harq_out_length, l), ncb_p);
> +		harq_out_length = RTE_ALIGN_CEIL(harq_out_length, 64);
> +		fcw->hcout_size0 = harq_out_length;
> +		fcw->hcout_size1 = 0;
> +		fcw->hcout_offset = 0;
> +		harq_layout[harq_index].offset = fcw->hcout_offset;
> +		harq_layout[harq_index].size0 = fcw->hcout_size0;
> +	} else {
> +		fcw->hcout_size0 = 0;
> +		fcw->hcout_size1 = 0;
> +		fcw->hcout_offset = 0;
> +	}
> +
> +	fcw->tb_crc_select = 0;
> +	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
> +		fcw->tb_crc_select = 2;
> +	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK))
> +		fcw->tb_crc_select = 1;
> +}
> +
>   static inline void
>   vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
>   		struct acc_dma_req_desc *desc,
> @@ -1817,6 +2040,139 @@ vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
>   	desc->op_addr = op;
>   }
>   
> +static inline int
> +vrb2_dma_desc_ld_fill(struct rte_bbdev_dec_op *op,
> +		struct acc_dma_req_desc *desc,
> +		struct rte_mbuf **input, struct rte_mbuf *h_output,
> +		uint32_t *in_offset, uint32_t *h_out_offset,
> +		uint32_t *h_out_length, uint32_t *mbuf_total_left,
> +		uint32_t *seg_total_left, struct acc_fcw_ld *fcw)
> +{
Same here.

I compared it with vrb1_dma_desc_ld_fill(), and I don't see why we need
two functions.

The only differences are backed by capability checks, and vrb1 already
sets fcw->hcin_decomp_mode, so this code should work as-is on vrb1 if
I'm not mistaken.
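
As a sketch of how the HARQ length handling could be shared, the helper
below takes the decompression mode as its only variant-dependent input.
The helper and the EX_* names are hypothetical; the arithmetic and the
mode values 1 and 4 are copied from the quoted code:

```c
#include <stdint.h>

/* Values of hcin_decomp_mode as used in the quoted code. */
#define EX_HARQ_COMP_NONE 0
#define EX_HARQ_COMP_6BIT 1
#define EX_HARQ_COMP_4BIT 4

/* DMA length of a HARQ buffer for a given (de)compression mode. */
static inline uint16_t
ex_harq_dma_len(uint16_t size, uint8_t decomp_mode)
{
	if (decomp_mode == EX_HARQ_COMP_6BIT)
		return (size * 3 + 3) / 4; /* 6 bits per value, rounded up */
	if (decomp_mode == EX_HARQ_COMP_4BIT)
		return size / 2;           /* 4 bits per value */
	return size;                       /* uncompressed */
}
```

A device that never sets the 4-bit mode simply never reaches that branch,
so the same helper would serve both variants.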

> +	struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec;
> +	int next_triplet = 1; /* FCW already done. */
> +	uint32_t input_length;
> +	uint16_t output_length, crc24_overlap = 0;
> +	uint16_t sys_cols, K, h_p_size, h_np_size;
> +
> +	acc_header_init(desc);
> +
> +	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP))
> +		crc24_overlap = 24;
> +
> +	/* Compute some LDPC BG lengths. */
> +	input_length = fcw->rm_e;
> +	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_LLR_COMPRESSION))
> +		input_length = (input_length * 3 + 3) / 4;
> +	sys_cols = (dec->basegraph == 1) ? 22 : 10;
> +	K = sys_cols * dec->z_c;
> +	output_length = K - dec->n_filler - crc24_overlap;
> +
> +	if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left < input_length))) {
> +		rte_bbdev_log(ERR,
> +				"Mismatch between mbuf length and included CB sizes: mbuf len %u, cb len %u",
> +				*mbuf_total_left, input_length);
> +		return -1;
> +	}
> +
> +	next_triplet = acc_dma_fill_blk_type_in(desc, input,
> +			in_offset, input_length,
> +			seg_total_left, next_triplet,
> +			check_bit(op->ldpc_dec.op_flags,
> +			RTE_BBDEV_LDPC_DEC_SCATTER_GATHER));
> +
> +	if (unlikely(next_triplet < 0)) {
> +		rte_bbdev_log(ERR,
> +				"Mismatch between data to process and mbuf data length in bbdev_op: %p",
> +				op);
> +		return -1;
> +	}
> +
> +	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) {
> +		if (op->ldpc_dec.harq_combined_input.data == 0) {
> +			rte_bbdev_log(ERR, "HARQ input is not defined");
> +			return -1;
> +		}
> +		h_p_size = fcw->hcin_size0 + fcw->hcin_size1;
> +		if (fcw->hcin_decomp_mode == 1)
> +			h_p_size = (h_p_size * 3 + 3) / 4;
> +		else if (fcw->hcin_decomp_mode == 4)
> +			h_p_size = h_p_size / 2;
> +		if (op->ldpc_dec.harq_combined_input.data == 0) {
> +			rte_bbdev_log(ERR, "HARQ input is not defined");
> +			return -1;
> +		}
> +		acc_dma_fill_blk_type(
> +				desc,
> +				op->ldpc_dec.harq_combined_input.data,
> +				op->ldpc_dec.harq_combined_input.offset,
> +				h_p_size,
> +				next_triplet,
> +				ACC_DMA_BLKID_IN_HARQ);
> +		next_triplet++;
> +	}
> +
> +	desc->data_ptrs[next_triplet - 1].last = 1;
> +	desc->m2dlen = next_triplet;
> +	*mbuf_total_left -= input_length;
> +
> +	next_triplet = acc_dma_fill_blk_type(desc, h_output,
> +			*h_out_offset, output_length >> 3, next_triplet,
> +			ACC_DMA_BLKID_OUT_HARD);
> +
> +	if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_SOFT_OUT_ENABLE)) {
> +		if (op->ldpc_dec.soft_output.data == 0) {
> +			rte_bbdev_log(ERR, "Soft output is not defined");
> +			return -1;
> +		}
> +		dec->soft_output.length = fcw->rm_e;
> +		acc_dma_fill_blk_type(desc, dec->soft_output.data, dec->soft_output.offset,
> +				fcw->rm_e, next_triplet, ACC_DMA_BLKID_OUT_SOFT);
> +		next_triplet++;
> +	}
> +
> +	if (check_bit(op->ldpc_dec.op_flags,
> +				RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) {
> +		if (op->ldpc_dec.harq_combined_output.data == 0) {
> +			rte_bbdev_log(ERR, "HARQ output is not defined");
> +			return -1;
> +		}
> +
> +		/* Pruned size of the HARQ */
> +		h_p_size = fcw->hcout_size0 + fcw->hcout_size1;
> +		/* Non-Pruned size of the HARQ */
> +		h_np_size = fcw->hcout_offset > 0 ?
> +				fcw->hcout_offset + fcw->hcout_size1 :
> +				h_p_size;
> +		if (fcw->hcin_decomp_mode == 1) {
> +			h_np_size = (h_np_size * 3 + 3) / 4;
> +			h_p_size = (h_p_size * 3 + 3) / 4;
> +		} else if (fcw->hcin_decomp_mode == 4) {
> +			h_np_size = h_np_size / 2;
> +			h_p_size = h_p_size / 2;
> +		}
> +		dec->harq_combined_output.length = h_np_size;
> +		acc_dma_fill_blk_type(
> +				desc,
> +				dec->harq_combined_output.data,
> +				dec->harq_combined_output.offset,
> +				h_p_size,
> +				next_triplet,
> +				ACC_DMA_BLKID_OUT_HARQ);
> +
> +		next_triplet++;
> +	}
> +
> +	*h_out_length = output_length >> 3;
> +	dec->hard_output.length += *h_out_length;
> +	*h_out_offset += *h_out_length;
> +	desc->data_ptrs[next_triplet - 1].last = 1;
> +	desc->d2mlen = next_triplet - desc->m2dlen;
> +
> +	desc->op_addr = op;
> +
> +	return 0;
> +}
> +
>   /* Enqueue one encode operations for device in CB mode. */
>   static inline int
>   enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op *op,
> @@ -1877,6 +2233,7 @@ enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ops,
>   	/** This could be done at polling. */
>   	acc_header_init(&desc->req);
>   	desc->req.numCBs = num;
> +	desc->req.dltb = 0;
>   
>   	in_length_in_bytes = ops[0]->ldpc_enc.input.data->data_len;
>   	out_length = (enc->cb_params.e + 7) >> 3;
> @@ -2102,6 +2459,105 @@ vrb1_enqueue_ldpc_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op
>   	return return_descs;
>   }
>   
> +/* Fill in a frame control word for LDPC encoding. */
> +static inline void
> +vrb2_fcw_letb_fill(const struct rte_bbdev_enc_op *op, struct acc_fcw_le *fcw)
> +{
> +	fcw->qm = op->ldpc_enc.q_m;
> +	fcw->nfiller = op->ldpc_enc.n_filler;
> +	fcw->BG = (op->ldpc_enc.basegraph - 1);
> +	fcw->Zc = op->ldpc_enc.z_c;
> +	fcw->ncb = op->ldpc_enc.n_cb;
> +	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_enc.basegraph,
> +			op->ldpc_enc.rv_index);
> +	fcw->rm_e = op->ldpc_enc.tb_params.ea;
> +	fcw->rm_e_b = op->ldpc_enc.tb_params.eb;
> +	fcw->crc_select = check_bit(op->ldpc_enc.op_flags,
> +			RTE_BBDEV_LDPC_CRC_24B_ATTACH);
> +	fcw->bypass_intlv = 0;
> +	if (op->ldpc_enc.tb_params.c > 1) {
> +		fcw->mcb_count = 0;
> +		fcw->C = op->ldpc_enc.tb_params.c;
> +		fcw->Cab = op->ldpc_enc.tb_params.cab;
> +	} else {
> +		fcw->mcb_count = 1;
> +		fcw->C = 0;
> +	}
> +}
> +
> +/* Enqueue one encode operations for device in TB mode.
> + * returns the number of descs used.
> + */
> +static inline int
> +vrb2_enqueue_ldpc_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op *op,
> +		uint16_t enq_descs)
> +{
> +	union acc_dma_desc *desc = NULL;
> +	uint32_t in_offset, out_offset, out_length, seg_total_left;
> +	struct rte_mbuf *input, *output_head, *output;
> +
> +	uint16_t desc_idx = ((q->sw_ring_head + enq_descs) & q->sw_ring_wrap_mask);
> +	desc = q->ring_addr + desc_idx;

Use acc_desc()?
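
For reference, a sketch of what such a helper does, assuming acc_desc()
computes the same "head plus offset, masked by the wrap mask" lookup as
the two open-coded lines above. The ex_* types are hypothetical stand-ins
for the driver structures and keep only the fields used here:

```c
#include <stdint.h>

/* Hypothetical stand-in for union acc_dma_desc. */
union ex_dma_desc {
	uint64_t words[8];
};

/* Hypothetical stand-in for struct acc_queue. */
struct ex_queue {
	union ex_dma_desc *ring_addr;
	uint32_t sw_ring_head;
	uint32_t sw_ring_wrap_mask;
};

/* Return the descriptor at head + offset, wrapping around the ring. */
static inline union ex_dma_desc *
ex_desc(struct ex_queue *q, uint16_t offset)
{
	return q->ring_addr +
		((q->sw_ring_head + offset) & q->sw_ring_wrap_mask);
}
```

With an 8-entry ring (mask 7) and the head at 6, offset 3 wraps around to
entry 1.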

> +	vrb2_fcw_letb_fill(op, &desc->req.fcw_le);
> +	struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc;
> +	int next_triplet = 1; /* FCW already done */
> +	uint32_t in_length_in_bytes;
> +	uint16_t K, in_length_in_bits;
> +
> +	input = enc->input.data;
> +	output_head = output = enc->output.data;
> +	in_offset = enc->input.offset;
> +	out_offset = enc->output.offset;
> +	seg_total_left = rte_pktmbuf_data_len(enc->input.data) - in_offset;
> +
> +	acc_header_init(&desc->req);
> +	K = (enc->basegraph == 1 ? 22 : 10) * enc->z_c;
> +	in_length_in_bits = K - enc->n_filler;
> +	if ((enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) ||
> +			(enc->op_flags & RTE_BBDEV_LDPC_CRC_24B_ATTACH))
> +		in_length_in_bits -= 24;
> +	in_length_in_bytes = (in_length_in_bits >> 3) * enc->tb_params.c;
> +
> +	next_triplet = acc_dma_fill_blk_type_in(&desc->req, &input, &in_offset,
> +			in_length_in_bytes, &seg_total_left, next_triplet,
> +			check_bit(enc->op_flags, RTE_BBDEV_LDPC_ENC_SCATTER_GATHER));
> +	if (unlikely(next_triplet < 0)) {
> +		rte_bbdev_log(ERR,
> +				"Mismatch between data to process and mbuf data length in bbdev_op: %p",
> +				op);
> +		return -1;
> +	}
> +	desc->req.data_ptrs[next_triplet - 1].last = 1;
> +	desc->req.m2dlen = next_triplet;
> +
> +	/* Set output length */
> +	/* Integer round up division by 8 */
> +	out_length = (enc->tb_params.ea * enc->tb_params.cab +
> +			enc->tb_params.eb * (enc->tb_params.c - enc->tb_params.cab)  + 7) >> 3;
> +
> +	next_triplet = acc_dma_fill_blk_type(&desc->req, output, out_offset,
> +			out_length, next_triplet, ACC_DMA_BLKID_OUT_ENC);
> +	enc->output.length = out_length;
> +	out_offset += out_length;
> +	desc->req.data_ptrs[next_triplet - 1].last = 1;
> +	desc->req.data_ptrs[next_triplet - 1].dma_ext = 0;
> +	desc->req.d2mlen = next_triplet - desc->req.m2dlen;
> +	desc->req.numCBs = enc->tb_params.c;
> +	if (desc->req.numCBs > 1)
> +		desc->req.dltb = 1;
> +	desc->req.op_addr = op;
> +
> +	if (out_length < ACC_MAX_E_MBUF)
> +		mbuf_append(output_head, output, out_length);
> +
> +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> +	rte_memdump(stderr, "FCW", &desc->req.fcw_le, sizeof(desc->req.fcw_le));
> +	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> +#endif
> +	/* One CB (one op) was successfully prepared to enqueue */
> +	return 1;

This function is quite different from the VRB1 variant.
Is the underlying hardware completely different, or just a different
implementation?

> +}
> +
>   /** Enqueue one decode operations for device in CB mode. */
>   static inline int
>   enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
> @@ -2215,10 +2671,16 @@ vrb_enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   		else
>   			seg_total_left = fcw->rm_e;
>   
> -		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input, h_output,
> -				&in_offset, &h_out_offset,
> -				&h_out_length, &mbuf_total_left,
> -				&seg_total_left, fcw);
> +		if (q->d->device_variant == VRB1_VARIANT)
> +			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input, h_output,
> +					&in_offset, &h_out_offset,
> +					&h_out_length, &mbuf_total_left,
> +					&seg_total_left, fcw);
> +		else
> +			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input, h_output,
> +					&in_offset, &h_out_offset,
> +					&h_out_length, &mbuf_total_left,
> +					&seg_total_left, fcw);
>   		if (unlikely(ret < 0))
>   			return ret;
>   	}
> @@ -2308,11 +2770,18 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld, ACC_FCW_LD_BLEN);
>   		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
>   
> -		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
> -				h_output, &in_offset, &h_out_offset,
> -				&h_out_length,
> -				&mbuf_total_left, &seg_total_left,
> -				&desc->req.fcw_ld);
> +		if (q->d->device_variant == VRB1_VARIANT)
> +			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
> +					h_output, &in_offset, &h_out_offset,
> +					&h_out_length,
> +					&mbuf_total_left, &seg_total_left,
> +					&desc->req.fcw_ld);
> +		else
> +			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input,
> +					h_output, &in_offset, &h_out_offset,
> +					&h_out_length,
> +					&mbuf_total_left, &seg_total_left,
> +					&desc->req.fcw_ld);
>   
>   		if (unlikely(ret < 0))
>   			return ret;
> @@ -2576,14 +3045,22 @@ vrb_enqueue_ldpc_enc_tb(struct rte_bbdev_queue_data *q_data,
>   	int descs_used;
>   
>   	for (i = 0; i < num; ++i) {
> -		cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
> -		/* Check if there are available space for further processing. */
> -		if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
> -			acc_enqueue_ring_full(q_data);
> -			break;
> +		if (q->d->device_variant == VRB1_VARIANT) {
> +			cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
> +			/* Check if there are available space for further processing. */
> +			if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
> +				acc_enqueue_ring_full(q_data);
> +				break;
> +			}
> +			descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q, ops[i],
> +					enqueued_descs, cbs_in_tb);
> +		} else {
> +			if (unlikely(avail < 1)) {
> +				acc_enqueue_ring_full(q_data);
> +				break;
> +			}
> +			descs_used = vrb2_enqueue_ldpc_enc_one_op_tb(q, ops[i], enqueued_descs);
>   		}
> -
> -		descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q, ops[i], enqueued_descs, cbs_in_tb);
>   		if (descs_used < 0) {
>   			acc_enqueue_invalid(q_data);
>   			break;
> @@ -2865,6 +3342,52 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
>   	return desc->req.numCBs;
>   }
>   
> +/* Dequeue one LDPC encode operations from VRB2 device in TB mode. */
> +static inline int
> +vrb2_dequeue_ldpc_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
> +		uint16_t *dequeued_ops, uint32_t *aq_dequeued,
> +		uint16_t *dequeued_descs)
> +{
> +	union acc_dma_desc *desc, atom_desc;
> +	union acc_dma_rsp_desc rsp;
> +	struct rte_bbdev_enc_op *op;
> +	int desc_idx = ((q->sw_ring_tail + *dequeued_descs) & q->sw_ring_wrap_mask);
> +
> +	desc = q->ring_addr + desc_idx;
> +	atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, __ATOMIC_RELAXED);
> +
> +	/* Check fdone bit. */
> +	if (!(atom_desc.rsp.val & ACC_FDONE))
> +		return -1;
> +
> +	rsp.val = atom_desc.rsp.val;
> +	rte_bbdev_log_debug("Resp. desc %p: %x", desc, rsp.val);
> +
> +	/* Dequeue. */
> +	op = desc->req.op_addr;
> +
> +	/* Clearing status, it will be set based on response. */
> +	op->status = 0;
> +	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
> +	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
> +	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
> +
> +	if (desc->req.last_desc_in_batch) {
> +		(*aq_dequeued)++;
> +		desc->req.last_desc_in_batch = 0;
> +	}
> +	desc->rsp.val = ACC_DMA_DESC_TYPE;
> +	desc->rsp.add_info_0 = 0; /* Reserved bits. */
> +	desc->rsp.add_info_1 = 0; /* Reserved bits. */
> +
> +	/* One op was successfully dequeued */
> +	ref_op[0] = op;
> +	(*dequeued_descs)++;
> +	(*dequeued_ops)++;
> +	return 1;
> +}
> +
>   /* Dequeue one LDPC encode operations from device in TB mode.
>    * That operation may cover multiple descriptors.
>    */
> @@ -3189,9 +3712,14 @@ vrb_dequeue_ldpc_enc(struct rte_bbdev_queue_data *q_data,
>   
>   	for (i = 0; i < avail; i++) {
>   		if (cbm == RTE_BBDEV_TRANSPORT_BLOCK)
> -			ret = vrb_dequeue_enc_one_op_tb(q, &ops[dequeued_ops],
> -					&dequeued_ops, &aq_dequeued,
> -					&dequeued_descs, num);
> +			if (q->d->device_variant == VRB1_VARIANT)
> +				ret = vrb_dequeue_enc_one_op_tb(q, &ops[dequeued_ops],
> +						&dequeued_ops, &aq_dequeued,
> +						&dequeued_descs, num);
> +			else
> +				ret = vrb2_dequeue_ldpc_enc_one_op_tb(q, &ops[dequeued_ops],
> +						&dequeued_ops, &aq_dequeued,
> +						&dequeued_descs);
>   		else
>   			ret = vrb_dequeue_enc_one_op_cb(q, &ops[dequeued_ops],
>   					&dequeued_ops, &aq_dequeued,
> @@ -3536,6 +4064,7 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
>   	} else {
>   		d->device_variant = VRB2_VARIANT;
>   		d->queue_offset = vrb2_queue_offset;
> +		d->fcw_ld_fill = vrb2_fcw_ld_fill;
>   		d->num_qgroups = VRB2_NUM_QGRPS;
>   		d->num_aqs = VRB2_NUM_AQS;
>   		if (d->pf_device)


It looks like most (60%+) of the code in this patch could be removed if
duplication were avoided.

Thanks,
Maxime



* Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-09-29 16:35 ` [PATCH v3 09/12] baseband/acc: add FFT support to " Nicolas Chautru
@ 2023-10-03 14:36   ` Maxime Coquelin
  2023-10-03 18:20     ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 14:36 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> Support for the FFT processing specific to the
> VRB2 variant.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/rte_vrb_pmd.c | 132 ++++++++++++++++++++++++++++-
>   1 file changed, 128 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index 93add82947..ce4b90d8e7 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
>   			ACC_FCW_LD_BLEN : (conf->op_type == RTE_BBDEV_OP_FFT ?
>   			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
>   
> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type == RTE_BBDEV_OP_FFT))
> +		fcw_len = ACC_FCW_FFT_BLEN_3;
> +
>   	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
>   		desc = q->ring_addr + desc_idx;
>   		desc->req.word0 = ACC_DMA_DESC_TYPE;
> @@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>   			.num_buffers_soft_out = 0,
>   			}
>   		},
> +		{
> +			.type	= RTE_BBDEV_OP_FFT,
> +			.cap.fft = {
> +				.capability_flags =
> +						RTE_BBDEV_FFT_WINDOWING |
> +						RTE_BBDEV_FFT_CS_ADJUSTMENT |
> +						RTE_BBDEV_FFT_DFT_BYPASS |
> +						RTE_BBDEV_FFT_IDFT_BYPASS |
> +						RTE_BBDEV_FFT_FP16_INPUT |
> +						RTE_BBDEV_FFT_FP16_OUTPUT |
> +						RTE_BBDEV_FFT_POWER_MEAS |
> +						RTE_BBDEV_FFT_WINDOWING_BYPASS,
> +				.num_buffers_src =
> +						1,
> +				.num_buffers_dst =
> +						1,
> +			}
> +		},
>   		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>   	};
>   
> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft *fcw)
>   		fcw->bypass = 0;
>   }
>   
> +/* Fill in a frame control word for FFT processing. */
> +static inline void
> +vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
> +{
> +	fcw->in_frame_size = op->fft.input_sequence_size;
> +	fcw->leading_pad_size = op->fft.input_leading_padding;
> +	fcw->out_frame_size = op->fft.output_sequence_size;
> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
> +	fcw->cs_window_sel = op->fft.window_index[0] +
> +			(op->fft.window_index[1] << 8) +
> +			(op->fft.window_index[2] << 16) +
> +			(op->fft.window_index[3] << 24);
> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
> +			(op->fft.window_index[5] << 8);
> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
> +	fcw->num_antennas = op->fft.num_antennas_log2;
> +	fcw->idft_size = op->fft.idft_log2;
> +	fcw->dft_size = op->fft.dft_log2;
> +	fcw->cs_offset = op->fft.cs_time_adjustment;
> +	fcw->idft_shift = op->fft.idft_shift;
> +	fcw->dft_shift = op->fft.dft_shift;
> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
> +	fcw->power_shift = op->fft.power_shift;
> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
> +	fcw->fp16_in = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_INPUT);
> +	fcw->fp16_out = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_OUTPUT);
> +	fcw->power_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
> +	if (check_bit(op->fft.op_flags,
> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
> +		if (check_bit(op->fft.op_flags,
> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
> +			fcw->bypass = 2;
> +		else
> +			fcw->bypass = 1;
> +	} else if (check_bit(op->fft.op_flags,
> +			RTE_BBDEV_FFT_DFT_BYPASS))
> +		fcw->bypass = 3;
> +	else
> +		fcw->bypass = 0;

The only differences I see with VRB1 are backed by the corresponding
op_flags (POWER & FP16), is that correct? If so, it does not make sense
to me to have a specific function for VRB2.
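
As an illustration of logic that depends only on op_flags and so could be
shared by both variants, here is a minimal sketch of the bypass selection
from the quoted code. The DEMO_* flag values and the demo_* names are
hypothetical placeholders, not the bbdev definitions; only the mapping of
flags to bypass values 0..3 is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, NOT the real RTE_BBDEV_FFT_* values. */
#define DEMO_FFT_IDFT_BYPASS      (1u << 0)
#define DEMO_FFT_DFT_BYPASS       (1u << 1)
#define DEMO_FFT_WINDOWING_BYPASS (1u << 2)

/* Hypothetical stand-in for the driver's check_bit() helper. */
static inline bool
demo_check_bit(uint32_t bitmap, uint32_t mask)
{
	return (bitmap & mask) != 0;
}

/* Same decision tree as the quoted vrb2_fcw_fft_fill() bypass block. */
static inline uint8_t
demo_fft_bypass_mode(uint32_t op_flags)
{
	if (demo_check_bit(op_flags, DEMO_FFT_IDFT_BYPASS))
		return demo_check_bit(op_flags, DEMO_FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (demo_check_bit(op_flags, DEMO_FFT_DFT_BYPASS))
		return 3;
	return 0;
}
```

Note that windowing bypass on its own maps to 0 here, exactly as in the
quoted code, since it is only tested when IDFT bypass is also set.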

> +}
> +
>   static inline int
>   vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>   		struct acc_dma_req_desc *desc,
> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>   	return 0;
>   }
>   
> +static inline int
> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> +		struct acc_dma_req_desc *desc,
> +		struct rte_mbuf *input, struct rte_mbuf *output, struct rte_mbuf *win_input,
> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t *out_offset,
> +		uint32_t *win_offset, uint32_t *pwr_offset)
> +{
> +	bool pwr_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
> +	bool win_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_DEWINDOWING);
> +	int num_cs = 0, i, bd_idx = 1;
> +
> +	/* FCW already done */
> +	acc_header_init(desc);
> +
> +	RTE_SET_USED(win_input);
> +	RTE_SET_USED(win_offset);
> +
> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input, *in_offset);
> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size * ACC_IQ_SIZE;
> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
> +	desc->data_ptrs[bd_idx].last = 1;
> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> +	bd_idx++;
> +
> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output, *out_offset);
> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size * ACC_IQ_SIZE;
> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> +	desc->m2dlen = win_en ? 3 : 2;
> +	desc->d2mlen = pwr_en ? 2 : 1;
> +	desc->ib_ant_offset = op->fft.input_sequence_size;
> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
> +
> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
> +			num_cs++;
> +	desc->num_cs = num_cs;
> +
> +	if (pwr_en && pwr) {
> +		bd_idx++;
> +		desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(pwr, *pwr_offset);
> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op->fft.num_antennas_log2) * 4;
> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
> +		desc->data_ptrs[bd_idx].last = 1;
> +		desc->data_ptrs[bd_idx].dma_ext = 0;
> +	}
> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
> +	desc->op_addr = op;
> +	return 0;
> +}
>   
>   /** Enqueue one FFT operation for device. */
>   static inline int
> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
>   		uint16_t total_enqueued_cbs)
>   {
>   	union acc_dma_desc *desc;
> -	struct rte_mbuf *input, *output;
> -	uint32_t in_offset, out_offset;
> +	struct rte_mbuf *input, *output, *pwr, *win;
> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
>   	struct acc_fcw_fft *fcw;
>   
>   	desc = acc_desc(q, total_enqueued_cbs);
>   	input = op->fft.base_input.data;
>   	output = op->fft.base_output.data;
> +	pwr = op->fft.power_meas_output.data;
> +	win = op->fft.dewindowing_input.data;
>   	in_offset = op->fft.base_input.offset;
>   	out_offset = op->fft.base_output.offset;
> +	pwr_offset = op->fft.power_meas_output.offset;
> +	win_offset = op->fft.dewindowing_input.offset;
>   
>   	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
>   			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
>   			* ACC_MAX_FCW_SIZE);
>   
> -	vrb1_fcw_fft_fill(op, fcw);
> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
> +	if (q->d->device_variant == VRB1_VARIANT) {
> +		vrb1_fcw_fft_fill(op, fcw);
> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
> +	} else {
> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win, pwr,
> +				&in_offset, &out_offset, &win_offset, &pwr_offset);
> +	}
>   #ifdef RTE_LIBRTE_BBDEV_DEBUG
>   	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
>   			sizeof(desc->req.fcw_fft));


* Re: [PATCH v3 10/12] baseband/acc: add MLD support in VRB2 variant
  2023-09-29 16:35 ` [PATCH v3 10/12] baseband/acc: add MLD support in " Nicolas Chautru
@ 2023-10-03 15:12   ` Maxime Coquelin
  2023-10-03 18:12     ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 15:12 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> Adding the capability for the MLD-TS processing specific to
> the VRB2 variant.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/rte_vrb_pmd.c | 378 +++++++++++++++++++++++++++++
>   1 file changed, 378 insertions(+)
> 
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index ce4b90d8e7..a9d3db86e6 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -1344,6 +1344,17 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>   						1,
>   			}
>   		},
> +		{
> +			.type	= RTE_BBDEV_OP_MLDTS,
> +			.cap.mld = {
> +				.capability_flags =
> +						RTE_BBDEV_MLDTS_REP,
> +				.num_buffers_src =
> +						1,
> +				.num_buffers_dst =
> +						1,
> +			}
> +		},
>   		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>   	};
>   
> @@ -4151,6 +4162,371 @@ vrb_dequeue_fft(struct rte_bbdev_queue_data *q_data,
>   	return i;
>   }
>   
> +/* Fill in a frame control word for MLD-TS processing. */
> +static inline void
> +vrb2_fcw_mldts_fill(struct rte_bbdev_mldts_op *op, struct acc_fcw_mldts *fcw)
> +{
> +	fcw->nrb = op->mldts.num_rbs;
> +	fcw->NLayers = op->mldts.num_layers - 1;
> +	fcw->Qmod0 = (op->mldts.q_m[0] >> 1) - 1;
> +	fcw->Qmod1 = (op->mldts.q_m[1] >> 1) - 1;
> +	fcw->Qmod2 = (op->mldts.q_m[2] >> 1) - 1;
> +	fcw->Qmod3 = (op->mldts.q_m[3] >> 1) - 1;
> +	/* Mark some layers as disabled */
> +	if (op->mldts.num_layers == 2) {
> +		fcw->Qmod2 = 3;
> +		fcw->Qmod3 = 3;
> +	}
> +	if (op->mldts.num_layers == 3)
> +		fcw->Qmod3 = 3;
> +	fcw->Rrep = op->mldts.r_rep;
> +	fcw->Crep = op->mldts.c_rep;
> +}
> +
> +/* Fill in descriptor for one MLD-TS processing operation. */
> +static inline int
> +vrb2_dma_desc_mldts_fill(struct rte_bbdev_mldts_op *op,
> +		struct acc_dma_req_desc *desc,
> +		struct rte_mbuf *input_q, struct rte_mbuf *input_r,
> +		struct rte_mbuf *output,
> +		uint32_t *in_offset, uint32_t *out_offset)
> +{
> +	uint16_t qsize_per_re[VRB2_MLD_LAY_SIZE] = {8, 12, 16}; /* Layer 2 to 4. */
> +	uint16_t rsize_per_re[VRB2_MLD_LAY_SIZE] = {14, 26, 42};
> +	uint16_t sc_factor_per_rrep[VRB2_MLD_RREP_SIZE] = {12, 6, 4, 3, 0, 2};
> +	uint16_t i, outsize_per_re = 0;
> +	uint32_t sc_num, r_num, q_size, r_size, out_size;
> +
> +	/* Prevent out of range access. */
> +	if (op->mldts.r_rep > 5)
> +		op->mldts.r_rep = 5;
> +	if (op->mldts.num_layers < 2)
> +		op->mldts.num_layers = 2;
> +	if (op->mldts.num_layers > 4)
> +		op->mldts.num_layers = 4;
> +	for (i = 0; i < op->mldts.num_layers; i++)
> +		outsize_per_re += op->mldts.q_m[i];
> +	sc_num = op->mldts.num_rbs * RTE_BBDEV_SCPERRB * (op->mldts.c_rep + 1);
> +	r_num = op->mldts.num_rbs * sc_factor_per_rrep[op->mldts.r_rep];
> +	q_size = qsize_per_re[op->mldts.num_layers - 2] * sc_num;
> +	r_size = rsize_per_re[op->mldts.num_layers - 2] * r_num;
> +	out_size =  sc_num * outsize_per_re;
> +	/* printf("Sc %d R num %d Size %d %d %d\n", sc_num, r_num, q_size, r_size, out_size); */

rte_bbdev_log_debug()? Otherwise just remove it.
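
A minimal sketch of the size computation with the trace routed through a
debug-only macro instead of a commented-out printf. The mld_log_debug()
macro is a hypothetical stand-in for rte_bbdev_log_debug(), SC_PER_RB = 12
is an assumed value for RTE_BBDEV_SCPERRB, and the struct and function
names are illustrative; the tables and clamping come from the quoted code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debug macro, compiled out unless MLD_DEBUG is defined. */
#ifdef MLD_DEBUG
#define mld_log_debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define mld_log_debug(...) do { } while (0)
#endif

#define SC_PER_RB 12 /* Assumed value of RTE_BBDEV_SCPERRB. */

struct mld_sizes {
	uint32_t q_size, r_size, out_size;
};

/* Buffer sizes for one MLD-TS op; q_m must hold at least 4 entries. */
static struct mld_sizes
mld_compute_sizes(uint16_t num_rbs, uint8_t num_layers, uint8_t c_rep,
		uint8_t r_rep, const uint8_t *q_m)
{
	static const uint16_t qsize_per_re[3] = {8, 12, 16}; /* Layer 2 to 4. */
	static const uint16_t rsize_per_re[3] = {14, 26, 42};
	static const uint16_t sc_factor_per_rrep[6] = {12, 6, 4, 3, 0, 2};
	uint16_t i, outsize_per_re = 0;
	uint32_t sc_num, r_num;
	struct mld_sizes s;

	/* Prevent out of range access, as in the quoted code. */
	if (r_rep > 5)
		r_rep = 5;
	if (num_layers < 2)
		num_layers = 2;
	if (num_layers > 4)
		num_layers = 4;

	for (i = 0; i < num_layers; i++)
		outsize_per_re += q_m[i];
	sc_num = (uint32_t)num_rbs * SC_PER_RB * (c_rep + 1u);
	r_num = (uint32_t)num_rbs * sc_factor_per_rrep[r_rep];
	s.q_size = qsize_per_re[num_layers - 2] * sc_num;
	s.r_size = rsize_per_re[num_layers - 2] * r_num;
	s.out_size = sc_num * outsize_per_re;

	mld_log_debug("Sc %" PRIu32 " R num %" PRIu32 " Size %" PRIu32
			" %" PRIu32 " %" PRIu32 "\n",
			sc_num, r_num, s.q_size, s.r_size, s.out_size);
	return s;
}
```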

> +
> +	/* FCW already done. */
> +	acc_header_init(desc);
> +	desc->data_ptrs[1].address = rte_pktmbuf_iova_offset(input_q, *in_offset);
> +	desc->data_ptrs[1].blen = q_size;
> +	desc->data_ptrs[1].blkid = ACC_DMA_BLKID_IN;
> +	desc->data_ptrs[1].last = 0;
> +	desc->data_ptrs[1].dma_ext = 0;
> +	desc->data_ptrs[2].address = rte_pktmbuf_iova_offset(input_r, *in_offset);
> +	desc->data_ptrs[2].blen = r_size;
> +	desc->data_ptrs[2].blkid = ACC_DMA_BLKID_IN_MLD_R;
> +	desc->data_ptrs[2].last = 1;
> +	desc->data_ptrs[2].dma_ext = 0;
> +	desc->data_ptrs[3].address = rte_pktmbuf_iova_offset(output, *out_offset);
> +	desc->data_ptrs[3].blen = out_size;
> +	desc->data_ptrs[3].blkid = ACC_DMA_BLKID_OUT_HARD;
> +	desc->data_ptrs[3].last = 1;
> +	desc->data_ptrs[3].dma_ext = 0;
> +	desc->m2dlen = 3;
> +	desc->d2mlen = 1;
> +	desc->op_addr = op;
> +	desc->cbs_in_tb = 1;
> +
> +	return 0;
> +}
> +
> +/* Check whether the MLD operation can be processed as a single operation. */
> +static inline bool
> +vrb2_check_mld_r_constraint(struct rte_bbdev_mldts_op *op) {
> +	uint8_t layer_idx, rrep_idx;
> +	uint16_t max_rb[VRB2_MLD_LAY_SIZE][VRB2_MLD_RREP_SIZE] = {
> +			{188, 275, 275, 275, 0, 275},
> +			{101, 202, 275, 275, 0, 275},
> +			{62, 124, 186, 248, 0, 275} };
> +
> +	if (op->mldts.c_rep == 0)
> +		return true;
> +
> +	layer_idx = RTE_MIN(op->mldts.num_layers - VRB2_MLD_MIN_LAYER,
> +			VRB2_MLD_MAX_LAYER - VRB2_MLD_MIN_LAYER);
> +	rrep_idx = RTE_MIN(op->mldts.r_rep, VRB2_MLD_MAX_RREP);
> +	rte_bbdev_log_debug("RB %d index %d %d max %d\n", op->mldts.num_rbs, layer_idx, rrep_idx,
> +			max_rb[layer_idx][rrep_idx]);
> +
> +	return (op->mldts.num_rbs <= max_rb[layer_idx][rrep_idx]);
> +}
> +
> +/** Enqueue MLDTS operation split across symbols. */
> +static inline int
> +enqueue_mldts_split_op(struct acc_queue *q, struct rte_bbdev_mldts_op *op,
> +		uint16_t total_enqueued_descs)
> +{
> +	uint16_t qsize_per_re[VRB2_MLD_LAY_SIZE] = {8, 12, 16}; /* Layer 2 to 4. */
> +	uint16_t rsize_per_re[VRB2_MLD_LAY_SIZE] = {14, 26, 42};
> +	uint16_t sc_factor_per_rrep[VRB2_MLD_RREP_SIZE] = {12, 6, 4, 3, 0, 2};
> +	uint32_t i, outsize_per_re = 0, sc_num, r_num, q_size, r_size, out_size, num_syms;
> +	union acc_dma_desc *desc, *first_desc;
> +	uint16_t desc_idx, symb;
> +	struct rte_mbuf *input_q, *input_r, *output;
> +	uint32_t in_offset, out_offset;
> +	struct acc_fcw_mldts *fcw;
> +
> +	desc_idx = ((q->sw_ring_head + total_enqueued_descs) & q->sw_ring_wrap_mask);
> +	first_desc = q->ring_addr + desc_idx;

acc_desc()?
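
For reference, a minimal sketch of the index computation such a helper
would wrap. The demo_* types and names are hypothetical stand-ins for the
driver's queue and descriptor structures; only the head-plus-offset
expression masked by the wrap mask comes from the quoted lines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for union acc_dma_desc and struct acc_queue. */
struct demo_desc {
	uint32_t word0;
};

struct demo_queue {
	struct demo_desc *ring_addr;
	uint32_t sw_ring_head;
	uint32_t sw_ring_wrap_mask; /* Ring size minus one, ring size a power of two. */
};

/* Return the descriptor located 'offset' entries after the ring head. */
static inline struct demo_desc *
demo_desc_at(struct demo_queue *q, uint16_t offset)
{
	/* The masking trick only wraps correctly for a (2^n - 1) mask. */
	assert((q->sw_ring_wrap_mask & (q->sw_ring_wrap_mask + 1)) == 0);
	return q->ring_addr + ((q->sw_ring_head + offset) & q->sw_ring_wrap_mask);
}
```

Using one helper for every lookup keeps the wrap handling in a single
place, which is the motivation for the suggestion above.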

> +	input_q = op->mldts.qhy_input.data;
> +	input_r = op->mldts.r_input.data;
> +	output = op->mldts.output.data;
> +	in_offset = op->mldts.qhy_input.offset;
> +	out_offset = op->mldts.output.offset;
> +	num_syms = op->mldts.c_rep + 1;
> +	fcw = &first_desc->req.fcw_mldts;
> +	vrb2_fcw_mldts_fill(op, fcw);
> +	fcw->Crep = 0; /* C rep forced to zero. */
> +
> +	/* Prevent out of range access. */
> +	if (op->mldts.r_rep > 5)
> +		op->mldts.r_rep = 5;
> +	if (op->mldts.num_layers < 2)
> +		op->mldts.num_layers = 2;
> +	if (op->mldts.num_layers > 4)
> +		op->mldts.num_layers = 4;
> +
> +	for (i = 0; i < op->mldts.num_layers; i++)
> +		outsize_per_re += op->mldts.q_m[i];
> +	sc_num = op->mldts.num_rbs * RTE_BBDEV_SCPERRB; /* C rep forced to zero. */
> +	r_num = op->mldts.num_rbs * sc_factor_per_rrep[op->mldts.r_rep];
> +	q_size = qsize_per_re[op->mldts.num_layers - 2] * sc_num;
> +	r_size = rsize_per_re[op->mldts.num_layers - 2] * r_num;
> +	out_size =  sc_num * outsize_per_re;
> +
> +	for (symb = 0; symb < num_syms; symb++) {
> +		desc_idx = ((q->sw_ring_head + total_enqueued_descs + symb) & q->sw_ring_wrap_mask);
> +		desc = q->ring_addr + desc_idx;
> +		acc_header_init(&desc->req);
> +		if (symb == 0)
> +			desc->req.cbs_in_tb = num_syms;
> +		else
> +			rte_memcpy(&desc->req.fcw_mldts, fcw, ACC_FCW_MLDTS_BLEN);
> +		desc->req.data_ptrs[1].address = rte_pktmbuf_iova_offset(input_q, in_offset);
> +		desc->req.data_ptrs[1].blen = q_size;
> +		in_offset += q_size;
> +		desc->req.data_ptrs[1].blkid = ACC_DMA_BLKID_IN;
> +		desc->req.data_ptrs[1].last = 0;
> +		desc->req.data_ptrs[1].dma_ext = 0;
> +		desc->req.data_ptrs[2].address = rte_pktmbuf_iova_offset(input_r, 0);
> +		desc->req.data_ptrs[2].blen = r_size;
> +		desc->req.data_ptrs[2].blkid = ACC_DMA_BLKID_IN_MLD_R;
> +		desc->req.data_ptrs[2].last = 1;
> +		desc->req.data_ptrs[2].dma_ext = 0;
> +		desc->req.data_ptrs[3].address = rte_pktmbuf_iova_offset(output, out_offset);
> +		desc->req.data_ptrs[3].blen = out_size;
> +		out_offset += out_size;
> +		desc->req.data_ptrs[3].blkid = ACC_DMA_BLKID_OUT_HARD;
> +		desc->req.data_ptrs[3].last = 1;
> +		desc->req.data_ptrs[3].dma_ext = 0;
> +		desc->req.m2dlen = VRB2_MLD_M2DLEN;
> +		desc->req.d2mlen = 1;
> +		desc->req.op_addr = op;
> +
> +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> +		rte_memdump(stderr, "FCW", &desc->req.fcw_mldts, sizeof(desc->req.fcw_mldts));
> +		rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> +#endif
> +	}
> +	desc->req.sdone_enable = 0;
> +
> +	return num_syms;
> +}
> +
> +/** Enqueue one MLDTS operation. */
> +static inline int
> +enqueue_mldts_one_op(struct acc_queue *q, struct rte_bbdev_mldts_op *op,
> +		uint16_t total_enqueued_descs)
> +{
> +	union acc_dma_desc *desc;
> +	uint16_t desc_idx;
> +	struct rte_mbuf *input_q, *input_r, *output;
> +	uint32_t in_offset, out_offset;
> +	struct acc_fcw_mldts *fcw;
> +
> +	desc_idx = ((q->sw_ring_head + total_enqueued_descs) & q->sw_ring_wrap_mask);
> +	desc = q->ring_addr + desc_idx;

acc_desc()?

> +	input_q = op->mldts.qhy_input.data;
> +	input_r = op->mldts.r_input.data;
> +	output = op->mldts.output.data;
> +	in_offset = op->mldts.qhy_input.offset;
> +	out_offset = op->mldts.output.offset;
> +	fcw = &desc->req.fcw_mldts;
> +	vrb2_fcw_mldts_fill(op, fcw);
> +	vrb2_dma_desc_mldts_fill(op, &desc->req, input_q, input_r, output,
> +			&in_offset, &out_offset);
> +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> +	rte_memdump(stderr, "FCW", &desc->req.fcw_mldts, sizeof(desc->req.fcw_mldts));
> +	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> +#endif
> +	return 1;
> +}
> +
> +/* Enqueue MLDTS operations. */
> +static uint16_t
> +vrb2_enqueue_mldts(struct rte_bbdev_queue_data *q_data,
> +		struct rte_bbdev_mldts_op **ops, uint16_t num)
> +{
> +	int32_t aq_avail, avail;
> +	struct acc_queue *q = q_data->queue_private;
> +	uint16_t i, enqueued_descs = 0, descs_in_op;
> +	int ret;
> +	bool as_one_op;
> +
> +	aq_avail = acc_aq_avail(q_data, num);
> +	if (unlikely((aq_avail <= 0) || (num == 0)))
> +		return 0;
> +	avail = acc_ring_avail_enq(q);
> +
> +	for (i = 0; i < num; ++i) {
> +		as_one_op = vrb2_check_mld_r_constraint(ops[i]);
> +		descs_in_op = as_one_op ? 1 : ops[i]->mldts.c_rep + 1;
> +
> +		/* Check if there are available space for further processing. */
> +		if (unlikely(avail < descs_in_op)) {
> +			acc_enqueue_ring_full(q_data);
> +			break;
> +		}
> +		avail -= descs_in_op;
> +
> +		if (as_one_op)
> +			ret = enqueue_mldts_one_op(q, ops[i], enqueued_descs);
> +		else
> +			ret = enqueue_mldts_split_op(q, ops[i], enqueued_descs);
> +
> +		if (ret < 0) {
> +			acc_enqueue_invalid(q_data);
> +			break;
> +		}
> +
> +		enqueued_descs += ret;
> +	}
> +
> +	if (unlikely(i == 0))
> +		return 0; /* Nothing to enqueue. */
> +
> +	acc_dma_enqueue(q, enqueued_descs, &q_data->queue_stats);
> +
> +	/* Update stats. */
> +	q_data->queue_stats.enqueued_count += i;
> +	q_data->queue_stats.enqueue_err_count += num - i;
> +	return i;
> +}
> +
> +/*
> + * Dequeue one MLDTS operation.
> + * This may have been split over multiple descriptors.
> + */
> +static inline int
> +dequeue_mldts_one_op(struct rte_bbdev_queue_data *q_data,
> +		struct acc_queue *q, struct rte_bbdev_mldts_op **ref_op,
> +		uint16_t dequeued_ops, uint32_t *aq_dequeued)
> +{
> +	union acc_dma_desc *desc, atom_desc, *last_desc;
> +	union acc_dma_rsp_desc rsp;
> +	struct rte_bbdev_mldts_op *op;
> +	uint8_t descs_in_op, i;
> +
> +	desc = acc_desc_tail(q, dequeued_ops);
> +	atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, __ATOMIC_RELAXED);
> +
> +	/* Check fdone bit. */
> +	if (!(atom_desc.rsp.val & ACC_FDONE))
> +		return -1;
> +
> +	descs_in_op = desc->req.cbs_in_tb;
> +	if (descs_in_op > 1) {
> +		/* Get last CB. */
> +		last_desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops + descs_in_op - 1)
> +				& q->sw_ring_wrap_mask);
> +		/* Check if last op is ready to dequeue by checking fdone bit. If not exit. */
> +		atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc, __ATOMIC_RELAXED);
> +		if (!(atom_desc.rsp.val & ACC_FDONE))
> +			return -1;
> +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> +		rte_memdump(stderr, "Last Resp", &last_desc->rsp.val, sizeof(desc->rsp.val));
> +#endif
> +		/* Check each operation iteratively using fdone. */
> +		for (i = 1; i < descs_in_op - 1; i++) {
> +			last_desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops + i)
> +					& q->sw_ring_wrap_mask);
> +			atom_desc.atom_hdr = __atomic_load_n((uint64_t *)last_desc,
> +					__ATOMIC_RELAXED);
> +			if (!(atom_desc.rsp.val & ACC_FDONE))
> +				return -1;
> +		}
> +	}
> +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> +	rte_memdump(stderr, "Resp", &desc->rsp.val, sizeof(desc->rsp.val));
> +#endif
> +	/* Dequeue. */
> +	op = desc->req.op_addr;
> +
> +	/* Clearing status, it will be set based on response. */
> +	op->status = 0;
> +
> +	for (i = 0; i < descs_in_op; i++) {
> +		desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops + i) & q->sw_ring_wrap_mask);

acc_desc()

> +		atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc, __ATOMIC_RELAXED);
> +		rsp.val = atom_desc.rsp.val;
> +		op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
> +		op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
> +		op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> +		op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
> +	}
> +
> +	if (op->status != 0)
> +		q_data->queue_stats.dequeue_err_count++;
> +	if (op->status & (1 << RTE_BBDEV_DRV_ERROR))
> +		vrb_check_ir(q->d);
> +
> +	/* Check if this is the last desc in batch (Atomic Queue). */
> +	if (desc->req.last_desc_in_batch) {
> +		(*aq_dequeued)++;
> +		desc->req.last_desc_in_batch = 0;
> +	}
> +	desc->rsp.val = ACC_DMA_DESC_TYPE;
> +	desc->rsp.add_info_0 = 0;
> +	*ref_op = op;

There seems to be a pattern with other ops (FFT/LDPC/...).
Maybe we should work on some refactoring. It does not have to be done in
this series.

> +	return descs_in_op;
> +}
> +
> +/* Dequeue MLDTS operations from VRB2 device. */
> +static uint16_t
> +vrb2_dequeue_mldts(struct rte_bbdev_queue_data *q_data,
> +		struct rte_bbdev_mldts_op **ops, uint16_t num)
> +{
> +	struct acc_queue *q = q_data->queue_private;
> +	uint16_t dequeue_num, i, dequeued_cbs = 0;
> +	uint32_t avail = acc_ring_avail_deq(q);
> +	uint32_t aq_dequeued = 0;
> +	int ret;
> +
> +	dequeue_num = RTE_MIN(avail, num);
> +
> +	for (i = 0; i < dequeue_num; ++i) {
> +		ret = dequeue_mldts_one_op(q_data, q, &ops[i], dequeued_cbs, &aq_dequeued);
> +		if (ret <= 0)
> +			break;
> +		dequeued_cbs += ret;
> +	}
> +
> +	q->aq_dequeued += aq_dequeued;
> +	q->sw_ring_tail += dequeued_cbs;
> +	/* Update enqueue stats. */
> +	q_data->queue_stats.dequeued_count += i;
> +	return i;
> +}
> +
>   /* Initialization Function */
>   static void
>   vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
> @@ -4169,6 +4545,8 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
>   	dev->dequeue_ldpc_dec_ops = vrb_dequeue_ldpc_dec;
>   	dev->enqueue_fft_ops = vrb_enqueue_fft;
>   	dev->dequeue_fft_ops = vrb_dequeue_fft;
> +	dev->enqueue_mldts_ops = vrb2_enqueue_mldts;
> +	dev->dequeue_mldts_ops = vrb2_dequeue_mldts;
>   
>   	d->pf_device = !strcmp(drv->driver.name, RTE_STR(VRB_PF_DRIVER_NAME));
>   	d->mmio_base = pci_dev->mem_resource[0].addr;


* Re: [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection
  2023-09-29 16:35 ` [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection Nicolas Chautru
@ 2023-10-03 15:16   ` Maxime Coquelin
  2023-10-03 17:22     ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 15:16 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> Adding missing incremental functionality for the VRB2
> variant. Notably detection of engine error during the
> dequeue. Minor cosmetic edits.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/rte_vrb_pmd.c  | 20 ++++++++++++--------
>   drivers/baseband/acc/vrb1_pf_enum.h | 17 ++++++++++++-----
>   2 files changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index a9d3db86e6..3eb1a380fc 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -1504,6 +1504,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op *op, struct acc_fcw_td *fcw)
>   				fcw->ea = op->turbo_dec.cb_params.e;
>   				fcw->eb = op->turbo_dec.cb_params.e;
>   			}
> +
>   			if (op->turbo_dec.rv_index == 0)
>   				fcw->k0_start_col = ACC_FCW_TD_RVIDX_0;
>   			else if (op->turbo_dec.rv_index == 1)
> @@ -2304,7 +2305,7 @@ enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ops,
>   	return num;
>   }
>   
> -/* Enqueue one encode operations for device for a partial TB
> +/* Enqueue one encode operations for VRB1 device for a partial TB
>    * all codes blocks have same configuration multiplexed on the same descriptor.
>    */
>   static inline void
> @@ -2649,7 +2650,7 @@ enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   	return 1;
>   }
>   
> -/** Enqueue one decode operations for device in CB mode */
> +/** Enqueue one decode operations for device in CB mode. */
>   static inline int
>   vrb_enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   		uint16_t total_enqueued_cbs, bool same_op)
> @@ -2801,7 +2802,6 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
>   		desc->req.data_ptrs[0].blen = ACC_FCW_LD_BLEN;
>   		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld, ACC_FCW_LD_BLEN);
>   		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
> -
>   		if (q->d->device_variant == VRB1_VARIANT)
>   			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
>   					h_output, &in_offset, &h_out_offset,
> @@ -3226,7 +3226,6 @@ vrb_enqueue_ldpc_dec_cb(struct rte_bbdev_queue_data *q_data,
>   			break;
>   		}
>   		avail -= 1;
> -

Is it intentionally removed?

>   		rte_bbdev_log(INFO, "Op %d %d %d %d %d %d %d %d %d %d %d %d\n",
>   			i, ops[i]->ldpc_dec.op_flags, ops[i]->ldpc_dec.rv_index,
>   			ops[i]->ldpc_dec.iter_max, ops[i]->ldpc_dec.iter_count,
> @@ -3354,6 +3353,7 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
>   	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
>   	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>   	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> +	op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR) : 0);
>   
>   	if (desc->req.last_desc_in_batch) {
>   		(*aq_dequeued)++;
> @@ -3470,6 +3470,7 @@ vrb_dequeue_enc_one_op_tb(struct acc_queue *q, struct rte_bbdev_enc_op **ref_op,
>   		op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
>   		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>   		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> +		op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR) : 0);
>   
>   		if (desc->req.last_desc_in_batch) {
>   			(*aq_dequeued)++;
> @@ -3516,6 +3517,8 @@ vrb_dequeue_dec_one_op_cb(struct rte_bbdev_queue_data *q_data,
>   	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
>   	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>   	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
> +
>   	if (op->status != 0) {
>   		/* These errors are not expected. */
>   		q_data->queue_stats.dequeue_err_count++;
> @@ -3569,6 +3572,7 @@ vrb_dequeue_ldpc_dec_one_op_cb(struct rte_bbdev_queue_data *q_data,
>   	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
>   	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
>   	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
>   	if (op->status != 0)
>   		q_data->queue_stats.dequeue_err_count++;
>   
> @@ -3650,6 +3654,7 @@ vrb_dequeue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op **ref_op,
>   		op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
>   		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>   		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> +		op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR) : 0);

It kind of highlights the need for the refactoring I suggested in the
previous patch! It would have been done in one place.
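
To illustrate the point, a minimal sketch of a shared helper that maps
the response bits to the op status in one place. The struct, the enum
values and the function name are hypothetical stand-ins, not the driver's
or bbdev's definitions; only the four response fields and the
shift-by-status-bit pattern come from the quoted code.

```c
#include <stdint.h>

/* Placeholder status bit positions, NOT the real RTE_BBDEV_*_ERROR values. */
enum demo_status_bit {
	DEMO_DRV_ERROR = 0,
	DEMO_DATA_ERROR = 1,
	DEMO_ENGINE_ERROR = 4,
};

/* Hypothetical subset of the response descriptor fields. */
struct demo_rsp {
	unsigned int input_err:1;
	unsigned int dma_err:1;
	unsigned int fcw_err:1;
	unsigned int engine_hung:1;
};

/* Convert hardware response bits into an op status bitmap. */
static inline int
demo_rsp_to_status(struct demo_rsp rsp)
{
	int status = 0;

	status |= rsp.input_err << DEMO_DATA_ERROR;
	status |= rsp.dma_err << DEMO_DRV_ERROR;
	status |= rsp.fcw_err << DEMO_DRV_ERROR;
	status |= rsp.engine_hung << DEMO_ENGINE_ERROR;
	return status;
}
```

With such a helper, adding a new response bit like engine_hung would be a
one-line change instead of one edit per dequeue function.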

>   
>   		if (check_bit(op->ldpc_dec.op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
>   			tb_crc_check ^= desc->rsp.add_info_1;
> @@ -3701,7 +3706,6 @@ vrb_dequeue_enc(struct rte_bbdev_queue_data *q_data,
>   	if (avail == 0)
>   		return 0;
>   	op = acc_op_tail(q, 0);
> -
>   	cbm = op->turbo_enc.code_block_mode;
>   
>   	for (i = 0; i < avail; i++) {
> @@ -4041,9 +4045,8 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
>   				&in_offset, &out_offset, &win_offset, &pwr_offset);
>   	}
>   #ifdef RTE_LIBRTE_BBDEV_DEBUG
> -	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
> -			sizeof(desc->req.fcw_fft));
> -	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> +	rte_memdump(stderr, "FCW", fcw, 128);
> +	rte_memdump(stderr, "Req Desc.", desc, 128);
>   #endif
>   	return 1;
>   }
> @@ -4116,6 +4119,7 @@ vrb_dequeue_fft_one_op(struct rte_bbdev_queue_data *q_data,
>   	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
>   	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
>   	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
>   	if (op->status != 0)
>   		q_data->queue_stats.dequeue_err_count++;
>   
> diff --git a/drivers/baseband/acc/vrb1_pf_enum.h b/drivers/baseband/acc/vrb1_pf_enum.h
> index 82a36685e9..6dc359800f 100644
> --- a/drivers/baseband/acc/vrb1_pf_enum.h
> +++ b/drivers/baseband/acc/vrb1_pf_enum.h
> @@ -98,11 +98,18 @@ enum {
>   	ACC_PF_INT_DMA_UL5G_DESC_IRQ = 8,
>   	ACC_PF_INT_DMA_DL5G_DESC_IRQ = 9,
>   	ACC_PF_INT_DMA_MLD_DESC_IRQ = 10,
> -	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 11,
> -	ACC_PF_INT_PARITY_ERR = 12,
> -	ACC_PF_INT_QMGR_ERR = 13,
> -	ACC_PF_INT_INT_REQ_OVERFLOW = 14,
> -	ACC_PF_INT_APB_TIMEOUT = 15,
> +	ACC_PF_INT_ARAM_ACCESS_ERR = 11,
> +	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 12,
> +	ACC_PF_INT_PARITY_ERR = 13,
> +	ACC_PF_INT_QMGR_OVERFLOW = 14,
> +	ACC_PF_INT_QMGR_ERR = 15,
> +	ACC_PF_INT_ATS_ERR = 22,
> +	ACC_PF_INT_ARAM_FUUL = 23,
> +	ACC_PF_INT_EXTRA_READ = 24,
> +	ACC_PF_INT_COMPLETION_TIMEOUT = 25,
> +	ACC_PF_INT_CORE_HANG = 26,
> +	ACC_PF_INT_DMA_HANG = 28,
> +	ACC_PF_INT_DS_HANG = 27,
>   };
>   
>   #endif /* VRB1_PF_ENUM_H */


Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


* Re: [PATCH v3 12/12] baseband/acc: add configure helper for VRB2
  2023-09-29 16:35 ` [PATCH v3 12/12] baseband/acc: add configure helper for VRB2 Nicolas Chautru
@ 2023-10-03 15:30   ` Maxime Coquelin
  0 siblings, 0 replies; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 15:30 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 9/29/23 18:35, Nicolas Chautru wrote:
> This allows to configure the VRB2 device using a
> companion configuration function within the DPDK
> bbdev-test environment.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   drivers/baseband/acc/acc100_pmd.h     |   2 +
>   drivers/baseband/acc/rte_acc100_pmd.c |   6 +-
>   drivers/baseband/acc/rte_vrb_pmd.c    | 321 ++++++++++++++++++++++++++
>   drivers/baseband/acc/vrb_cfg.h        |  16 ++
>   4 files changed, 344 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/baseband/acc/acc100_pmd.h b/drivers/baseband/acc/acc100_pmd.h
> index a48298650c..5a8965fa53 100644
> --- a/drivers/baseband/acc/acc100_pmd.h
> +++ b/drivers/baseband/acc/acc100_pmd.h
> @@ -34,6 +34,8 @@
>   #define ACC100_VENDOR_ID           (0x8086)
>   #define ACC100_PF_DEVICE_ID        (0x0d5c)
>   #define ACC100_VF_DEVICE_ID        (0x0d5d)
> +#define VRB1_PF_DEVICE_ID          (0x57C0)
> +#define VRB2_PF_DEVICE_ID          (0x57C2)
>   
>   /* Values used in writing to the registers */
>   #define ACC100_REG_IRQ_EN_ALL          0x1FF83FF  /* Enable all interrupts */
> diff --git a/drivers/baseband/acc/rte_acc100_pmd.c b/drivers/baseband/acc/rte_acc100_pmd.c
> index 7f8d05b5a9..699a227d13 100644
> --- a/drivers/baseband/acc/rte_acc100_pmd.c
> +++ b/drivers/baseband/acc/rte_acc100_pmd.c
> @@ -5187,6 +5187,10 @@ rte_acc_configure(const char *dev_name, struct rte_acc_conf *conf)
>   		return acc100_configure(dev_name, conf);
>   	else if (pci_dev->id.device_id == ACC101_PF_DEVICE_ID)
>   		return acc101_configure(dev_name, conf);
> -	else
> +	else if (pci_dev->id.device_id == VRB1_PF_DEVICE_ID)
>   		return vrb1_configure(dev_name, conf);
> +	else if (pci_dev->id.device_id == VRB2_PF_DEVICE_ID)
> +		return vrb2_configure(dev_name, conf);
> +
> +	return -ENXIO;
>   }
> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
> index 3eb1a380fc..d0bc74b53f 100644
> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> @@ -5052,3 +5052,324 @@ vrb1_configure(const char *dev_name, struct rte_acc_conf *conf)
>   	rte_bbdev_log_debug("PF Tip configuration complete for %s", dev_name);
>   	return 0;
>   }
> +
> +/* Initial configuration of a VRB2 device prior to running configure(). */
> +int
> +vrb2_configure(const char *dev_name, struct rte_acc_conf *conf)
> +{
> +	rte_bbdev_log(INFO, "vrb2_configure");
> +	uint32_t value, address, status;
> +	int qg_idx, template_idx, vf_idx, acc, i, aq_reg, static_allocation, numEngines;
> +	int numQgs, numQqsAcc, totalQgs;
> +	int qman_func_id[8] = {0, 2, 1, 3, 4, 5, 0, 0};
> +	struct rte_bbdev *bbdev = rte_bbdev_get_named_dev(dev_name);
> +	int rlim, alen, timestamp;
> +
> +	/* Compile time checks. */
> +	RTE_BUILD_BUG_ON(sizeof(struct acc_dma_req_desc) != 256);
> +	RTE_BUILD_BUG_ON(sizeof(union acc_dma_desc) != 256);
> +	RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_td) != 24);
> +	RTE_BUILD_BUG_ON(sizeof(struct acc_fcw_te) != 32);
> +
> +	if (bbdev == NULL) {
> +		rte_bbdev_log(ERR,
> +		"Invalid dev_name (%s), or device is not yet initialised",
> +		dev_name);
> +		return -ENODEV;
> +	}
> +	struct acc_device *d = bbdev->data->dev_private;
> +
> +	/* Store configuration. */
> +	rte_memcpy(&d->acc_conf, conf, sizeof(d->acc_conf));
> +
> +	/* Explicitly releasing AXI as this may be stopped after PF FLR/BME. */
> +	address = VRB2_PfDmaAxiControl;
> +	value = 1;
> +	acc_reg_write(d, address, value);
> +
> +	/* Set the fabric mode. */
> +	address = VRB2_PfFabricM2iBufferReg;
> +	value = VRB2_FABRIC_MODE;
> +	acc_reg_write(d, address, value);
> +
> +	/* Set default descriptor signature. */
> +	address = VRB2_PfDmaDescriptorSignature;
> +	value = 0;
> +	acc_reg_write(d, address, value);
> +
> +	/* Enable the Error Detection in DMA. */
> +	value = VRB2_CFG_DMA_ERROR;
> +	address = VRB2_PfDmaErrorDetectionEn;
> +	acc_reg_write(d, address, value);
> +
> +	/* AXI Cache configuration. */
> +	value = VRB2_CFG_AXI_CACHE;
> +	address = VRB2_PfDmaAxcacheReg;
> +	acc_reg_write(d, address, value);
> +
> +	/* AXI Response configuration. */
> +	acc_reg_write(d, VRB2_PfDmaCfgRrespBresp, 0x0);
> +
> +	/* Default DMA Configuration (Qmgr Enabled) */
> +	acc_reg_write(d, VRB2_PfDmaConfig0Reg, 0);
> +	acc_reg_write(d, VRB2_PfDmaQmanenSelect, 0xFFFFFFFF);
> +	acc_reg_write(d, VRB2_PfDmaQmanen, 0);
> +
> +	/* Default RLIM/ALEN configuration. */
> +	rlim = 0;
> +	alen = 3;
> +	timestamp = 0;
> +	address = VRB2_PfDmaConfig1Reg;
> +	value = (1 << 31) + (rlim << 8) + (timestamp << 6) + alen;
> +	acc_reg_write(d, address, value);
> +
> +	/* Default FFT configuration. */
> +	for (template_idx = 0; template_idx < VRB2_FFT_NUM; template_idx++) {
> +		acc_reg_write(d, VRB2_PfFftConfig0 + template_idx * 0x1000, VRB2_FFT_CFG_0);
> +		acc_reg_write(d, VRB2_PfFftParityMask8 + template_idx * 0x1000, VRB2_FFT_ECC);
> +	}
> +
> +	/* Configure DMA Qmanager addresses. */
> +	address = VRB2_PfDmaQmgrAddrReg;
> +	value = VRB2_PfQmgrEgressQueuesTemplate;
> +	acc_reg_write(d, address, value);
> +
> +	/* ===== Qmgr Configuration ===== */
> +	/* Configuration of the AQueue Depth QMGR_GRP_0_DEPTH_LOG2 for UL. */
> +	totalQgs = conf->q_ul_4g.num_qgroups + conf->q_ul_5g.num_qgroups +
> +			conf->q_dl_4g.num_qgroups + conf->q_dl_5g.num_qgroups +
> +			conf->q_fft.num_qgroups + conf->q_mld.num_qgroups;
> +	for (qg_idx = 0; qg_idx < VRB2_NUM_QGRPS; qg_idx++) {
> +		address = VRB2_PfQmgrDepthLog2Grp + ACC_BYTES_IN_WORD * qg_idx;
> +		value = aqDepth(qg_idx, conf);
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrTholdGrp + ACC_BYTES_IN_WORD * qg_idx;
> +		value = (1 << 16) + (1 << (aqDepth(qg_idx, conf) - 1));
> +		acc_reg_write(d, address, value);
> +	}
> +
> +	/* Template Priority in incremental order. */
> +	for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) {
> +		address = VRB2_PfQmgrGrpTmplateReg0Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_0;
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrGrpTmplateReg1Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_1;
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrGrpTmplateReg2Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_2;
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrGrpTmplateReg3Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_3;
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrGrpTmplateReg4Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_4;
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrGrpTmplateReg5Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_5;
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrGrpTmplateReg6Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_6;
> +		acc_reg_write(d, address, value);
> +		address = VRB2_PfQmgrGrpTmplateReg7Indx + ACC_BYTES_IN_WORD * template_idx;
> +		value = ACC_TMPL_PRI_7;
> +		acc_reg_write(d, address, value);
> +	}
> +
> +	address = VRB2_PfQmgrGrpPriority;
> +	value = VRB2_CFG_QMGR_HI_P;
> +	acc_reg_write(d, address, value);
> +
> +	/* Template Configuration. */
> +	for (template_idx = 0; template_idx < ACC_NUM_TMPL; template_idx++) {
> +		value = 0;
> +		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
> +		acc_reg_write(d, address, value);
> +	}
> +	/* 4GUL */
> +	numQgs = conf->q_ul_4g.num_qgroups;
> +	numQqsAcc = 0;
> +	value = 0;
> +	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
> +		value |= (1 << qg_idx);
> +	for (template_idx = VRB2_SIG_UL_4G; template_idx <= VRB2_SIG_UL_4G_LAST;
> +			template_idx++) {
> +		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
> +		acc_reg_write(d, address, value);
> +	}
> +	/* 5GUL */
> +	numQqsAcc += numQgs;
> +	numQgs = conf->q_ul_5g.num_qgroups;
> +	value = 0;
> +	numEngines = 0;
> +	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
> +		value |= (1 << qg_idx);
> +	for (template_idx = VRB2_SIG_UL_5G; template_idx <= VRB2_SIG_UL_5G_LAST;
> +			template_idx++) {
> +		/* Check engine power-on status. */
> +		address = VRB2_PfFecUl5gIbDebug0Reg + ACC_ENGINE_OFFSET * template_idx;
> +		status = (acc_reg_read(d, address) >> 4) & 0x7;
> +		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
> +		if (status == 1) {
> +			acc_reg_write(d, address, value);
> +			numEngines++;
> +		} else
> +			acc_reg_write(d, address, 0);
> +	}
> +	rte_bbdev_log(INFO, "Number of 5GUL engines %d", numEngines);
> +	/* 4GDL */
> +	numQqsAcc += numQgs;
> +	numQgs	= conf->q_dl_4g.num_qgroups;
> +	value = 0;
> +	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
> +		value |= (1 << qg_idx);
> +	for (template_idx = VRB2_SIG_DL_4G; template_idx <= VRB2_SIG_DL_4G_LAST;
> +			template_idx++) {
> +		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
> +		acc_reg_write(d, address, value);
> +	}
> +	/* 5GDL */
> +	numQqsAcc += numQgs;
> +	numQgs	= conf->q_dl_5g.num_qgroups;
> +	value = 0;
> +	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
> +		value |= (1 << qg_idx);
> +	for (template_idx = VRB2_SIG_DL_5G; template_idx <= VRB2_SIG_DL_5G_LAST;
> +			template_idx++) {
> +		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
> +		acc_reg_write(d, address, value);
> +	}
> +	/* FFT */
> +	numQqsAcc += numQgs;
> +	numQgs	= conf->q_fft.num_qgroups;
> +	value = 0;
> +	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
> +		value |= (1 << qg_idx);
> +	for (template_idx = VRB2_SIG_FFT; template_idx <= VRB2_SIG_FFT_LAST;
> +			template_idx++) {
> +		address = VRB2_PfQmgrGrpTmplateEnRegIndx + ACC_BYTES_IN_WORD * template_idx;
> +		acc_reg_write(d, address, value);
> +	}
> +	/* MLD */
> +	numQqsAcc += numQgs;
> +	numQgs	= conf->q_mld.num_qgroups;
> +	value = 0;
> +	for (qg_idx = numQqsAcc; qg_idx < (numQgs + numQqsAcc); qg_idx++)
> +		value |= (1 << qg_idx);
> +	for (template_idx = VRB2_SIG_MLD; template_idx <= VRB2_SIG_MLD_LAST;
> +			template_idx++) {
> +		address = VRB2_PfQmgrGrpTmplateEnRegIndx
> +				+ ACC_BYTES_IN_WORD * template_idx;
> +		acc_reg_write(d, address, value);
> +	}
> +
> +	/* Queue Group Function mapping. */
> +	for (i = 0; i < 4; i++) {
> +		value = 0;
> +		for (qg_idx = 0; qg_idx < ACC_NUM_QGRPS_PER_WORD; qg_idx++) {
> +			acc = accFromQgid(qg_idx + i * ACC_NUM_QGRPS_PER_WORD, conf);
> +			value |= qman_func_id[acc] << (qg_idx * 4);
> +		}
> +		acc_reg_write(d, VRB2_PfQmgrGrpFunction0 + i * ACC_BYTES_IN_WORD, value);
> +	}
> +
> +	/* Configuration of the Arbitration QGroup depth to 1. */
> +	for (qg_idx = 0; qg_idx < VRB2_NUM_QGRPS; qg_idx++) {
> +		address = VRB2_PfQmgrArbQDepthGrp + ACC_BYTES_IN_WORD * qg_idx;
> +		value = 0;
> +		acc_reg_write(d, address, value);
> +	}
> +
> +	static_allocation = 1;
> +	if (static_allocation == 1) {
> +		/* This pointer to ARAM (512kB) is shifted by 2 (4B per register). */
> +		uint32_t aram_address = 0;
> +		for (qg_idx = 0; qg_idx < totalQgs; qg_idx++) {
> +			for (vf_idx = 0; vf_idx < conf->num_vf_bundles; vf_idx++) {
> +				address = VRB2_PfQmgrVfBaseAddr + vf_idx
> +						* ACC_BYTES_IN_WORD + qg_idx
> +						* ACC_BYTES_IN_WORD * 64;
> +				value = aram_address;
> +				acc_reg_fast_write(d, address, value);
> +				/* Offset ARAM Address for next memory bank  - increment of 4B. */
> +				aram_address += aqNum(qg_idx, conf) *
> +						(1 << aqDepth(qg_idx, conf));
> +			}
> +		}
> +		if (aram_address > VRB2_WORDS_IN_ARAM_SIZE) {
> +			rte_bbdev_log(ERR, "ARAM Configuration not fitting %d %d\n",
> +					aram_address, VRB2_WORDS_IN_ARAM_SIZE);
> +			return -EINVAL;
> +		}
> +	} else {
> +		/* Dynamic Qmgr allocation. */
> +		acc_reg_write(d, VRB2_PfQmgrAramAllocEn, 1);
> +		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN0, 0x1000);
> +		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN1, 0);
> +		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN2, 0);
> +		acc_reg_write(d, VRB2_PfQmgrAramAllocSetupN3, 0);
> +		acc_reg_write(d, VRB2_PfQmgrSoftReset, 1);
> +		acc_reg_write(d, VRB2_PfQmgrSoftReset, 0);
> +	}
> +
> +	/* ==== HI Configuration ==== */
> +
> +	/* No Info Ring/MSI by default. */
> +	address = VRB2_PfHiInfoRingIntWrEnRegPf;
> +	value = 0;
> +	acc_reg_write(d, address, value);
> +	address = VRB2_PfHiCfgMsiIntWrEnRegPf;
> +	value = 0xFFFFFFFF;
> +	acc_reg_write(d, address, value);
> +	/* Prevent Block on Transmit Error. */
> +	address = VRB2_PfHiBlockTransmitOnErrorEn;
> +	value = 0;
> +	acc_reg_write(d, address, value);
> +	/* Prevents to drop MSI */
> +	address = VRB2_PfHiMsiDropEnableReg;
> +	value = 0;
> +	acc_reg_write(d, address, value);
> +	/* Set the PF Mode register */
> +	address = VRB2_PfHiPfMode;
> +	value = ((conf->pf_mode_en) ? ACC_PF_VAL : 0) | 0x1F07F0;
> +	acc_reg_write(d, address, value);
> +	/* Explicitly releasing AXI after PF Mode. */
> +	acc_reg_write(d, VRB2_PfDmaAxiControl, 1);
> +
> +	/* QoS overflow init. */
> +	value = 1;
> +	address = VRB2_PfQosmonAEvalOverflow0;
> +	acc_reg_write(d, address, value);
> +	address = VRB2_PfQosmonBEvalOverflow0;
> +	acc_reg_write(d, address, value);
> +
> +	/* Enabling AQueues through the Queue hierarchy. */
> +	unsigned int  en_bitmask[VRB2_AQ_REG_NUM];
> +	for (vf_idx = 0; vf_idx < VRB2_NUM_VFS; vf_idx++) {
> +		for (qg_idx = 0; qg_idx < VRB2_NUM_QGRPS; qg_idx++) {
> +			for (aq_reg = 0;  aq_reg < VRB2_AQ_REG_NUM; aq_reg++)
> +				en_bitmask[aq_reg] = 0;
> +			if (vf_idx < conf->num_vf_bundles && qg_idx < totalQgs) {
> +				for (aq_reg = 0;  aq_reg < VRB2_AQ_REG_NUM; aq_reg++) {
> +					if (aqNum(qg_idx, conf) >= 16 * (aq_reg + 1))
> +						en_bitmask[aq_reg] = 0xFFFF;
> +					else if (aqNum(qg_idx, conf) <= 16 * aq_reg)
> +						en_bitmask[aq_reg] = 0x0;
> +					else
> +						en_bitmask[aq_reg] = (1 << (aqNum(qg_idx,
> +								conf) - aq_reg * 16)) - 1;
> +				}
> +			}
> +			for (aq_reg = 0; aq_reg < VRB2_AQ_REG_NUM; aq_reg++) {
> +				address = VRB2_PfQmgrAqEnableVf + vf_idx * 16 + aq_reg * 4;
> +				value = (qg_idx << 16) + en_bitmask[aq_reg];
> +				acc_reg_fast_write(d, address, value);
> +			}
> +		}
> +	}
> +
> +	rte_bbdev_log(INFO,
> +			"VRB2 basic config complete for %s - pf_bb_config should ideally be used instead",
> +			dev_name);
> +	return 0;
> +}
> diff --git a/drivers/baseband/acc/vrb_cfg.h b/drivers/baseband/acc/vrb_cfg.h
> index e3c8902b46..79487c4e04 100644
> --- a/drivers/baseband/acc/vrb_cfg.h
> +++ b/drivers/baseband/acc/vrb_cfg.h
> @@ -29,4 +29,20 @@
>   int
>   vrb1_configure(const char *dev_name, struct rte_acc_conf *conf);
>   
> +/**
> + * Configure a VRB2 device.
> + *
> + * @param dev_name
> + *   The name of the device. This is the short form of PCI BDF, e.g. 00:01.0.
> + *   It can also be retrieved for a bbdev device from the dev_name field in the
> + *   rte_bbdev_info structure returned by rte_bbdev_info_get().
> + * @param conf
> + *   Configuration to apply to VRB2 HW.
> + *
> + * @return
> + *   Zero on success, negative value on failure.
> + */
> +int
> +vrb2_configure(const char *dev_name, struct rte_acc_conf *conf);
> +
>   #endif /* _VRB_CFG_H_ */
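
For readers following the AQueue enable loop at the end of vrb2_configure() in the patch quoted above, the per-register mask selection can be isolated as a small function. This is only a sketch: the names sketch_aq_en_bitmask and SKETCH_AQ_PER_REG are hypothetical and do not exist in the driver; the arithmetic mirrors the three branches of the quoted loop (register fully enabled, register empty, register partially filled).

```c
#include <stdint.h>

/* Hypothetical constant: each enable register covers 16 atomic queues,
 * matching the literal 16 used in the quoted patch. */
#define SKETCH_AQ_PER_REG 16

/* Return the 16-bit enable mask for register index aq_reg, given that the
 * queue group holds aq_num atomic queues in total. */
static uint32_t
sketch_aq_en_bitmask(unsigned int aq_num, unsigned int aq_reg)
{
	/* All 16 queues covered by this register are in use. */
	if (aq_num >= SKETCH_AQ_PER_REG * (aq_reg + 1))
		return 0xFFFF;
	/* No queue reaches this register. */
	if (aq_num <= SKETCH_AQ_PER_REG * aq_reg)
		return 0x0;
	/* Partially filled register: set the low (aq_num - 16 * aq_reg) bits. */
	return (1u << (aq_num - aq_reg * SKETCH_AQ_PER_REG)) - 1;
}
```

With 20 queues in a group, for example, register 0 is fully enabled and register 1 enables only its four lowest queues.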

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* RE: [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection
  2023-10-03 15:16   ` Maxime Coquelin
@ 2023-10-03 17:22     ` Chautru, Nicolas
  2023-10-03 17:26       ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-03 17:22 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, October 3, 2023 8:16 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 11/12] baseband/acc: add support for VRB2 engine
> error detection
> 
> 
> 
> On 9/29/23 18:35, Nicolas Chautru wrote:
> > Adding missing incremental functionality for the VRB2 variant. Notably
> > detection of engine error during the dequeue. Minor cosmetic edits.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >   drivers/baseband/acc/rte_vrb_pmd.c  | 20 ++++++++++++--------
> >   drivers/baseband/acc/vrb1_pf_enum.h | 17 ++++++++++++-----
> >   2 files changed, 24 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
> > b/drivers/baseband/acc/rte_vrb_pmd.c
> > index a9d3db86e6..3eb1a380fc 100644
> > --- a/drivers/baseband/acc/rte_vrb_pmd.c
> > +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> > @@ -1504,6 +1504,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op
> *op, struct acc_fcw_td *fcw)
> >   				fcw->ea = op->turbo_dec.cb_params.e;
> >   				fcw->eb = op->turbo_dec.cb_params.e;
> >   			}
> > +
> >   			if (op->turbo_dec.rv_index == 0)
> >   				fcw->k0_start_col = ACC_FCW_TD_RVIDX_0;
> >   			else if (op->turbo_dec.rv_index == 1) @@ -2304,7
> +2305,7 @@
> > enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op
> **ops,
> >   	return num;
> >   }
> >
> > -/* Enqueue one encode operations for device for a partial TB
> > +/* Enqueue one encode operations for VRB1 device for a partial TB
> >    * all codes blocks have same configuration multiplexed on the same
> descriptor.
> >    */
> >   static inline void
> > @@ -2649,7 +2650,7 @@ enqueue_dec_one_op_cb(struct acc_queue *q,
> struct rte_bbdev_dec_op *op,
> >   	return 1;
> >   }
> >
> > -/** Enqueue one decode operations for device in CB mode */
> > +/** Enqueue one decode operations for device in CB mode. */
> >   static inline int
> >   vrb_enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct
> rte_bbdev_dec_op *op,
> >   		uint16_t total_enqueued_cbs, bool same_op) @@ -2801,7
> +2802,6 @@
> > vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct
> rte_bbdev_dec_op *op,
> >   		desc->req.data_ptrs[0].blen = ACC_FCW_LD_BLEN;
> >   		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld,
> ACC_FCW_LD_BLEN);
> >   		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
> > -
> >   		if (q->d->device_variant == VRB1_VARIANT)
> >   			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
> >   					h_output, &in_offset, &h_out_offset,
> @@ -3226,7 +3226,6 @@
> > vrb_enqueue_ldpc_dec_cb(struct rte_bbdev_queue_data *q_data,
> >   			break;
> >   		}
> >   		avail -= 1;
> > -
> 
> Is it intentionally removed?

Cosmetic but slightly more readable. I don’t have a strong rule for these.

> 
> >   		rte_bbdev_log(INFO, "Op %d %d %d %d %d %d %d %d %d %d
> %d %d\n",
> >   			i, ops[i]->ldpc_dec.op_flags, ops[i]-
> >ldpc_dec.rv_index,
> >   			ops[i]->ldpc_dec.iter_max, ops[i]-
> >ldpc_dec.iter_count, @@
> > -3354,6 +3353,7 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue *q,
> struct rte_bbdev_enc_op **ref_op,
> >   	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
> >   	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> >   	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> > +	op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR)
> :
> > +0);
> >
> >   	if (desc->req.last_desc_in_batch) {
> >   		(*aq_dequeued)++;
> > @@ -3470,6 +3470,7 @@ vrb_dequeue_enc_one_op_tb(struct acc_queue
> *q, struct rte_bbdev_enc_op **ref_op,
> >   		op->status |= ((rsp.input_err) ? (1 <<
> RTE_BBDEV_DATA_ERROR) : 0);
> >   		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR)
> : 0);
> >   		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) :
> 0);
> > +		op->status |= ((rsp.engine_hung) ? (1 <<
> RTE_BBDEV_ENGINE_ERROR) :
> > +0);
> >
> >   		if (desc->req.last_desc_in_batch) {
> >   			(*aq_dequeued)++;
> > @@ -3516,6 +3517,8 @@ vrb_dequeue_dec_one_op_cb(struct
> rte_bbdev_queue_data *q_data,
> >   	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
> >   	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> >   	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
> > +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
> > +
> >   	if (op->status != 0) {
> >   		/* These errors are not expected. */
> >   		q_data->queue_stats.dequeue_err_count++;
> > @@ -3569,6 +3572,7 @@ vrb_dequeue_ldpc_dec_one_op_cb(struct
> rte_bbdev_queue_data *q_data,
> >   	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
> >   	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
> >   	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> > +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
> >   	if (op->status != 0)
> >   		q_data->queue_stats.dequeue_err_count++;
> >
> > @@ -3650,6 +3654,7 @@ vrb_dequeue_dec_one_op_tb(struct acc_queue
> *q, struct rte_bbdev_dec_op **ref_op,
> >   		op->status |= ((rsp.input_err) ? (1 <<
> RTE_BBDEV_DATA_ERROR) : 0);
> >   		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR)
> : 0);
> >   		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) :
> 0);
> > +		op->status |= ((rsp.engine_hung) ? (1 <<
> RTE_BBDEV_ENGINE_ERROR) :
> > +0);
> 
> It kind of highlights the need for the refactoring I suggested in the previous patch! It
> would then have been done in one place.

That is fair, some of the logic is fairly common indeed.
I will now create an internal ticket to refactor some of this for the next release. Thanks.
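
To illustrate the kind of shared helper discussed here, a minimal sketch follows. Every name in it is a hypothetical stand-in: struct sketch_rsp models only the four response flags used in the quoted dequeue functions, and the SKETCH_*_ERROR bit positions are placeholders for the RTE_BBDEV_*_ERROR enum values, not the real ones. It is not the driver's code, just one way the repeated status mapping could live in a single place.

```c
#include <stdint.h>

/* Placeholder bit positions; the driver would use the RTE_BBDEV_*_ERROR
 * values from the bbdev API instead. */
enum sketch_op_status {
	SKETCH_DRV_ERROR = 1,
	SKETCH_DATA_ERROR = 2,
	SKETCH_ENGINE_ERROR = 5,
};

/* Hypothetical subset of the descriptor response flags. */
struct sketch_rsp {
	uint32_t input_err:1;
	uint32_t dma_err:1;
	uint32_t fcw_err:1;
	uint32_t engine_hung:1;
};

/* Map the hardware response flags to an operation status bitmask, so that
 * each dequeue function can call this once instead of repeating the lines. */
static inline int
sketch_rsp_to_status(struct sketch_rsp rsp)
{
	int status = 0;

	status |= rsp.input_err ? (1 << SKETCH_DATA_ERROR) : 0;
	status |= rsp.dma_err ? (1 << SKETCH_DRV_ERROR) : 0;
	status |= rsp.fcw_err ? (1 << SKETCH_DRV_ERROR) : 0;
	status |= rsp.engine_hung ? (1 << SKETCH_ENGINE_ERROR) : 0;
	return status;
}
```

A dequeue path would then do something like op->status |= sketch_rsp_to_status(rsp), and a new flag such as engine_hung would only need to be added in one function.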

> 
> >
> >   		if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
> >   			tb_crc_check ^= desc->rsp.add_info_1; @@ -3701,7
> +3706,6 @@
> > vrb_dequeue_enc(struct rte_bbdev_queue_data *q_data,
> >   	if (avail == 0)
> >   		return 0;
> >   	op = acc_op_tail(q, 0);
> > -
> >   	cbm = op->turbo_enc.code_block_mode;
> >
> >   	for (i = 0; i < avail; i++) {
> > @@ -4041,9 +4045,8 @@ vrb_enqueue_fft_one_op(struct acc_queue *q,
> struct rte_bbdev_fft_op *op,
> >   				&in_offset, &out_offset, &win_offset,
> &pwr_offset);
> >   	}
> >   #ifdef RTE_LIBRTE_BBDEV_DEBUG
> > -	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
> > -			sizeof(desc->req.fcw_fft));
> > -	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> > +	rte_memdump(stderr, "FCW", fcw, 128);
> > +	rte_memdump(stderr, "Req Desc.", desc, 128);
> >   #endif
> >   	return 1;
> >   }
> > @@ -4116,6 +4119,7 @@ vrb_dequeue_fft_one_op(struct
> rte_bbdev_queue_data *q_data,
> >   	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
> >   	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
> >   	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> > +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
> >   	if (op->status != 0)
> >   		q_data->queue_stats.dequeue_err_count++;
> >
> > diff --git a/drivers/baseband/acc/vrb1_pf_enum.h
> > b/drivers/baseband/acc/vrb1_pf_enum.h
> > index 82a36685e9..6dc359800f 100644
> > --- a/drivers/baseband/acc/vrb1_pf_enum.h
> > +++ b/drivers/baseband/acc/vrb1_pf_enum.h
> > @@ -98,11 +98,18 @@ enum {
> >   	ACC_PF_INT_DMA_UL5G_DESC_IRQ = 8,
> >   	ACC_PF_INT_DMA_DL5G_DESC_IRQ = 9,
> >   	ACC_PF_INT_DMA_MLD_DESC_IRQ = 10,
> > -	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 11,
> > -	ACC_PF_INT_PARITY_ERR = 12,
> > -	ACC_PF_INT_QMGR_ERR = 13,
> > -	ACC_PF_INT_INT_REQ_OVERFLOW = 14,
> > -	ACC_PF_INT_APB_TIMEOUT = 15,
> > +	ACC_PF_INT_ARAM_ACCESS_ERR = 11,
> > +	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 12,
> > +	ACC_PF_INT_PARITY_ERR = 13,
> > +	ACC_PF_INT_QMGR_OVERFLOW = 14,
> > +	ACC_PF_INT_QMGR_ERR = 15,
> > +	ACC_PF_INT_ATS_ERR = 22,
> > +	ACC_PF_INT_ARAM_FUUL = 23,
> > +	ACC_PF_INT_EXTRA_READ = 24,
> > +	ACC_PF_INT_COMPLETION_TIMEOUT = 25,
> > +	ACC_PF_INT_CORE_HANG = 26,
> > +	ACC_PF_INT_DMA_HANG = 28,
> > +	ACC_PF_INT_DS_HANG = 27,
> >   };
> >
> >   #endif /* VRB1_PF_ENUM_H */
> 
> 
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime



* Re: [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection
  2023-10-03 17:22     ` Chautru, Nicolas
@ 2023-10-03 17:26       ` Maxime Coquelin
  0 siblings, 0 replies; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-03 17:26 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan



On 10/3/23 19:22, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, October 3, 2023 8:16 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 11/12] baseband/acc: add support for VRB2 engine
>> error detection
>>
>>
>>
>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>> Adding missing incremental functionality for the VRB2 variant. Notably
>>> detection of engine error during the dequeue. Minor cosmetic edits.
>>>
>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>> ---
>>>    drivers/baseband/acc/rte_vrb_pmd.c  | 20 ++++++++++++--------
>>>    drivers/baseband/acc/vrb1_pf_enum.h | 17 ++++++++++++-----
>>>    2 files changed, 24 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>> index a9d3db86e6..3eb1a380fc 100644
>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>> @@ -1504,6 +1504,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op
>> *op, struct acc_fcw_td *fcw)
>>>    				fcw->ea = op->turbo_dec.cb_params.e;
>>>    				fcw->eb = op->turbo_dec.cb_params.e;
>>>    			}
>>> +
>>>    			if (op->turbo_dec.rv_index == 0)
>>>    				fcw->k0_start_col = ACC_FCW_TD_RVIDX_0;
>>>    			else if (op->turbo_dec.rv_index == 1) @@ -2304,7
>> +2305,7 @@
>>> enqueue_ldpc_enc_n_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op
>> **ops,
>>>    	return num;
>>>    }
>>>
>>> -/* Enqueue one encode operations for device for a partial TB
>>> +/* Enqueue one encode operations for VRB1 device for a partial TB
>>>     * all codes blocks have same configuration multiplexed on the same
>> descriptor.
>>>     */
>>>    static inline void
>>> @@ -2649,7 +2650,7 @@ enqueue_dec_one_op_cb(struct acc_queue *q,
>> struct rte_bbdev_dec_op *op,
>>>    	return 1;
>>>    }
>>>
>>> -/** Enqueue one decode operations for device in CB mode */
>>> +/** Enqueue one decode operations for device in CB mode. */
>>>    static inline int
>>>    vrb_enqueue_ldpc_dec_one_op_cb(struct acc_queue *q, struct
>> rte_bbdev_dec_op *op,
>>>    		uint16_t total_enqueued_cbs, bool same_op) @@ -2801,7
>> +2802,6 @@
>>> vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct
>> rte_bbdev_dec_op *op,
>>>    		desc->req.data_ptrs[0].blen = ACC_FCW_LD_BLEN;
>>>    		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld,
>> ACC_FCW_LD_BLEN);
>>>    		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
>>> -
>>>    		if (q->d->device_variant == VRB1_VARIANT)
>>>    			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
>>>    					h_output, &in_offset, &h_out_offset,
>> @@ -3226,7 +3226,6 @@
>>> vrb_enqueue_ldpc_dec_cb(struct rte_bbdev_queue_data *q_data,
>>>    			break;
>>>    		}
>>>    		avail -= 1;
>>> -
>>
>> Is it intentionally removed?
> 
> Cosmetic but slightly more readable. I don’t have a strong rule for these.

OK, if that's intentional, that's fine with me.

>>
>>>    		rte_bbdev_log(INFO, "Op %d %d %d %d %d %d %d %d %d %d
>> %d %d\n",
>>>    			i, ops[i]->ldpc_dec.op_flags, ops[i]-
>>> ldpc_dec.rv_index,
>>>    			ops[i]->ldpc_dec.iter_max, ops[i]-
>>> ldpc_dec.iter_count, @@
>>> -3354,6 +3353,7 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue *q,
>> struct rte_bbdev_enc_op **ref_op,
>>>    	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
>>>    	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>>>    	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>>> +	op->status |= ((rsp.engine_hung) ? (1 << RTE_BBDEV_ENGINE_ERROR)
>> :
>>> +0);
>>>
>>>    	if (desc->req.last_desc_in_batch) {
>>>    		(*aq_dequeued)++;
>>> @@ -3470,6 +3470,7 @@ vrb_dequeue_enc_one_op_tb(struct acc_queue
>> *q, struct rte_bbdev_enc_op **ref_op,
>>>    		op->status |= ((rsp.input_err) ? (1 <<
>> RTE_BBDEV_DATA_ERROR) : 0);
>>>    		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR)
>> : 0);
>>>    		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) :
>> 0);
>>> +		op->status |= ((rsp.engine_hung) ? (1 <<
>> RTE_BBDEV_ENGINE_ERROR) :
>>> +0);
>>>
>>>    		if (desc->req.last_desc_in_batch) {
>>>    			(*aq_dequeued)++;
>>> @@ -3516,6 +3517,8 @@ vrb_dequeue_dec_one_op_cb(struct
>> rte_bbdev_queue_data *q_data,
>>>    	op->status |= ((rsp.input_err) ? (1 << RTE_BBDEV_DATA_ERROR) : 0);
>>>    	op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>>>    	op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) : 0);
>>> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
>>> +
>>>    	if (op->status != 0) {
>>>    		/* These errors are not expected. */
>>>    		q_data->queue_stats.dequeue_err_count++;
>>> @@ -3569,6 +3572,7 @@ vrb_dequeue_ldpc_dec_one_op_cb(struct
>> rte_bbdev_queue_data *q_data,
>>>    	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
>>>    	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
>>>    	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
>>> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
>>>    	if (op->status != 0)
>>>    		q_data->queue_stats.dequeue_err_count++;
>>>
>>> @@ -3650,6 +3654,7 @@ vrb_dequeue_dec_one_op_tb(struct acc_queue
>> *q, struct rte_bbdev_dec_op **ref_op,
>>>    		op->status |= ((rsp.input_err) ? (1 <<
>> RTE_BBDEV_DATA_ERROR) : 0);
>>>    		op->status |= ((rsp.dma_err) ? (1 << RTE_BBDEV_DRV_ERROR)
>> : 0);
>>>    		op->status |= ((rsp.fcw_err) ? (1 << RTE_BBDEV_DRV_ERROR) :
>> 0);
>>> +		op->status |= ((rsp.engine_hung) ? (1 <<
>> RTE_BBDEV_ENGINE_ERROR) :
>>> +0);
>>
>> It kind of highlights the need for the refactoring I suggested in the previous patch! It
>> would then have been done in one place.
> 
> That is fair, some of the logic is fairly common indeed.
> I will now create an internal ticket to refactor some of this for the next release. Thanks.

Thanks,
Maxime

>>
>>>
>>>    		if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
>>>    			tb_crc_check ^= desc->rsp.add_info_1; @@ -3701,7
>> +3706,6 @@
>>> vrb_dequeue_enc(struct rte_bbdev_queue_data *q_data,
>>>    	if (avail == 0)
>>>    		return 0;
>>>    	op = acc_op_tail(q, 0);
>>> -
>>>    	cbm = op->turbo_enc.code_block_mode;
>>>
>>>    	for (i = 0; i < avail; i++) {
>>> @@ -4041,9 +4045,8 @@ vrb_enqueue_fft_one_op(struct acc_queue *q,
>> struct rte_bbdev_fft_op *op,
>>>    				&in_offset, &out_offset, &win_offset,
>> &pwr_offset);
>>>    	}
>>>    #ifdef RTE_LIBRTE_BBDEV_DEBUG
>>> -	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
>>> -			sizeof(desc->req.fcw_fft));
>>> -	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
>>> +	rte_memdump(stderr, "FCW", fcw, 128);
>>> +	rte_memdump(stderr, "Req Desc.", desc, 128);
>>>    #endif
>>>    	return 1;
>>>    }
>>> @@ -4116,6 +4119,7 @@ vrb_dequeue_fft_one_op(struct
>> rte_bbdev_queue_data *q_data,
>>>    	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
>>>    	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
>>>    	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
>>> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
>>>    	if (op->status != 0)
>>>    		q_data->queue_stats.dequeue_err_count++;
>>>
>>> diff --git a/drivers/baseband/acc/vrb1_pf_enum.h
>>> b/drivers/baseband/acc/vrb1_pf_enum.h
>>> index 82a36685e9..6dc359800f 100644
>>> --- a/drivers/baseband/acc/vrb1_pf_enum.h
>>> +++ b/drivers/baseband/acc/vrb1_pf_enum.h
>>> @@ -98,11 +98,18 @@ enum {
>>>    	ACC_PF_INT_DMA_UL5G_DESC_IRQ = 8,
>>>    	ACC_PF_INT_DMA_DL5G_DESC_IRQ = 9,
>>>    	ACC_PF_INT_DMA_MLD_DESC_IRQ = 10,
>>> -	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 11,
>>> -	ACC_PF_INT_PARITY_ERR = 12,
>>> -	ACC_PF_INT_QMGR_ERR = 13,
>>> -	ACC_PF_INT_INT_REQ_OVERFLOW = 14,
>>> -	ACC_PF_INT_APB_TIMEOUT = 15,
>>> +	ACC_PF_INT_ARAM_ACCESS_ERR = 11,
>>> +	ACC_PF_INT_ARAM_ECC_1BIT_ERR = 12,
>>> +	ACC_PF_INT_PARITY_ERR = 13,
>>> +	ACC_PF_INT_QMGR_OVERFLOW = 14,
>>> +	ACC_PF_INT_QMGR_ERR = 15,
>>> +	ACC_PF_INT_ATS_ERR = 22,
>>> +	ACC_PF_INT_ARAM_FUUL = 23,
>>> +	ACC_PF_INT_EXTRA_READ = 24,
>>> +	ACC_PF_INT_COMPLETION_TIMEOUT = 25,
>>> +	ACC_PF_INT_CORE_HANG = 26,
>>> +	ACC_PF_INT_DMA_HANG = 28,
>>> +	ACC_PF_INT_DS_HANG = 27,
>>>    };
>>>
>>>    #endif /* VRB1_PF_ENUM_H */
>>
>>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>
>> Thanks,
>> Maxime
> 



* RE: [PATCH v3 10/12] baseband/acc: add MLD support in VRB2 variant
  2023-10-03 15:12   ` Maxime Coquelin
@ 2023-10-03 18:12     ` Chautru, Nicolas
  0 siblings, 0 replies; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-03 18:12 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, October 3, 2023 8:13 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 10/12] baseband/acc: add MLD support in VRB2
> variant
> 
> 
> 
> On 9/29/23 18:35, Nicolas Chautru wrote:
> > Adding the capability for the MLD-TS processing specific to the VRB2
> > variant.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >   drivers/baseband/acc/rte_vrb_pmd.c | 378
> +++++++++++++++++++++++++++++
> >   1 file changed, 378 insertions(+)
> >
> > diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
> > b/drivers/baseband/acc/rte_vrb_pmd.c
> > index ce4b90d8e7..a9d3db86e6 100644
> > --- a/drivers/baseband/acc/rte_vrb_pmd.c
> > +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> > @@ -1344,6 +1344,17 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct
> rte_bbdev_driver_info *dev_info)
> >   						1,
> >   			}
> >   		},
> > +		{
> > +			.type	= RTE_BBDEV_OP_MLDTS,
> > +			.cap.mld = {
> > +				.capability_flags =
> > +						RTE_BBDEV_MLDTS_REP,
> > +				.num_buffers_src =
> > +						1,
> > +				.num_buffers_dst =
> > +						1,
> > +			}
> > +		},
> >   		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
> >   	};
> >
> > @@ -4151,6 +4162,371 @@ vrb_dequeue_fft(struct rte_bbdev_queue_data
> *q_data,
> >   	return i;
> >   }
> >
> > +/* Fill in a frame control word for MLD-TS processing. */ static
> > +inline void vrb2_fcw_mldts_fill(struct rte_bbdev_mldts_op *op, struct
> > +acc_fcw_mldts *fcw) {
> > +	fcw->nrb = op->mldts.num_rbs;
> > +	fcw->NLayers = op->mldts.num_layers - 1;
> > +	fcw->Qmod0 = (op->mldts.q_m[0] >> 1) - 1;
> > +	fcw->Qmod1 = (op->mldts.q_m[1] >> 1) - 1;
> > +	fcw->Qmod2 = (op->mldts.q_m[2] >> 1) - 1;
> > +	fcw->Qmod3 = (op->mldts.q_m[3] >> 1) - 1;
> > +	/* Mark some layers as disabled */
> > +	if (op->mldts.num_layers == 2) {
> > +		fcw->Qmod2 = 3;
> > +		fcw->Qmod3 = 3;
> > +	}
> > +	if (op->mldts.num_layers == 3)
> > +		fcw->Qmod3 = 3;
> > +	fcw->Rrep = op->mldts.r_rep;
> > +	fcw->Crep = op->mldts.c_rep;
> > +}
> > +
> > +/* Fill in descriptor for one MLD-TS processing operation. */ static
> > +inline int vrb2_dma_desc_mldts_fill(struct rte_bbdev_mldts_op *op,
> > +		struct acc_dma_req_desc *desc,
> > +		struct rte_mbuf *input_q, struct rte_mbuf *input_r,
> > +		struct rte_mbuf *output,
> > +		uint32_t *in_offset, uint32_t *out_offset) {
> > +	uint16_t qsize_per_re[VRB2_MLD_LAY_SIZE] = {8, 12, 16}; /* Layer 2
> to 4. */
> > +	uint16_t rsize_per_re[VRB2_MLD_LAY_SIZE] = {14, 26, 42};
> > +	uint16_t sc_factor_per_rrep[VRB2_MLD_RREP_SIZE] = {12, 6, 4, 3, 0,
> 2};
> > +	uint16_t i, outsize_per_re = 0;
> > +	uint32_t sc_num, r_num, q_size, r_size, out_size;
> > +
> > +	/* Prevent out of range access. */
> > +	if (op->mldts.r_rep > 5)
> > +		op->mldts.r_rep = 5;
> > +	if (op->mldts.num_layers < 2)
> > +		op->mldts.num_layers = 2;
> > +	if (op->mldts.num_layers > 4)
> > +		op->mldts.num_layers = 4;
> > +	for (i = 0; i < op->mldts.num_layers; i++)
> > +		outsize_per_re += op->mldts.q_m[i];
> > +	sc_num = op->mldts.num_rbs * RTE_BBDEV_SCPERRB * (op-
> >mldts.c_rep + 1);
> > +	r_num = op->mldts.num_rbs * sc_factor_per_rrep[op->mldts.r_rep];
> > +	q_size = qsize_per_re[op->mldts.num_layers - 2] * sc_num;
> > +	r_size = rsize_per_re[op->mldts.num_layers - 2] * r_num;
> > +	out_size =  sc_num * outsize_per_re;
> > +	/* printf("Sc %d R num %d Size %d %d %d\n", sc_num, r_num, q_size,
> > +r_size, out_size); */
> 
> rte_bbdev_log_debug()? Otherwise just remove it.

Thanks. Removing. 
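
For anyone who relied on that commented-out printf, the sizes it reported can be recomputed outside the driver. Below is a minimal sketch, assuming the lookup tables quoted above and assuming RTE_BBDEV_SCPERRB is 12; the names sketch_mld_sizes and SKETCH_SC_PER_RB are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Assumed value of RTE_BBDEV_SCPERRB (subcarriers per resource block). */
#define SKETCH_SC_PER_RB 12

/* Hypothetical container for the three buffer sizes the printf reported. */
struct sketch_mld_sizes_out {
	uint32_t q_size;
	uint32_t r_size;
	uint32_t out_size;
};

/* Mirror of the size arithmetic in the quoted vrb2_dma_desc_mldts_fill(). */
static struct sketch_mld_sizes_out
sketch_mld_sizes(uint16_t num_rbs, uint8_t num_layers, uint8_t r_rep,
		uint8_t c_rep, const uint8_t q_m[4])
{
	static const uint16_t qsize_per_re[3] = {8, 12, 16}; /* Layer 2 to 4. */
	static const uint16_t rsize_per_re[3] = {14, 26, 42};
	static const uint16_t sc_factor_per_rrep[6] = {12, 6, 4, 3, 0, 2};
	struct sketch_mld_sizes_out s;
	uint32_t sc_num, r_num, outsize_per_re = 0;
	uint8_t i;

	/* Same clamping as the patch, to keep the table lookups in range. */
	if (r_rep > 5)
		r_rep = 5;
	if (num_layers < 2)
		num_layers = 2;
	if (num_layers > 4)
		num_layers = 4;

	for (i = 0; i < num_layers; i++)
		outsize_per_re += q_m[i];

	sc_num = (uint32_t)num_rbs * SKETCH_SC_PER_RB * (c_rep + 1u);
	r_num = (uint32_t)num_rbs * sc_factor_per_rrep[r_rep];
	s.q_size = qsize_per_re[num_layers - 2] * sc_num;
	s.r_size = rsize_per_re[num_layers - 2] * r_num;
	s.out_size = sc_num * outsize_per_re;
	return s;
}
```

Note that this follows the single-descriptor path; the split path in the same patch forces C rep to zero per descriptor, so its per-symbol sizes correspond to calling the sketch with c_rep = 0.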

> 
> > +
> > +	/* FCW already done. */
> > +	acc_header_init(desc);
> > +	desc->data_ptrs[1].address = rte_pktmbuf_iova_offset(input_q,
> *in_offset);
> > +	desc->data_ptrs[1].blen = q_size;
> > +	desc->data_ptrs[1].blkid = ACC_DMA_BLKID_IN;
> > +	desc->data_ptrs[1].last = 0;
> > +	desc->data_ptrs[1].dma_ext = 0;
> > +	desc->data_ptrs[2].address = rte_pktmbuf_iova_offset(input_r,
> *in_offset);
> > +	desc->data_ptrs[2].blen = r_size;
> > +	desc->data_ptrs[2].blkid = ACC_DMA_BLKID_IN_MLD_R;
> > +	desc->data_ptrs[2].last = 1;
> > +	desc->data_ptrs[2].dma_ext = 0;
> > +	desc->data_ptrs[3].address = rte_pktmbuf_iova_offset(output,
> *out_offset);
> > +	desc->data_ptrs[3].blen = out_size;
> > +	desc->data_ptrs[3].blkid = ACC_DMA_BLKID_OUT_HARD;
> > +	desc->data_ptrs[3].last = 1;
> > +	desc->data_ptrs[3].dma_ext = 0;
> > +	desc->m2dlen = 3;
> > +	desc->d2mlen = 1;
> > +	desc->op_addr = op;
> > +	desc->cbs_in_tb = 1;
> > +
> > +	return 0;
> > +}
> > +
> > +/* Check whether the MLD operation can be processed as a single
> > +operation. */ static inline bool vrb2_check_mld_r_constraint(struct
> > +rte_bbdev_mldts_op *op) {
> > +	uint8_t layer_idx, rrep_idx;
> > +	uint16_t max_rb[VRB2_MLD_LAY_SIZE][VRB2_MLD_RREP_SIZE] = {
> > +			{188, 275, 275, 275, 0, 275},
> > +			{101, 202, 275, 275, 0, 275},
> > +			{62, 124, 186, 248, 0, 275} };
> > +
> > +	if (op->mldts.c_rep == 0)
> > +		return true;
> > +
> > +	layer_idx = RTE_MIN(op->mldts.num_layers -
> VRB2_MLD_MIN_LAYER,
> > +			VRB2_MLD_MAX_LAYER - VRB2_MLD_MIN_LAYER);
> > +	rrep_idx = RTE_MIN(op->mldts.r_rep, VRB2_MLD_MAX_RREP);
> > +	rte_bbdev_log_debug("RB %d index %d %d max %d\n", op-
> >mldts.num_rbs, layer_idx, rrep_idx,
> > +			max_rb[layer_idx][rrep_idx]);
> > +
> > +	return (op->mldts.num_rbs <= max_rb[layer_idx][rrep_idx]); }
> > +
> > +/** Enqueue MLDTS operation split across symbols. */ static inline
> > +int enqueue_mldts_split_op(struct acc_queue *q, struct
> > +rte_bbdev_mldts_op *op,
> > +		uint16_t total_enqueued_descs)
> > +{
> > +	uint16_t qsize_per_re[VRB2_MLD_LAY_SIZE] = {8, 12, 16}; /* Layer 2
> to 4. */
> > +	uint16_t rsize_per_re[VRB2_MLD_LAY_SIZE] = {14, 26, 42};
> > +	uint16_t sc_factor_per_rrep[VRB2_MLD_RREP_SIZE] = {12, 6, 4, 3, 0,
> 2};
> > +	uint32_t i, outsize_per_re = 0, sc_num, r_num, q_size, r_size,
> out_size, num_syms;
> > +	union acc_dma_desc *desc, *first_desc;
> > +	uint16_t desc_idx, symb;
> > +	struct rte_mbuf *input_q, *input_r, *output;
> > +	uint32_t in_offset, out_offset;
> > +	struct acc_fcw_mldts *fcw;
> > +
> > +	desc_idx = ((q->sw_ring_head + total_enqueued_descs) & q-
> >sw_ring_wrap_mask);
> > +	first_desc = q->ring_addr + desc_idx;
> 
> acc_desc()?

acc_desc_idx() here, but thanks. 
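
To make the distinction concrete: the open-coded expression only computes a wrapped index, and the descriptor pointer is derived from it afterwards. A minimal sketch of such an index helper follows, assuming a power-of-two ring; the struct sketch_ring and the function sketch_desc_idx are hypothetical stand-ins and not the driver's actual acc_desc_idx().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced view of the queue fields used for indexing. */
struct sketch_ring {
	uint32_t sw_ring_head;      /* free-running producer index */
	uint32_t sw_ring_wrap_mask; /* ring depth minus one, depth is 2^n */
};

/* Return the wrapped descriptor index at 'offset' entries past the head. */
static inline uint32_t
sketch_desc_idx(const struct sketch_ring *q, uint16_t offset)
{
	/* The mask trick is only valid when the mask is of the form 2^n - 1. */
	assert((q->sw_ring_wrap_mask & (q->sw_ring_wrap_mask + 1)) == 0);
	return (q->sw_ring_head + offset) & q->sw_ring_wrap_mask;
}
```

With a helper of this shape, the split-op loop can ask for the index of each symbol's descriptor without repeating the mask arithmetic at every call site.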

> 
> > +	input_q = op->mldts.qhy_input.data;
> > +	input_r = op->mldts.r_input.data;
> > +	output = op->mldts.output.data;
> > +	in_offset = op->mldts.qhy_input.offset;
> > +	out_offset = op->mldts.output.offset;
> > +	num_syms = op->mldts.c_rep + 1;
> > +	fcw = &first_desc->req.fcw_mldts;
> > +	vrb2_fcw_mldts_fill(op, fcw);
> > +	fcw->Crep = 0; /* C rep forced to zero. */
> > +
> > +	/* Prevent out of range access. */
> > +	if (op->mldts.r_rep > 5)
> > +		op->mldts.r_rep = 5;
> > +	if (op->mldts.num_layers < 2)
> > +		op->mldts.num_layers = 2;
> > +	if (op->mldts.num_layers > 4)
> > +		op->mldts.num_layers = 4;
> > +
> > +	for (i = 0; i < op->mldts.num_layers; i++)
> > +		outsize_per_re += op->mldts.q_m[i];
> > +	sc_num = op->mldts.num_rbs * RTE_BBDEV_SCPERRB; /* C rep forced
> to zero. */
> > +	r_num = op->mldts.num_rbs * sc_factor_per_rrep[op->mldts.r_rep];
> > +	q_size = qsize_per_re[op->mldts.num_layers - 2] * sc_num;
> > +	r_size = rsize_per_re[op->mldts.num_layers - 2] * r_num;
> > +	out_size =  sc_num * outsize_per_re;
> > +
> > +	for (symb = 0; symb < num_syms; symb++) {
> > +		desc_idx = ((q->sw_ring_head + total_enqueued_descs +
> symb) & q->sw_ring_wrap_mask);
> > +		desc = q->ring_addr + desc_idx;
> > +		acc_header_init(&desc->req);
> > +		if (symb == 0)
> > +			desc->req.cbs_in_tb = num_syms;
> > +		else
> > +			rte_memcpy(&desc->req.fcw_mldts, fcw,
> ACC_FCW_MLDTS_BLEN);
> > +		desc->req.data_ptrs[1].address =
> rte_pktmbuf_iova_offset(input_q, in_offset);
> > +		desc->req.data_ptrs[1].blen = q_size;
> > +		in_offset += q_size;
> > +		desc->req.data_ptrs[1].blkid = ACC_DMA_BLKID_IN;
> > +		desc->req.data_ptrs[1].last = 0;
> > +		desc->req.data_ptrs[1].dma_ext = 0;
> > +		desc->req.data_ptrs[2].address =
> rte_pktmbuf_iova_offset(input_r, 0);
> > +		desc->req.data_ptrs[2].blen = r_size;
> > +		desc->req.data_ptrs[2].blkid = ACC_DMA_BLKID_IN_MLD_R;
> > +		desc->req.data_ptrs[2].last = 1;
> > +		desc->req.data_ptrs[2].dma_ext = 0;
> > +		desc->req.data_ptrs[3].address =
> rte_pktmbuf_iova_offset(output, out_offset);
> > +		desc->req.data_ptrs[3].blen = out_size;
> > +		out_offset += out_size;
> > +		desc->req.data_ptrs[3].blkid = ACC_DMA_BLKID_OUT_HARD;
> > +		desc->req.data_ptrs[3].last = 1;
> > +		desc->req.data_ptrs[3].dma_ext = 0;
> > +		desc->req.m2dlen = VRB2_MLD_M2DLEN;
> > +		desc->req.d2mlen = 1;
> > +		desc->req.op_addr = op;
> > +
> > +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> > +		rte_memdump(stderr, "FCW", &desc->req.fcw_mldts, sizeof(desc->req.fcw_mldts));
> > +		rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> > +#endif
> > +	}
> > +	desc->req.sdone_enable = 0;
> > +
> > +	return num_syms;
> > +}
> > +
> > +/** Enqueue one MLDTS operation. */
> > +static inline int
> > +enqueue_mldts_one_op(struct acc_queue *q, struct rte_bbdev_mldts_op
> *op,
> > +		uint16_t total_enqueued_descs)
> > +{
> > +	union acc_dma_desc *desc;
> > +	uint16_t desc_idx;
> > +	struct rte_mbuf *input_q, *input_r, *output;
> > +	uint32_t in_offset, out_offset;
> > +	struct acc_fcw_mldts *fcw;
> > +
> > +	desc_idx = ((q->sw_ring_head + total_enqueued_descs) & q-
> >sw_ring_wrap_mask);
> > +	desc = q->ring_addr + desc_idx;
> 
> acc_desc()?

Will do now, thanks.

> 
> > +	input_q = op->mldts.qhy_input.data;
> > +	input_r = op->mldts.r_input.data;
> > +	output = op->mldts.output.data;
> > +	in_offset = op->mldts.qhy_input.offset;
> > +	out_offset = op->mldts.output.offset;
> > +	fcw = &desc->req.fcw_mldts;
> > +	vrb2_fcw_mldts_fill(op, fcw);
> > +	vrb2_dma_desc_mldts_fill(op, &desc->req, input_q, input_r, output,
> > +			&in_offset, &out_offset);
> > +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> > +	rte_memdump(stderr, "FCW", &desc->req.fcw_mldts, sizeof(desc->req.fcw_mldts));
> > +	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> > +#endif
> > +	return 1;
> > +}
> > +
> > +/* Enqueue MLDTS operations. */
> > +static uint16_t
> > +vrb2_enqueue_mldts(struct rte_bbdev_queue_data *q_data,
> > +		struct rte_bbdev_mldts_op **ops, uint16_t num) {
> > +	int32_t aq_avail, avail;
> > +	struct acc_queue *q = q_data->queue_private;
> > +	uint16_t i, enqueued_descs = 0, descs_in_op;
> > +	int ret;
> > +	bool as_one_op;
> > +
> > +	aq_avail = acc_aq_avail(q_data, num);
> > +	if (unlikely((aq_avail <= 0) || (num == 0)))
> > +		return 0;
> > +	avail = acc_ring_avail_enq(q);
> > +
> > +	for (i = 0; i < num; ++i) {
> > +		as_one_op = vrb2_check_mld_r_constraint(ops[i]);
> > +		descs_in_op = as_one_op ? 1 : ops[i]->mldts.c_rep + 1;
> > +
> > +		/* Check if there are available space for further processing. */
> > +		if (unlikely(avail < descs_in_op)) {
> > +			acc_enqueue_ring_full(q_data);
> > +			break;
> > +		}
> > +		avail -= descs_in_op;
> > +
> > +		if (as_one_op)
> > +			ret = enqueue_mldts_one_op(q, ops[i],
> enqueued_descs);
> > +		else
> > +			ret = enqueue_mldts_split_op(q, ops[i],
> enqueued_descs);
> > +
> > +		if (ret < 0) {
> > +			acc_enqueue_invalid(q_data);
> > +			break;
> > +		}
> > +
> > +		enqueued_descs += ret;
> > +	}
> > +
> > +	if (unlikely(i == 0))
> > +		return 0; /* Nothing to enqueue. */
> > +
> > +	acc_dma_enqueue(q, enqueued_descs, &q_data->queue_stats);
> > +
> > +	/* Update stats. */
> > +	q_data->queue_stats.enqueued_count += i;
> > +	q_data->queue_stats.enqueue_err_count += num - i;
> > +	return i;
> > +}
> > +
> > +/*
> > + * Dequeue one MLDTS operation.
> > + * This may have been split over multiple descriptors.
> > + */
> > +static inline int
> > +dequeue_mldts_one_op(struct rte_bbdev_queue_data *q_data,
> > +		struct acc_queue *q, struct rte_bbdev_mldts_op **ref_op,
> > +		uint16_t dequeued_ops, uint32_t *aq_dequeued) {
> > +	union acc_dma_desc *desc, atom_desc, *last_desc;
> > +	union acc_dma_rsp_desc rsp;
> > +	struct rte_bbdev_mldts_op *op;
> > +	uint8_t descs_in_op, i;
> > +
> > +	desc = acc_desc_tail(q, dequeued_ops);
> > +	atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc,
> > +__ATOMIC_RELAXED);
> > +
> > +	/* Check fdone bit. */
> > +	if (!(atom_desc.rsp.val & ACC_FDONE))
> > +		return -1;
> > +
> > +	descs_in_op = desc->req.cbs_in_tb;
> > +	if (descs_in_op > 1) {
> > +		/* Get last CB. */
> > +		last_desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops
> + descs_in_op - 1)
> > +				& q->sw_ring_wrap_mask);
> > +		/* Check if last op is ready to dequeue by checking fdone bit.
> If not exit. */
> > +		atom_desc.atom_hdr = __atomic_load_n((uint64_t
> *)last_desc, __ATOMIC_RELAXED);
> > +		if (!(atom_desc.rsp.val & ACC_FDONE))
> > +			return -1;
> > +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> > +		rte_memdump(stderr, "Last Resp", &last_desc->rsp.val, sizeof(desc->rsp.val));
> > +#endif
> > +		/* Check each operation iteratively using fdone. */
> > +		for (i = 1; i < descs_in_op - 1; i++) {
> > +			last_desc = q->ring_addr + ((q->sw_ring_tail +
> dequeued_ops + i)
> > +					& q->sw_ring_wrap_mask);
> > +			atom_desc.atom_hdr = __atomic_load_n((uint64_t
> *)last_desc,
> > +					__ATOMIC_RELAXED);
> > +			if (!(atom_desc.rsp.val & ACC_FDONE))
> > +				return -1;
> > +		}
> > +	}
> > +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> > +	rte_memdump(stderr, "Resp", &desc->rsp.val, sizeof(desc->rsp.val));
> > +#endif
> > +	/* Dequeue. */
> > +	op = desc->req.op_addr;
> > +
> > +	/* Clearing status, it will be set based on response. */
> > +	op->status = 0;
> > +
> > +	for (i = 0; i < descs_in_op; i++) {
> > +		desc = q->ring_addr + ((q->sw_ring_tail + dequeued_ops + i) &
> > +q->sw_ring_wrap_mask);
> 
> acc_desc()

acc_desc_tail(), fixing now as well. Thanks 


> 
> > +		atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc,
> __ATOMIC_RELAXED);
> > +		rsp.val = atom_desc.rsp.val;
> > +		op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
> > +		op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
> > +		op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> > +		op->status |= rsp.engine_hung <<
> RTE_BBDEV_ENGINE_ERROR;
> > +	}
> > +
> > +	if (op->status != 0)
> > +		q_data->queue_stats.dequeue_err_count++;
> > +	if (op->status & (1 << RTE_BBDEV_DRV_ERROR))
> > +		vrb_check_ir(q->d);
> > +
> > +	/* Check if this is the last desc in batch (Atomic Queue). */
> > +	if (desc->req.last_desc_in_batch) {
> > +		(*aq_dequeued)++;
> > +		desc->req.last_desc_in_batch = 0;
> > +	}
> > +	desc->rsp.val = ACC_DMA_DESC_TYPE;
> > +	desc->rsp.add_info_0 = 0;
> > +	*ref_op = op;
> 
> There seems to be a pattern with other ops (FFT/LDPC/...).
> Maybe we should work on some refactoring. It does not have to be done in
> this series.

Agreed, as discussed on the other patch; I have a ticket for this.

Thanks, will resolve all in v4 this week.
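
As a rough idea of what that later refactoring could factor out, the per-descriptor status accumulation is one candidate. The sketch below is only an illustration under stated assumptions: struct sketch_rsp is a hypothetical, simplified stand-in for the response descriptor, and the SKETCH_* bit positions are assumed values standing in for RTE_BBDEV_DRV_ERROR, RTE_BBDEV_DATA_ERROR and RTE_BBDEV_ENGINE_ERROR.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the rte_bbdev status bit positions; values are assumptions. */
enum sketch_status_bit {
	SKETCH_DRV_ERROR = 0,
	SKETCH_DATA_ERROR = 1,
	SKETCH_ENGINE_ERROR = 4,
};

/* Hypothetical, simplified view of the error bits in a response descriptor. */
struct sketch_rsp {
	unsigned int input_err:1;
	unsigned int dma_err:1;
	unsigned int fcw_err:1;
	unsigned int engine_hung:1;
};

/* Map one response to op status bits, as done inline in the dequeue loop. */
static inline uint32_t
sketch_rsp_to_status(struct sketch_rsp rsp)
{
	uint32_t status = 0;

	status |= (uint32_t)rsp.input_err << SKETCH_DATA_ERROR;
	status |= (uint32_t)rsp.dma_err << SKETCH_DRV_ERROR;
	status |= (uint32_t)rsp.fcw_err << SKETCH_DRV_ERROR;
	status |= (uint32_t)rsp.engine_hung << SKETCH_ENGINE_ERROR;
	return status;
}

/* OR together the status of every descriptor belonging to one operation. */
static inline uint32_t
sketch_fold_status(const struct sketch_rsp *rsps, size_t n)
{
	uint32_t status = 0;
	size_t i;

	for (i = 0; i < n; i++)
		status |= sketch_rsp_to_status(rsps[i]);
	return status;
}
```

A shared helper along these lines would let the FFT, LDPC and MLD-TS dequeue paths differ only in the op-specific fields they handle.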

> 
> > +	return descs_in_op;
> > +}
> > +
> > +/* Dequeue MLDTS operations from VRB2 device. */ static uint16_t
> > +vrb2_dequeue_mldts(struct rte_bbdev_queue_data *q_data,
> > +		struct rte_bbdev_mldts_op **ops, uint16_t num) {
> > +	struct acc_queue *q = q_data->queue_private;
> > +	uint16_t dequeue_num, i, dequeued_cbs = 0;
> > +	uint32_t avail = acc_ring_avail_deq(q);
> > +	uint32_t aq_dequeued = 0;
> > +	int ret;
> > +
> > +	dequeue_num = RTE_MIN(avail, num);
> > +
> > +	for (i = 0; i < dequeue_num; ++i) {
> > +		ret = dequeue_mldts_one_op(q_data, q, &ops[i],
> dequeued_cbs, &aq_dequeued);
> > +		if (ret <= 0)
> > +			break;
> > +		dequeued_cbs += ret;
> > +	}
> > +
> > +	q->aq_dequeued += aq_dequeued;
> > +	q->sw_ring_tail += dequeued_cbs;
> > +	/* Update enqueue stats. */
> > +	q_data->queue_stats.dequeued_count += i;
> > +	return i;
> > +}
> > +
> >   /* Initialization Function */
> >   static void
> >   vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv) @@
> > -4169,6 +4545,8 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct
> rte_pci_driver *drv)
> >   	dev->dequeue_ldpc_dec_ops = vrb_dequeue_ldpc_dec;
> >   	dev->enqueue_fft_ops = vrb_enqueue_fft;
> >   	dev->dequeue_fft_ops = vrb_dequeue_fft;
> > +	dev->enqueue_mldts_ops = vrb2_enqueue_mldts;
> > +	dev->dequeue_mldts_ops = vrb2_dequeue_mldts;
> >
> >   	d->pf_device = !strcmp(drv->driver.name,
> RTE_STR(VRB_PF_DRIVER_NAME));
> >   	d->mmio_base = pci_dev->mem_resource[0].addr;



* RE: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-10-03 14:36   ` Maxime Coquelin
@ 2023-10-03 18:20     ` Chautru, Nicolas
  2023-10-04  7:11       ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-03 18:20 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, October 3, 2023 7:37 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
> 
> 
> 
> On 9/29/23 18:35, Nicolas Chautru wrote:
> > Support for the FFT processing specific to the
> > VRB2 variant.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >   drivers/baseband/acc/rte_vrb_pmd.c | 132
> ++++++++++++++++++++++++++++-
> >   1 file changed, 128 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
> > b/drivers/baseband/acc/rte_vrb_pmd.c
> > index 93add82947..ce4b90d8e7 100644
> > --- a/drivers/baseband/acc/rte_vrb_pmd.c
> > +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> > @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t
> queue_id,
> >   			ACC_FCW_LD_BLEN : (conf->op_type ==
> RTE_BBDEV_OP_FFT ?
> >   			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
> >
> > +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type ==
> RTE_BBDEV_OP_FFT))
> > +		fcw_len = ACC_FCW_FFT_BLEN_3;
> > +
> >   	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
> >   		desc = q->ring_addr + desc_idx;
> >   		desc->req.word0 = ACC_DMA_DESC_TYPE; @@ -1323,6
> +1326,24 @@
> > vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info
> *dev_info)
> >   			.num_buffers_soft_out = 0,
> >   			}
> >   		},
> > +		{
> > +			.type	= RTE_BBDEV_OP_FFT,
> > +			.cap.fft = {
> > +				.capability_flags =
> > +
> 	RTE_BBDEV_FFT_WINDOWING |
> > +
> 	RTE_BBDEV_FFT_CS_ADJUSTMENT |
> > +
> 	RTE_BBDEV_FFT_DFT_BYPASS |
> > +
> 	RTE_BBDEV_FFT_IDFT_BYPASS |
> > +						RTE_BBDEV_FFT_FP16_INPUT
> |
> > +
> 	RTE_BBDEV_FFT_FP16_OUTPUT |
> > +
> 	RTE_BBDEV_FFT_POWER_MEAS |
> > +
> 	RTE_BBDEV_FFT_WINDOWING_BYPASS,
> > +				.num_buffers_src =
> > +						1,
> > +				.num_buffers_dst =
> > +						1,
> > +			}
> > +		},
> >   		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
> >   	};
> >
> > @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op *op,
> struct acc_fcw_fft *fcw)
> >   		fcw->bypass = 0;
> >   }
> >
> > +/* Fill in a frame control word for FFT processing. */ static inline
> > +void vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct
> > +acc_fcw_fft_3 *fcw) {
> > +	fcw->in_frame_size = op->fft.input_sequence_size;
> > +	fcw->leading_pad_size = op->fft.input_leading_padding;
> > +	fcw->out_frame_size = op->fft.output_sequence_size;
> > +	fcw->leading_depad_size = op->fft.output_leading_depadding;
> > +	fcw->cs_window_sel = op->fft.window_index[0] +
> > +			(op->fft.window_index[1] << 8) +
> > +			(op->fft.window_index[2] << 16) +
> > +			(op->fft.window_index[3] << 24);
> > +	fcw->cs_window_sel2 = op->fft.window_index[4] +
> > +			(op->fft.window_index[5] << 8);
> > +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
> > +	fcw->num_antennas = op->fft.num_antennas_log2;
> > +	fcw->idft_size = op->fft.idft_log2;
> > +	fcw->dft_size = op->fft.dft_log2;
> > +	fcw->cs_offset = op->fft.cs_time_adjustment;
> > +	fcw->idft_shift = op->fft.idft_shift;
> > +	fcw->dft_shift = op->fft.dft_shift;
> > +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
> > +	fcw->power_shift = op->fft.power_shift;
> > +	fcw->exp_adj = op->fft.fp16_exp_adjust;
> > +	fcw->fp16_in = check_bit(op->fft.op_flags,
> RTE_BBDEV_FFT_FP16_INPUT);
> > +	fcw->fp16_out = check_bit(op->fft.op_flags,
> RTE_BBDEV_FFT_FP16_OUTPUT);
> > +	fcw->power_en = check_bit(op->fft.op_flags,
> RTE_BBDEV_FFT_POWER_MEAS);
> > +	if (check_bit(op->fft.op_flags,
> > +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
> > +		if (check_bit(op->fft.op_flags,
> > +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
> > +			fcw->bypass = 2;
> > +		else
> > +			fcw->bypass = 1;
> > +	} else if (check_bit(op->fft.op_flags,
> > +			RTE_BBDEV_FFT_DFT_BYPASS))
> > +		fcw->bypass = 3;
> > +	else
> > +		fcw->bypass = 0;
> 
> The only difference I see with VRB1 are backed by corresponding op_flags
> (POWER & FP16), is that correct? If so, it does not make sense to me to have a
> specific function for VRB2.

There are more changes, but these are only formally enabled in the next stepping, hence some of the
related code is not included yet. More generally, the FCW and the IP are different from the VRB1 implementation.
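
One piece of the VRB2 FCW fill quoted above that can be looked at in isolation is the packing of the six window indices into cs_window_sel and cs_window_sel2. The sketch below assumes the indices are unsigned 8-bit values; struct sketch_win_sel and sketch_pack_window_sel are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical pair matching the cs_window_sel / cs_window_sel2 FCW fields. */
struct sketch_win_sel {
	uint32_t sel;  /* window indices 0..3, one byte each */
	uint16_t sel2; /* window indices 4..5, one byte each */
};

/* Pack six 8-bit window indices the same way the quoted FCW fill does. */
static struct sketch_win_sel
sketch_pack_window_sel(const uint8_t idx[6])
{
	struct sketch_win_sel w;

	/* Cast before shifting so that bit 31 is never reached as a signed int. */
	w.sel = (uint32_t)idx[0] |
		((uint32_t)idx[1] << 8) |
		((uint32_t)idx[2] << 16) |
		((uint32_t)idx[3] << 24);
	w.sel2 = (uint16_t)((uint32_t)idx[4] | ((uint32_t)idx[5] << 8));
	return w;
}
```

The sketch uses OR instead of the additions in the patch; the two give the same result here because the byte lanes do not overlap.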

> 
> > +}
> > +
> >   static inline int
> >   vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >   		struct acc_dma_req_desc *desc,
> > @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op
> *op,
> >   	return 0;
> >   }
> >
> > +static inline int
> > +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> > +		struct acc_dma_req_desc *desc,
> > +		struct rte_mbuf *input, struct rte_mbuf *output, struct
> rte_mbuf *win_input,
> > +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t
> *out_offset,
> > +		uint32_t *win_offset, uint32_t *pwr_offset) {
> > +	bool pwr_en = check_bit(op->fft.op_flags,
> RTE_BBDEV_FFT_POWER_MEAS);
> > +	bool win_en = check_bit(op->fft.op_flags,
> RTE_BBDEV_FFT_DEWINDOWING);
> > +	int num_cs = 0, i, bd_idx = 1;
> > +
> > +	/* FCW already done */
> > +	acc_header_init(desc);
> > +
> > +	RTE_SET_USED(win_input);
> > +	RTE_SET_USED(win_offset);
> > +
> > +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input,
> *in_offset);
> > +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size *
> ACC_IQ_SIZE;
> > +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
> > +	desc->data_ptrs[bd_idx].last = 1;
> > +	desc->data_ptrs[bd_idx].dma_ext = 0;
> > +	bd_idx++;
> > +
> > +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output,
> *out_offset);
> > +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size *
> ACC_IQ_SIZE;
> > +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
> > +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
> > +	desc->data_ptrs[bd_idx].dma_ext = 0;
> > +	desc->m2dlen = win_en ? 3 : 2;
> > +	desc->d2mlen = pwr_en ? 2 : 1;
> > +	desc->ib_ant_offset = op->fft.input_sequence_size;
> > +	desc->num_ant = op->fft.num_antennas_log2 - 3;
> > +
> > +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
> > +		if (check_bit(op->fft.cs_bitmap, 1 << i))
> > +			num_cs++;
> > +	desc->num_cs = num_cs;
> > +
> > +	if (pwr_en && pwr) {
> > +		bd_idx++;
> > +		desc->data_ptrs[bd_idx].address =
> rte_pktmbuf_iova_offset(pwr, *pwr_offset);
> > +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op-
> >fft.num_antennas_log2) * 4;
> > +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
> > +		desc->data_ptrs[bd_idx].last = 1;
> > +		desc->data_ptrs[bd_idx].dma_ext = 0;
> > +	}
> > +	desc->ob_cyc_offset = op->fft.output_sequence_size;
> > +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
> > +	desc->op_addr = op;
> > +	return 0;
> > +}
> >
> >   /** Enqueue one FFT operation for device. */
> >   static inline int
> > @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue *q,
> struct rte_bbdev_fft_op *op,
> >   		uint16_t total_enqueued_cbs)
> >   {
> >   	union acc_dma_desc *desc;
> > -	struct rte_mbuf *input, *output;
> > -	uint32_t in_offset, out_offset;
> > +	struct rte_mbuf *input, *output, *pwr, *win;
> > +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
> >   	struct acc_fcw_fft *fcw;
> >
> >   	desc = acc_desc(q, total_enqueued_cbs);
> >   	input = op->fft.base_input.data;
> >   	output = op->fft.base_output.data;
> > +	pwr = op->fft.power_meas_output.data;
> > +	win = op->fft.dewindowing_input.data;
> >   	in_offset = op->fft.base_input.offset;
> >   	out_offset = op->fft.base_output.offset;
> > +	pwr_offset = op->fft.power_meas_output.offset;
> > +	win_offset = op->fft.dewindowing_input.offset;
> >
> >   	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
> >   			((q->sw_ring_head + total_enqueued_cbs) & q-
> >sw_ring_wrap_mask)
> >   			* ACC_MAX_FCW_SIZE);
> >
> > -	vrb1_fcw_fft_fill(op, fcw);
> > -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset,
> &out_offset);
> > +	if (q->d->device_variant == VRB1_VARIANT) {
> > +		vrb1_fcw_fft_fill(op, fcw);
> > +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output,
> &in_offset, &out_offset);
> > +	} else {
> > +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
> > +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win,
> pwr,
> > +				&in_offset, &out_offset, &win_offset,
> &pwr_offset);
> > +	}
> >   #ifdef RTE_LIBRTE_BBDEV_DEBUG
> >   	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
> >   			sizeof(desc->req.fcw_fft));



* RE: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension
  2023-10-03 13:14   ` Maxime Coquelin
@ 2023-10-03 18:54     ` Chautru, Nicolas
  2023-10-04  7:35       ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-03 18:54 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, October 3, 2023 6:15 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver
> extension
> 
> Thanks for doing the split, that will ease review.
> 
> On 9/29/23 18:35, Nicolas Chautru wrote:
> > Adding a few functions and common code prior to extending the VRB
> > driver.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >   drivers/baseband/acc/acc_common.h     | 164 +++++++++++++++++++++++-
> --
> >   drivers/baseband/acc/rte_acc100_pmd.c |   4 +-
> >   drivers/baseband/acc/rte_vrb_pmd.c    |  62 +++++-----
> >   3 files changed, 184 insertions(+), 46 deletions(-)
> >
> > diff --git a/drivers/baseband/acc/acc_common.h
> > b/drivers/baseband/acc/acc_common.h
> > index 788abf1a3c..89893eae43 100644
> > --- a/drivers/baseband/acc/acc_common.h
> > +++ b/drivers/baseband/acc/acc_common.h
> > @@ -18,6 +18,7 @@
> >   #define ACC_DMA_BLKID_OUT_HARQ      3
> >   #define ACC_DMA_BLKID_IN_HARQ       3
> >   #define ACC_DMA_BLKID_IN_MLD_R      3
> > +#define ACC_DMA_BLKID_DEWIN_IN      3
> >
> >   /* Values used in filling in decode FCWs */
> >   #define ACC_FCW_TD_VER              1
> > @@ -103,6 +104,9 @@
> >   #define ACC_MAX_NUM_QGRPS              32
> >   #define ACC_RING_SIZE_GRANULARITY      64
> >   #define ACC_MAX_FCW_SIZE              128
> > +#define ACC_IQ_SIZE                    4
> > +
> > +#define ACC_FCW_FFT_BLEN_3             28
> >
> >   /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
> >   #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */ @@ -132,6 +136,17 @@
> >   #define ACC_LIM_21 14 /* 0.21 */
> >   #define ACC_LIM_31 20 /* 0.31 */
> >   #define ACC_MAX_E (128 * 1024 - 2)
> > +#define ACC_MAX_CS 12
> > +
> > +#define ACC100_VARIANT          0
> > +#define VRB1_VARIANT		2
> > +#define VRB2_VARIANT		3
> > +
> > +/* Queue Index Hierarchy */
> > +#define VRB1_GRP_ID_SHIFT    10
> > +#define VRB1_VF_ID_SHIFT     4
> > +#define VRB2_GRP_ID_SHIFT    12
> > +#define VRB2_VF_ID_SHIFT     6
> >
> >   /* Helper macro for logging */
> >   #define rte_acc_log(level, fmt, ...) \ @@ -332,6 +347,37 @@ struct
> > __rte_packed acc_fcw_fft {
> >   		res:19;
> >   };
> >
> > +/* FFT Frame Control Word. */
> > +struct __rte_packed acc_fcw_fft_3 {
> > +	uint32_t in_frame_size:16,
> > +		leading_pad_size:16;
> > +	uint32_t out_frame_size:16,
> > +		leading_depad_size:16;
> > +	uint32_t cs_window_sel;
> > +	uint32_t cs_window_sel2:16,
> > +		cs_enable_bmap:16;
> > +	uint32_t num_antennas:8,
> > +		idft_size:8,
> > +		dft_size:8,
> > +		cs_offset:8;
> > +	uint32_t idft_shift:8,
> > +		dft_shift:8,
> > +		cs_multiplier:16;
> > +	uint32_t bypass:2,
> > +		fp16_in:1,
> > +		fp16_out:1,
> > +		exp_adj:4,
> > +		power_shift:4,
> > +		power_en:1,
> > +		enable_dewin:1,
> > +		freq_resample_mode:2,
> > +		depad_output_size:16;
> > +	uint16_t cs_theta_0[ACC_MAX_CS];
> > +	uint32_t cs_theta_d[ACC_MAX_CS];
> > +	int8_t cs_time_offset[ACC_MAX_CS];
> > +};
> > +
> > +
> >   /* MLD-TS Frame Control Word */
> >   struct __rte_packed acc_fcw_mldts {
> >   	uint32_t fcw_version:4,
> > @@ -473,14 +519,14 @@ union acc_info_ring_data {
> >   		uint16_t valid: 1;
> >   	};
> >   	struct {
> > -		uint32_t aq_id_3: 6;
> > -		uint32_t qg_id_3: 5;
> > -		uint32_t vf_id_3: 6;
> > -		uint32_t int_nb_3: 6;
> > -		uint32_t msi_0_3: 1;
> > -		uint32_t vf2pf_3: 6;
> > -		uint32_t loop_3: 1;
> > -		uint32_t valid_3: 1;
> > +		uint32_t aq_id_vrb2: 6;
> > +		uint32_t qg_id_vrb2: 5;
> > +		uint32_t vf_id_vrb2: 6;
> > +		uint32_t int_nb_vrb2: 6;
> > +		uint32_t msi_0_vrb2: 1;
> > +		uint32_t vf2pf_vrb2: 6;
> > +		uint32_t loop_vrb2: 1;
> > +		uint32_t valid_vrb2: 1;
> >   	};
> >   } __rte_packed;
> >
> > @@ -761,22 +807,105 @@ alloc_sw_rings_min_mem(struct rte_bbdev *dev,
> struct acc_device *d,
> >   	free_base_addresses(base_addrs, i);
> >   }
> >
> > +/* Wrapper to provide VF index from ring data. */ static inline
> > +uint16_t vf_from_ring(const union acc_info_ring_data ring_data,
> > +uint16_t device_variant) {
> 
> curly braces on a new line.

Thanks. 

> 
> > +	if (device_variant == VRB2_VARIANT)
> > +		return ring_data.vf_id_vrb2;
> > +	else
> > +		return ring_data.vf_id;
> > +}
> > +
> > +/* Wrapper to provide QG index from ring data. */ static inline
> > +uint16_t qg_from_ring(const union acc_info_ring_data ring_data,
> > +uint16_t device_variant) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return ring_data.qg_id_vrb2;
> > +	else
> > +		return ring_data.qg_id;
> > +}
> > +
> > +/* Wrapper to provide AQ index from ring data. */ static inline
> > +uint16_t aq_from_ring(const union acc_info_ring_data ring_data,
> > +uint16_t device_variant) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return ring_data.aq_id_vrb2;
> > +	else
> > +		return ring_data.aq_id;
> > +}
> > +
> > +/* Wrapper to provide int index from ring data. */ static inline
> > +uint16_t int_from_ring(const union acc_info_ring_data ring_data,
> > +uint16_t device_variant) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return ring_data.int_nb_vrb2;
> > +	else
> > +		return ring_data.int_nb;
> > +}
> > +
> > +/* Wrapper to provide queue index from group and aq index. */ static
> > +inline int queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t
> > +device_variant) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
> > +	else
> > +		return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx; }
> > +
> > +/* Wrapper to provide queue group from queue index. */ static inline
> > +int qg_from_q(uint32_t q_idx, uint16_t device_variant) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
> > +	else
> > +		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF; }
> > +
> > +/* Wrapper to provide vf from queue index. */ static inline int32_t
> > +vf_from_q(uint32_t q_idx, uint16_t device_variant) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return (q_idx >> VRB2_VF_ID_SHIFT)  & 0x3F;
> > +	else
> > +		return (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F; }
> > +
> > +/* Wrapper to provide aq index from queue index. */ static inline
> > +int32_t aq_from_q(uint32_t q_idx, uint16_t device_variant) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return q_idx & 0x3F;
> > +	else
> > +		return q_idx & 0xF;
> > +}
> > +
> > +/* Wrapper to set VF index in ring data. */ static inline int32_t
> > +set_vf_in_ring(volatile union acc_info_ring_data *ring_data,
> > +		uint16_t device_variant, uint16_t value) {
> > +	if (device_variant == VRB2_VARIANT)
> > +		return ring_data->vf_id_vrb2 = value;
> > +	else
> > +		return ring_data->vf_id = value;
> > +}
> > +
> >   /*
> >    * Find queue_id of a device queue based on details from the Info Ring.
> >    * If a queue isn't found UINT16_MAX is returned.
> >    */
> >   static inline uint16_t
> >   get_queue_id_from_ring_info(struct rte_bbdev_data *data,
> > -		const union acc_info_ring_data ring_data)
> > +		const union acc_info_ring_data ring_data, uint16_t
> device_variant)
> 
> As I suggested on v2:
> 
> get_queue_id_from_ring_info(struct rte_bbdev_data *data,
> 	const union acc_info_ring_data ring_data) {
> 	struct acc_device *d = data->dev_private;
> 
> 	...
> 
> 	if (acc_q != NULL && acc_q->aq_id == aq_from_ring(d, ring_data) &&
> ...
> 
> }
> 
> with
> 
> /* Wrapper to provide AQ index from ring data. */ static inline uint16_t
> aq_from_ring(struct acc_device *d, const union acc_info_ring_data ring_data)
> {
> 	if (d->device_variant == VRB2_VARIANT)
> 		return ring_data.aq_id_vrb2;
> 	else
> 		return ring_data.aq_id;
> }
> 

I will change get_queue_id_from_ring_info() to have a smaller prototype,
but I don't plan on changing the other new underlying functions to take the device
instead of the variant in their prototypes.
I don't see a reason to, as these only need this one member.
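
To illustrate the point about the helpers only needing the variant, here is a minimal standalone sketch of the queue-index wrappers, using the shift and mask values from the quoted patch. The SKETCH_ and sketch_ prefixes are hypothetical and only there to keep the snippet separate from the driver's own symbols.

```c
#include <stdint.h>

/* Variant identifiers and shifts copied from the quoted acc_common.h hunk. */
#define SKETCH_VRB1_VARIANT 2
#define SKETCH_VRB2_VARIANT 3
#define SKETCH_VRB1_GRP_ID_SHIFT 10
#define SKETCH_VRB2_GRP_ID_SHIFT 12

/* Build a queue index from group and AQ index for a given variant. */
static inline int
sketch_queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t device_variant)
{
	if (device_variant == SKETCH_VRB2_VARIANT)
		return (group_idx << SKETCH_VRB2_GRP_ID_SHIFT) + aq_idx;
	else
		return (group_idx << SKETCH_VRB1_GRP_ID_SHIFT) + aq_idx;
}

/* Recover the queue group from a queue index. */
static inline int
sketch_qg_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == SKETCH_VRB2_VARIANT)
		return (q_idx >> SKETCH_VRB2_GRP_ID_SHIFT) & 0x1F;
	else
		return (q_idx >> SKETCH_VRB1_GRP_ID_SHIFT) & 0xF;
}

/* Recover the AQ index from a queue index. */
static inline int32_t
sketch_aq_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == SKETCH_VRB2_VARIANT)
		return q_idx & 0x3F;
	else
		return q_idx & 0xF;
}
```

Since the only input besides the index is a plain integer, helpers of this shape can be exercised without any device structure, which is the reason for keeping the variant in their prototypes.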

> >   {
> >   	uint16_t queue_id;
> > +	struct acc_queue *acc_q;
> >
> >   	for (queue_id = 0; queue_id < data->num_queues; ++queue_id) {
> > -		struct acc_queue *acc_q =
> > -				data->queues[queue_id].queue_private;
> > -		if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id &&
> > -				acc_q->qgrp_id == ring_data.qg_id &&
> > -				acc_q->vf_id == ring_data.vf_id)
> > +		acc_q = data->queues[queue_id].queue_private;
> > +
> > +		if (acc_q != NULL && acc_q->aq_id ==
> aq_from_ring(ring_data, device_variant) &&
> > +				acc_q->qgrp_id == qg_from_ring(ring_data,
> device_variant) &&
> > +				acc_q->vf_id == vf_from_ring(ring_data,
> device_variant))
> >   			return queue_id;
> >   	}
> >
> > @@ -1438,4 +1567,11 @@ get_num_cbs_in_tb_ldpc_enc(struct
> rte_bbdev_op_ldpc_enc *ldpc_enc)
> >   	return cbs_in_tb;
> >   }
> >
> > +static inline void
> > +acc_reg_fast_write(struct acc_device *d, uint32_t offset, uint32_t
> > +value) {
> > +	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
> > +	mmio_write(reg_addr, value);
> > +}
> > +
> >   #endif /* _ACC_COMMON_H_ */
> > diff --git a/drivers/baseband/acc/rte_acc100_pmd.c
> > b/drivers/baseband/acc/rte_acc100_pmd.c
> > index 5362d39c30..7f8d05b5a9 100644
> > --- a/drivers/baseband/acc/rte_acc100_pmd.c
> > +++ b/drivers/baseband/acc/rte_acc100_pmd.c
> > @@ -294,7 +294,7 @@ acc100_pf_interrupt_handler(struct rte_bbdev *dev)
> >   		case ACC100_PF_INT_DMA_UL5G_DESC_IRQ:
> >   		case ACC100_PF_INT_DMA_DL5G_DESC_IRQ:
> >   			deq_intr_det.queue_id =
> get_queue_id_from_ring_info(
> > -					dev->data, *ring_data);
> > +					dev->data, *ring_data, acc100_dev-
> >device_variant);
> >   			if (deq_intr_det.queue_id == UINT16_MAX) {
> >   				rte_bbdev_log(ERR,
> >   						"Couldn't find queue: aq_id:
> %u, qg_id: %u, vf_id: %u", @@
> > -348,7 +348,7 @@ acc100_vf_interrupt_handler(struct rte_bbdev *dev)
> >   			 */
> >   			ring_data->vf_id = 0;
> >   			deq_intr_det.queue_id =
> get_queue_id_from_ring_info(
> > -					dev->data, *ring_data);
> > +					dev->data, *ring_data, acc100_dev-
> >device_variant);
> >   			if (deq_intr_det.queue_id == UINT16_MAX) {
> >   				rte_bbdev_log(ERR,
> >   						"Couldn't find queue: aq_id:
> %u, qg_id: %u", diff --git
> > a/drivers/baseband/acc/rte_vrb_pmd.c
> > b/drivers/baseband/acc/rte_vrb_pmd.c
> > index a1de012b40..c89c26c59a 100644
> > --- a/drivers/baseband/acc/rte_vrb_pmd.c
> > +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> > @@ -341,17 +341,18 @@ static inline void
> >   vrb_check_ir(struct acc_device *acc_dev)
> >   {
> >   	volatile union acc_info_ring_data *ring_data;
> > -	uint16_t info_ring_head = acc_dev->info_ring_head;
> > +	uint16_t info_ring_head = acc_dev->info_ring_head, int_nb;
> >   	if (unlikely(acc_dev->info_ring == NULL))
> >   		return;
> >
> >   	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> > ACC_INFO_RING_MASK);
> >
> >   	while (ring_data->valid) {
> > -		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> > -				ring_data->int_nb >
> ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
> > +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
> > +		if ((int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> > +				int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ))
> {
> >   			rte_bbdev_log(WARNING, "InfoRing: ITR:%d
> Info:0x%x",
> > -					ring_data->int_nb, ring_data-
> >detailed_info);
> > +					int_nb, ring_data->detailed_info);
> >   			/* Initialize Info Ring entry and move forward. */
> >   			ring_data->val = 0;
> >   		}
> > @@ -368,16 +369,21 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >   	struct acc_device *acc_dev = dev->data->dev_private;
> >   	volatile union acc_info_ring_data *ring_data;
> >   	struct acc_deq_intr_details deq_intr_det;
> > +	uint16_t vf_id, aq_id, qg_id, int_nb;
> >
> >   	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> > ACC_INFO_RING_MASK);
> >
> >   	while (ring_data->valid) {
> > +		vf_id = vf_from_ring(*ring_data, acc_dev->device_variant);
> > +		aq_id = aq_from_ring(*ring_data, acc_dev->device_variant);
> > +		qg_id = qg_from_ring(*ring_data, acc_dev->device_variant);
> > +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
> >   		if (acc_dev->pf_device) {
> >   			rte_bbdev_log_debug(
> > -					"VRB1 PF Interrupt received, Info Ring
> data: 0x%x -> %d",
> > -					ring_data->val, ring_data->int_nb);
> > +					"PF Interrupt received, Info Ring data:
> 0x%x -> %d",
> > +					ring_data->val, int_nb);
> >
> > -			switch (ring_data->int_nb) {
> > +			switch (int_nb) {
> >   			case ACC_PF_INT_DMA_DL_DESC_IRQ:
> >   			case ACC_PF_INT_DMA_UL_DESC_IRQ:
> >   			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
> > @@ -385,13 +391,11 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >   			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
> >   			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
> >   				deq_intr_det.queue_id =
> get_queue_id_from_ring_info(
> > -						dev->data, *ring_data);
> > +						dev->data, *ring_data,
> acc_dev->device_variant);
> >   				if (deq_intr_det.queue_id == UINT16_MAX) {
> >   					rte_bbdev_log(ERR,
> >   							"Couldn't find queue:
> aq_id: %u, qg_id: %u, vf_id: %u",
> > -							ring_data->aq_id,
> > -							ring_data->qg_id,
> > -							ring_data->vf_id);
> > +							aq_id, qg_id, vf_id);
> >   					return;
> >   				}
> >   				rte_bbdev_pmd_callback_process(dev,
> > @@ -403,9 +407,9 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >   			}
> >   		} else {
> >   			rte_bbdev_log_debug(
> > -					"VRB1 VF Interrupt received, Info Ring
> data: 0x%x\n",
> > +					"VRB VF Interrupt received, Info Ring
> data: 0x%x\n",
> >   					ring_data->val);
> > -			switch (ring_data->int_nb) {
> > +			switch (int_nb) {
> >   			case ACC_VF_INT_DMA_DL_DESC_IRQ:
> >   			case ACC_VF_INT_DMA_UL_DESC_IRQ:
> >   			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
> > @@ -413,14 +417,13 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >   			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
> >   			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
> >   				/* VFs are not aware of their vf_id - it's set to
> 0.  */
> > -				ring_data->vf_id = 0;
> > +				set_vf_in_ring(ring_data, acc_dev-
> >device_variant, 0);
> >   				deq_intr_det.queue_id =
> get_queue_id_from_ring_info(
> > -						dev->data, *ring_data);
> > +						dev->data, *ring_data,
> acc_dev->device_variant);
> >   				if (deq_intr_det.queue_id == UINT16_MAX) {
> >   					rte_bbdev_log(ERR,
> >   							"Couldn't find queue:
> aq_id: %u, qg_id: %u",
> > -							ring_data->aq_id,
> > -							ring_data->qg_id);
> > +							aq_id, qg_id);
> >   					return;
> >   				}
> >   				rte_bbdev_pmd_callback_process(dev,
> > @@ -435,8 +438,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >   		/* Initialize Info Ring entry and move forward. */
> >   		ring_data->val = 0;
> >   		++acc_dev->info_ring_head;
> > -		ring_data = acc_dev->info_ring +
> > -				(acc_dev->info_ring_head &
> ACC_INFO_RING_MASK);
> > +		ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> > +ACC_INFO_RING_MASK);
> >   	}
> >   }
> >
> > @@ -556,8 +558,7 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t
> > num_queues, int socket_id)
> >
> >   	/* Configure tail pointer for use when SDONE enabled. */
> >   	if (d->tail_ptrs == NULL)
> > -		d->tail_ptrs = rte_zmalloc_socket(
> > -				dev->device->driver->name,
> > +		d->tail_ptrs = rte_zmalloc_socket(dev->device->driver->name,
> >   				VRB_MAX_QGRPS * VRB_MAX_AQS *
> sizeof(uint32_t),
> >   				RTE_CACHE_LINE_SIZE, socket_id);
> >   	if (d->tail_ptrs == NULL) {
> > @@ -783,7 +784,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
> >   			/* Mark the Queue as assigned. */
> >   			d->q_assigned_bit_map[group_idx] |= (1ULL <<
> aq_idx);
> >   			/* Report the AQ Index. */
> > -			return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
> > +			return queue_index(group_idx, aq_idx, d-
> >device_variant);
> >   		}
> >   	}
> >   	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u",
> > @@ -922,9 +923,10 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t
> queue_id,
> >   		}
> >   	}
> >
> > -	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
> > -	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
> > -	q->aq_id = q_idx & 0xF;
> > +	q->qgrp_id = qg_from_q(q_idx, d->device_variant);
> > +	q->vf_id = vf_from_q(q_idx, d->device_variant);
> > +	q->aq_id = aq_from_q(q_idx, d->device_variant);
> > +
> >   	q->aq_depth = 0;
> >   	if (conf->op_type ==  RTE_BBDEV_OP_TURBO_DEC)
> >   		q->aq_depth = (1 << d->acc_conf.q_ul_4g.aq_depth_log2);
> > @@ -1311,7 +1313,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op
> *op, struct acc_fcw_td *fcw)
> >   		fcw->bypass_teq = 0;
> >   	}
> >
> > -	fcw->code_block_mode = 1; /* FIXME */
> > +	fcw->code_block_mode = 1;
> 
> Could you remind me what was the issue?

Historically, the intention was to use a different format option in the FCW to help with TB mode, but that is no longer being considered.

> 
> >   	fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags,
> >   			RTE_BBDEV_TURBO_CRC_TYPE_24B);
> >
> > @@ -1471,8 +1473,8 @@ vrb_dma_desc_td_fill(struct rte_bbdev_dec_op
> *op,
> >   	if (op->turbo_dec.code_block_mode ==
> RTE_BBDEV_TRANSPORT_BLOCK) {
> >   		k = op->turbo_dec.tb_params.k_pos;
> >   		e = (r < op->turbo_dec.tb_params.cab)
> > -			? op->turbo_dec.tb_params.ea
> > -			: op->turbo_dec.tb_params.eb;
> > +				? op->turbo_dec.tb_params.ea
> > +				: op->turbo_dec.tb_params.eb;
> >   	} else {
> >   		k = op->turbo_dec.cb_params.k;
> >   		e = op->turbo_dec.cb_params.e;
> > @@ -1726,7 +1728,7 @@ vrb_dma_desc_ld_update(struct
> rte_bbdev_dec_op *op,
> >   	desc->op_addr = op;
> >   }
> >
> > -/* Enqueue one encode operations for device in CB mode */
> > +/* Enqueue one encode operations for device in CB mode. */
> >   static inline int
> >   enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op
> *op,
> >   		uint16_t total_enqueued_cbs)
> > @@ -2263,7 +2265,7 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct
> acc_queue *q, struct rte_bbdev_dec_op *op,
> >   	return current_enqueued_cbs;
> >   }
> >
> > -/* Enqueue one decode operations for device in TB mode */
> > +/* Enqueue one decode operations for device in TB mode. */
> >   static inline int
> >   enqueue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op
> *op,
> >   		uint16_t total_enqueued_cbs, uint8_t cbs_in_tb)



* RE: [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD
  2023-10-03 11:52   ` Maxime Coquelin
@ 2023-10-03 19:06     ` Chautru, Nicolas
  2023-10-04  7:55       ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-03 19:06 UTC (permalink / raw)
  To: Maxime Coquelin, dev, hemant.agrawal; +Cc: david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, October 3, 2023 4:52 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 02/12] baseband/acc: add FFT window width in the
> VRB PMD
> 
> 
> 
> On 9/29/23 18:35, Nicolas Chautru wrote:
> > This allows to expose the FFT window width being introduced in
> > previous commit based on what is configured dynamically on the device
> > platform.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >   drivers/baseband/acc/acc_common.h  |  3 +++
> >   drivers/baseband/acc/rte_vrb_pmd.c | 29
> +++++++++++++++++++++++++++++
> >   2 files changed, 32 insertions(+)
> >
> > diff --git a/drivers/baseband/acc/acc_common.h
> > b/drivers/baseband/acc/acc_common.h
> > index 5bb00746c3..7d24c644c0 100644
> > --- a/drivers/baseband/acc/acc_common.h
> > +++ b/drivers/baseband/acc/acc_common.h
> > @@ -512,6 +512,8 @@ struct acc_deq_intr_details {
> >   enum {
> >   	ACC_VF2PF_STATUS_REQUEST = 1,
> >   	ACC_VF2PF_USING_VF = 2,
> > +	ACC_VF2PF_LUT_VER_REQUEST = 3,
> > +	ACC_VF2PF_FFT_WIN_REQUEST = 4,
> >   };
> >
> >
> > @@ -558,6 +560,7 @@ struct acc_device {
> >   	queue_offset_fun_t queue_offset;  /* Device specific queue offset */
> >   	uint16_t num_qgroups;
> >   	uint16_t num_aqs;
> > +	uint16_t fft_window_width[RTE_BBDEV_MAX_FFT_WIN]; /* FFT
> windowing
> > +width. */
> >   };
> >
> >   /* Structure associated with each queue. */ diff --git
> > a/drivers/baseband/acc/rte_vrb_pmd.c
> > b/drivers/baseband/acc/rte_vrb_pmd.c
> > index 9e5a73c9c7..c5a74bae11 100644
> > --- a/drivers/baseband/acc/rte_vrb_pmd.c
> > +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> > @@ -298,6 +298,34 @@ vrb_device_status(struct rte_bbdev *dev)
> >   	return reg;
> >   }
> >
> > +/* Request device FFT windowing information. */ static inline void
> > +vrb_device_fft_win(struct rte_bbdev *dev, struct
> > +rte_bbdev_driver_info *dev_info) {
> > +	struct acc_device *d = dev->data->dev_private;
> > +	uint32_t reg, time_out = 0, win;
> > +
> > +	if (d->pf_device)
> > +		return;
> > +
> > +	/* Check from the device the first time. */
> > +	if (d->fft_window_width[0] == 0) {
> 
> 0 is not a possible value? Might not be a big deal as it would just perform
> reading of 16 x 2 registers every time .info_get() is called, just wondering.

It is impossible for this to be zero. A zero width would mean forcing a zero output all the time, which fundamentally cannot happen.
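
To make the point concrete: the cache uses 0 as a "not read yet" sentinel, which only works because a zero width is never valid. A rough sketch, where read_width_from_pf() is a hypothetical stand-in for the VF2PF doorbell exchange and MAX_FFT_WIN stands in for RTE_BBDEV_MAX_FFT_WIN:

```c
#include <stdint.h>

#define MAX_FFT_WIN 16 /* stand-in for RTE_BBDEV_MAX_FFT_WIN */

static unsigned int pf_reads; /* counts simulated doorbell round trips */

/* Hypothetical stand-in for the VF2PF request; always returns non-zero. */
static uint16_t
read_width_from_pf(unsigned int win)
{
	pf_reads++;
	return (uint16_t)(100 + win);
}

struct dev_sketch {
	uint16_t fft_window_width[MAX_FFT_WIN]; /* 0 means "not read yet" */
};

/* Copy the widths out, querying the PF only on the first call. */
static void
fft_win_get(struct dev_sketch *d, uint16_t out[MAX_FFT_WIN])
{
	unsigned int win;

	if (d->fft_window_width[0] == 0)
		for (win = 0; win < MAX_FFT_WIN; win++)
			d->fft_window_width[win] = read_width_from_pf(win);

	for (win = 0; win < MAX_FFT_WIN; win++)
		out[win] = d->fft_window_width[win];
}
```

If a zero width were ever legal for window 0, the first check would trigger again on every info_get() call, which is the extra register traffic raised in the question above.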

> 
> > +		for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++) {
> > +			vrb_vf2pf(d, ACC_VF2PF_FFT_WIN_REQUEST | win);
> 
> That looks broken, as extending RTE_BBDEV_MAX_FFT_WIN to support other
> devices may break this implementation.

I don't believe so. 16 window shapes is a fairly large number, and storing them already takes a lot of memory.

> 
> To me, it tends to show how this fft_window_width array is more device
> specific than generic.

I don't really see why you say this. This is fundamentally what windowing is: a given section of the FFT output is multiplied point-wise by a given window shape, hence the width scales up and down with the FFT size.
This width information is required to estimate how much noise is removed by applying such windowing, hence it is exposed during device enumeration.
The number of windows available is a discrete number, as mentioned above, based on how many of these are programmed on the device (they need to be stored in HW memory).

Would you prefer to expose an additional parameter for the number of windows in the capability (i.e. the size of that array), plus a pointer to the actual array? That is okay with me and probably better. Please kindly confirm.
Also, Hernan, feel free to chime in.

i.e.:
		{
			.type	= RTE_BBDEV_OP_FFT,
			.cap.fft = {
				.capability_flags = (...),
				.num_windows = 16,
			}
		},
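
A rough sketch of how an application could consume that alternative, for discussion only: the capability carries the window count and the driver info carries a pointer to the widths. All type and field names here are hypothetical, not the current bbdev API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability fragment: number of windows programmed on the device. */
struct fft_cap_sketch {
	uint32_t capability_flags;
	uint16_t num_windows;
};

/* Hypothetical driver info fragment: points at num_windows widths. */
struct driver_info_sketch {
	const uint16_t *fft_window_width;
};

/* Return the width of window 'idx', or 0 if the index is out of range. */
static inline uint16_t
fft_window_width_get(const struct fft_cap_sketch *cap,
		const struct driver_info_sketch *info, uint16_t idx)
{
	if (info->fft_window_width == NULL || idx >= cap->num_windows)
		return 0;
	return info->fft_window_width[idx];
}
```

With this shape the application bounds its accesses by num_windows rather than by a compile-time maximum.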

> 
> > +			reg = acc_reg_read(d, d->reg_addr->pf2vf_doorbell);
> > +			while ((time_out < ACC_STATUS_TO) && (reg ==
> RTE_BBDEV_DEV_NOSTATUS)) {
> > +				usleep(ACC_STATUS_WAIT); /*< Wait or VF-
> >PF->VF Comms */
> > +				reg = acc_reg_read(d, d->reg_addr-
> >pf2vf_doorbell);
> > +				time_out++;
> > +			}
> > +			d->fft_window_width[win] = reg;
> > +		}
> > +	}
> > +
> > +	for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++)
> > +		dev_info->fft_window_width[win] = d-
> >fft_window_width[win]; }
> > +
> >   /* Checks PF Info Ring to find the interrupt cause and handles it
> accordingly. */
> >   static inline void
> >   vrb_check_ir(struct acc_device *acc_dev) @@ -1100,6 +1128,7 @@
> > vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info
> *dev_info)
> >   	fetch_acc_config(dev);
> >   	/* Check the status of device. */
> >   	dev_info->device_status = vrb_device_status(dev);
> > +	vrb_device_fft_win(dev, dev_info);
> >
> >   	/* Exposed number of queues. */
> >   	dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;



* Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-10-03 18:20     ` Chautru, Nicolas
@ 2023-10-04  7:11       ` Maxime Coquelin
  2023-10-04 21:18         ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-04  7:11 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan



On 10/3/23 20:20, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, October 3, 2023 7:37 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
>>
>>
>>
>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>> Support for the FFT the processing specific to the
>>> VRB2 variant.
>>>
>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>> ---
>>>    drivers/baseband/acc/rte_vrb_pmd.c | 132
>> ++++++++++++++++++++++++++++-
>>>    1 file changed, 128 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>> index 93add82947..ce4b90d8e7 100644
>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t
>> queue_id,
>>>    			ACC_FCW_LD_BLEN : (conf->op_type ==
>> RTE_BBDEV_OP_FFT ?
>>>    			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
>>>
>>> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type ==
>> RTE_BBDEV_OP_FFT))
>>> +		fcw_len = ACC_FCW_FFT_BLEN_3;
>>> +
>>>    	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
>>>    		desc = q->ring_addr + desc_idx;
>>>    		desc->req.word0 = ACC_DMA_DESC_TYPE; @@ -1323,6
>> +1326,24 @@
>>> vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info
>> *dev_info)
>>>    			.num_buffers_soft_out = 0,
>>>    			}
>>>    		},
>>> +		{
>>> +			.type	= RTE_BBDEV_OP_FFT,
>>> +			.cap.fft = {
>>> +				.capability_flags =
>>> +
>> 	RTE_BBDEV_FFT_WINDOWING |
>>> +
>> 	RTE_BBDEV_FFT_CS_ADJUSTMENT |
>>> +
>> 	RTE_BBDEV_FFT_DFT_BYPASS |
>>> +
>> 	RTE_BBDEV_FFT_IDFT_BYPASS |
>>> +						RTE_BBDEV_FFT_FP16_INPUT
>> |
>>> +
>> 	RTE_BBDEV_FFT_FP16_OUTPUT |
>>> +
>> 	RTE_BBDEV_FFT_POWER_MEAS |
>>> +
>> 	RTE_BBDEV_FFT_WINDOWING_BYPASS,
>>> +				.num_buffers_src =
>>> +						1,
>>> +				.num_buffers_dst =
>>> +						1,
>>> +			}
>>> +		},
>>>    		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>>>    	};
>>>
>>> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op *op,
>> struct acc_fcw_fft *fcw)
>>>    		fcw->bypass = 0;
>>>    }
>>>
>>> +/* Fill in a frame control word for FFT processing. */ static inline
>>> +void vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct
>>> +acc_fcw_fft_3 *fcw) {
>>> +	fcw->in_frame_size = op->fft.input_sequence_size;
>>> +	fcw->leading_pad_size = op->fft.input_leading_padding;
>>> +	fcw->out_frame_size = op->fft.output_sequence_size;
>>> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
>>> +	fcw->cs_window_sel = op->fft.window_index[0] +
>>> +			(op->fft.window_index[1] << 8) +
>>> +			(op->fft.window_index[2] << 16) +
>>> +			(op->fft.window_index[3] << 24);
>>> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
>>> +			(op->fft.window_index[5] << 8);
>>> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
>>> +	fcw->num_antennas = op->fft.num_antennas_log2;
>>> +	fcw->idft_size = op->fft.idft_log2;
>>> +	fcw->dft_size = op->fft.dft_log2;
>>> +	fcw->cs_offset = op->fft.cs_time_adjustment;
>>> +	fcw->idft_shift = op->fft.idft_shift;
>>> +	fcw->dft_shift = op->fft.dft_shift;
>>> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
>>> +	fcw->power_shift = op->fft.power_shift;
>>> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
>>> +	fcw->fp16_in = check_bit(op->fft.op_flags,
>> RTE_BBDEV_FFT_FP16_INPUT);
>>> +	fcw->fp16_out = check_bit(op->fft.op_flags,
>> RTE_BBDEV_FFT_FP16_OUTPUT);
>>> +	fcw->power_en = check_bit(op->fft.op_flags,
>> RTE_BBDEV_FFT_POWER_MEAS);
>>> +	if (check_bit(op->fft.op_flags,
>>> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
>>> +		if (check_bit(op->fft.op_flags,
>>> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
>>> +			fcw->bypass = 2;
>>> +		else
>>> +			fcw->bypass = 1;
>>> +	} else if (check_bit(op->fft.op_flags,
>>> +			RTE_BBDEV_FFT_DFT_BYPASS))
>>> +		fcw->bypass = 3;
>>> +	else
>>> +		fcw->bypass = 0;
>>
>> The only difference I see with VRB1 are backed by corresponding op_flags
>> (POWER & FP16), is that correct? If so, it does not make sense to me to have a
>> specific function for VRB2.
> 
> There are more changes but these are only formally enabled in the next stepping hence some of the
> related code is not included yet. More generally the FCW and IP is different from VRB1 implementation.

Currently, the code is almost identical, so the VRB1 implementation should be
reused. If later changes make the two implementations diverge,
then we can consider having a dedicated function for VRB2 at that time.
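
For instance, the bypass selection could sit in one helper used by both variants. This is a sketch only: the flag values below are placeholders (not the real RTE_BBDEV_FFT_* bit positions), check_bit() is re-declared locally, and it assumes VRB1 derives the field the same way as the VRB2 code quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits, for illustration only. */
#define FFT_WINDOWING_BYPASS (1u << 0)
#define FFT_DFT_BYPASS       (1u << 1)
#define FFT_IDFT_BYPASS      (1u << 2)

static inline bool
check_bit(uint32_t bitmap, uint32_t bitmask)
{
	return (bitmap & bitmask) != 0;
}

/* Map op flags to the 2-bit FCW bypass field, as in the quoted fill function. */
static inline uint8_t
fft_bypass_mode(uint32_t op_flags)
{
	if (check_bit(op_flags, FFT_IDFT_BYPASS))
		return check_bit(op_flags, FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (check_bit(op_flags, FFT_DFT_BYPASS))
		return 3;
	return 0;
}
```

Each fill function would then only set the fields that really differ between the two FCW formats.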

>>
>>> +}
>>> +
>>>    static inline int
>>>    vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>    		struct acc_dma_req_desc *desc,
>>> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op
>> *op,
>>>    	return 0;
>>>    }
>>>
>>> +static inline int
>>> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>> +		struct acc_dma_req_desc *desc,
>>> +		struct rte_mbuf *input, struct rte_mbuf *output, struct
>> rte_mbuf *win_input,
>>> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t
>> *out_offset,
>>> +		uint32_t *win_offset, uint32_t *pwr_offset) {
>>> +	bool pwr_en = check_bit(op->fft.op_flags,
>> RTE_BBDEV_FFT_POWER_MEAS);
>>> +	bool win_en = check_bit(op->fft.op_flags,
>> RTE_BBDEV_FFT_DEWINDOWING);
>>> +	int num_cs = 0, i, bd_idx = 1;
>>> +
>>> +	/* FCW already done */
>>> +	acc_header_init(desc);
>>> +
>>> +	RTE_SET_USED(win_input);
>>> +	RTE_SET_USED(win_offset);
>>> +
>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input,
>> *in_offset);
>>> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size *
>> ACC_IQ_SIZE;
>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
>>> +	desc->data_ptrs[bd_idx].last = 1;
>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>> +	bd_idx++;
>>> +
>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output,
>> *out_offset);
>>> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size *
>> ACC_IQ_SIZE;
>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
>>> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>> +	desc->m2dlen = win_en ? 3 : 2;
>>> +	desc->d2mlen = pwr_en ? 2 : 1;
>>> +	desc->ib_ant_offset = op->fft.input_sequence_size;
>>> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
>>> +
>>> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
>>> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
>>> +			num_cs++;
>>> +	desc->num_cs = num_cs;
>>> +
>>> +	if (pwr_en && pwr) {
>>> +		bd_idx++;
>>> +		desc->data_ptrs[bd_idx].address =
>> rte_pktmbuf_iova_offset(pwr, *pwr_offset);
>>> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op-
>>> fft.num_antennas_log2) * 4;
>>> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
>>> +		desc->data_ptrs[bd_idx].last = 1;
>>> +		desc->data_ptrs[bd_idx].dma_ext = 0;
>>> +	}
>>> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
>>> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
>>> +	desc->op_addr = op;
>>> +	return 0;
>>> +}
>>>
>>>    /** Enqueue one FFT operation for device. */
>>>    static inline int
>>> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue *q,
>> struct rte_bbdev_fft_op *op,
>>>    		uint16_t total_enqueued_cbs)
>>>    {
>>>    	union acc_dma_desc *desc;
>>> -	struct rte_mbuf *input, *output;
>>> -	uint32_t in_offset, out_offset;
>>> +	struct rte_mbuf *input, *output, *pwr, *win;
>>> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
>>>    	struct acc_fcw_fft *fcw;
>>>
>>>    	desc = acc_desc(q, total_enqueued_cbs);
>>>    	input = op->fft.base_input.data;
>>>    	output = op->fft.base_output.data;
>>> +	pwr = op->fft.power_meas_output.data;
>>> +	win = op->fft.dewindowing_input.data;
>>>    	in_offset = op->fft.base_input.offset;
>>>    	out_offset = op->fft.base_output.offset;
>>> +	pwr_offset = op->fft.power_meas_output.offset;
>>> +	win_offset = op->fft.dewindowing_input.offset;
>>>
>>>    	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
>>>    			((q->sw_ring_head + total_enqueued_cbs) & q-
>>> sw_ring_wrap_mask)
>>>    			* ACC_MAX_FCW_SIZE);
>>>
>>> -	vrb1_fcw_fft_fill(op, fcw);
>>> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset,
>> &out_offset);
>>> +	if (q->d->device_variant == VRB1_VARIANT) {
>>> +		vrb1_fcw_fft_fill(op, fcw);
>>> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output,
>> &in_offset, &out_offset);
>>> +	} else {
>>> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
>>> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win,
>> pwr,
>>> +				&in_offset, &out_offset, &win_offset,
>> &pwr_offset);
>>> +	}
>>>    #ifdef RTE_LIBRTE_BBDEV_DEBUG
>>>    	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
>>>    			sizeof(desc->req.fcw_fft));
> 



* Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension
  2023-10-03 18:54     ` Chautru, Nicolas
@ 2023-10-04  7:35       ` Maxime Coquelin
  2023-10-04 21:28         ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-04  7:35 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan



On 10/3/23 20:54, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, October 3, 2023 6:15 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver
>> extension
>>
>> Thanks for doing the split, that will ease review.
>>
>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>> Adding a few functions and common code prior to extending the VRB
>>> driver.
>>>
>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>> ---
>>>    drivers/baseband/acc/acc_common.h     | 164 +++++++++++++++++++++++-
>> --
>>>    drivers/baseband/acc/rte_acc100_pmd.c |   4 +-
>>>    drivers/baseband/acc/rte_vrb_pmd.c    |  62 +++++-----
>>>    3 files changed, 184 insertions(+), 46 deletions(-)
>>>
>>> diff --git a/drivers/baseband/acc/acc_common.h
>>> b/drivers/baseband/acc/acc_common.h
>>> index 788abf1a3c..89893eae43 100644
>>> --- a/drivers/baseband/acc/acc_common.h
>>> +++ b/drivers/baseband/acc/acc_common.h
>>> @@ -18,6 +18,7 @@
>>>    #define ACC_DMA_BLKID_OUT_HARQ      3
>>>    #define ACC_DMA_BLKID_IN_HARQ       3
>>>    #define ACC_DMA_BLKID_IN_MLD_R      3
>>> +#define ACC_DMA_BLKID_DEWIN_IN      3
>>>
>>>    /* Values used in filling in decode FCWs */
>>>    #define ACC_FCW_TD_VER              1
>>> @@ -103,6 +104,9 @@
>>>    #define ACC_MAX_NUM_QGRPS              32
>>>    #define ACC_RING_SIZE_GRANULARITY      64
>>>    #define ACC_MAX_FCW_SIZE              128
>>> +#define ACC_IQ_SIZE                    4
>>> +
>>> +#define ACC_FCW_FFT_BLEN_3             28
>>>
>>>    /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
>>>    #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */ @@ -132,6 +136,17 @@
>>>    #define ACC_LIM_21 14 /* 0.21 */
>>>    #define ACC_LIM_31 20 /* 0.31 */
>>>    #define ACC_MAX_E (128 * 1024 - 2)
>>> +#define ACC_MAX_CS 12
>>> +
>>> +#define ACC100_VARIANT          0
>>> +#define VRB1_VARIANT		2
>>> +#define VRB2_VARIANT		3
>>> +
>>> +/* Queue Index Hierarchy */
>>> +#define VRB1_GRP_ID_SHIFT    10
>>> +#define VRB1_VF_ID_SHIFT     4
>>> +#define VRB2_GRP_ID_SHIFT    12
>>> +#define VRB2_VF_ID_SHIFT     6
>>>
>>>    /* Helper macro for logging */
>>>    #define rte_acc_log(level, fmt, ...) \ @@ -332,6 +347,37 @@ struct
>>> __rte_packed acc_fcw_fft {
>>>    		res:19;
>>>    };
>>>
>>> +/* FFT Frame Control Word. */
>>> +struct __rte_packed acc_fcw_fft_3 {
>>> +	uint32_t in_frame_size:16,
>>> +		leading_pad_size:16;
>>> +	uint32_t out_frame_size:16,
>>> +		leading_depad_size:16;
>>> +	uint32_t cs_window_sel;
>>> +	uint32_t cs_window_sel2:16,
>>> +		cs_enable_bmap:16;
>>> +	uint32_t num_antennas:8,
>>> +		idft_size:8,
>>> +		dft_size:8,
>>> +		cs_offset:8;
>>> +	uint32_t idft_shift:8,
>>> +		dft_shift:8,
>>> +		cs_multiplier:16;
>>> +	uint32_t bypass:2,
>>> +		fp16_in:1,
>>> +		fp16_out:1,
>>> +		exp_adj:4,
>>> +		power_shift:4,
>>> +		power_en:1,
>>> +		enable_dewin:1,
>>> +		freq_resample_mode:2,
>>> +		depad_output_size:16;
>>> +	uint16_t cs_theta_0[ACC_MAX_CS];
>>> +	uint32_t cs_theta_d[ACC_MAX_CS];
>>> +	int8_t cs_time_offset[ACC_MAX_CS];
>>> +};
>>> +
>>> +
>>>    /* MLD-TS Frame Control Word */
>>>    struct __rte_packed acc_fcw_mldts {
>>>    	uint32_t fcw_version:4,
>>> @@ -473,14 +519,14 @@ union acc_info_ring_data {
>>>    		uint16_t valid: 1;
>>>    	};
>>>    	struct {
>>> -		uint32_t aq_id_3: 6;
>>> -		uint32_t qg_id_3: 5;
>>> -		uint32_t vf_id_3: 6;
>>> -		uint32_t int_nb_3: 6;
>>> -		uint32_t msi_0_3: 1;
>>> -		uint32_t vf2pf_3: 6;
>>> -		uint32_t loop_3: 1;
>>> -		uint32_t valid_3: 1;
>>> +		uint32_t aq_id_vrb2: 6;
>>> +		uint32_t qg_id_vrb2: 5;
>>> +		uint32_t vf_id_vrb2: 6;
>>> +		uint32_t int_nb_vrb2: 6;
>>> +		uint32_t msi_0_vrb2: 1;
>>> +		uint32_t vf2pf_vrb2: 6;
>>> +		uint32_t loop_vrb2: 1;
>>> +		uint32_t valid_vrb2: 1;
>>>    	};
>>>    } __rte_packed;
>>>
>>> @@ -761,22 +807,105 @@ alloc_sw_rings_min_mem(struct rte_bbdev *dev,
>> struct acc_device *d,
>>>    	free_base_addresses(base_addrs, i);
>>>    }
>>>
>>> +/* Wrapper to provide VF index from ring data. */ static inline
>>> +uint16_t vf_from_ring(const union acc_info_ring_data ring_data,
>>> +uint16_t device_variant) {
>>
>> curly braces on a new line.
> 
> Thanks.
> 
>>
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return ring_data.vf_id_vrb2;
>>> +	else
>>> +		return ring_data.vf_id;
>>> +}
>>> +
>>> +/* Wrapper to provide QG index from ring data. */ static inline
>>> +uint16_t qg_from_ring(const union acc_info_ring_data ring_data,
>>> +uint16_t device_variant) {
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return ring_data.qg_id_vrb2;
>>> +	else
>>> +		return ring_data.qg_id;
>>> +}
>>> +
>>> +/* Wrapper to provide AQ index from ring data. */ static inline
>>> +uint16_t aq_from_ring(const union acc_info_ring_data ring_data,
>>> +uint16_t device_variant) {
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return ring_data.aq_id_vrb2;
>>> +	else
>>> +		return ring_data.aq_id;
>>> +}
>>> +
>>> +/* Wrapper to provide int index from ring data. */
>>> +static inline uint16_t
>>> +int_from_ring(const union acc_info_ring_data ring_data, uint16_t device_variant)
>>> +{
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return ring_data.int_nb_vrb2;
>>> +	else
>>> +		return ring_data.int_nb;
>>> +}
>>> +
>>> +/* Wrapper to provide queue index from group and aq index. */
>>> +static inline int
>>> +queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t device_variant)
>>> +{
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
>>> +	else
>>> +		return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
>>> +}
>>> +
>>> +/* Wrapper to provide queue group from queue index. */
>>> +static inline int
>>> +qg_from_q(uint32_t q_idx, uint16_t device_variant)
>>> +{
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
>>> +	else
>>> +		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
>>> +}
>>> +
>>> +/* Wrapper to provide vf from queue index. */
>>> +static inline int32_t
>>> +vf_from_q(uint32_t q_idx, uint16_t device_variant)
>>> +{
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return (q_idx >> VRB2_VF_ID_SHIFT)  & 0x3F;
>>> +	else
>>> +		return (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
>>> +}
>>> +
>>> +/* Wrapper to provide aq index from queue index. */
>>> +static inline int32_t
>>> +aq_from_q(uint32_t q_idx, uint16_t device_variant)
>>> +{
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return q_idx & 0x3F;
>>> +	else
>>> +		return q_idx & 0xF;
>>> +}
>>> +
>>> +/* Wrapper to set VF index in ring data. */
>>> +static inline int32_t
>>> +set_vf_in_ring(volatile union acc_info_ring_data *ring_data,
>>> +		uint16_t device_variant, uint16_t value)
>>> +{
>>> +	if (device_variant == VRB2_VARIANT)
>>> +		return ring_data->vf_id_vrb2 = value;
>>> +	else
>>> +		return ring_data->vf_id = value;
>>> +}
>>> +
>>>    /*
>>>     * Find queue_id of a device queue based on details from the Info Ring.
>>>     * If a queue isn't found UINT16_MAX is returned.
>>>     */
>>>    static inline uint16_t
>>>    get_queue_id_from_ring_info(struct rte_bbdev_data *data,
>>> -		const union acc_info_ring_data ring_data)
>>> +		const union acc_info_ring_data ring_data, uint16_t
>> device_variant)
>>
>> As I suggested on v2:
>>
>> get_queue_id_from_ring_info(struct rte_bbdev_data *data,
>> 	const union acc_info_ring_data ring_data) {
>> 	struct acc_device *d = data->dev_private;
>>
>> 	...
>>
>> 	if (acc_q != NULL && acc_q->aq_id == aq_from_ring(d, ring_data) &&
>> ...
>>
>> }
>>
>> with
>>
>> /* Wrapper to provide AQ index from ring data. */
>> static inline uint16_t
>> aq_from_ring(struct acc_device *d, const union acc_info_ring_data ring_data)
>> {
>> 	if (d->device_variant == VRB2_VARIANT)
>> 		return ring_data.aq_id_vrb2;
>> 	else
>> 		return ring_data.aq_id;
>> }
>>
> 
> I will change get_queue_id_from_ring_info() to have a smaller prototype,
> but I don’t plan on changing the other new underlying functions to take the device
> instead of the variant in their prototypes.
> I don’t see a reason to, as these only need this one member.

IMHO, the reason is that it costs nothing and is more future-proof.

Also, my initial idea was to have an intermediate representation, like:

struct acc_queue_info { // Not sure about the name
	uint16_t vf_id;
	uint8_t qgrp_id;
	uint16_t aq_id;
};

Then we have a single callback for each variant:

static void
vrb1_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
		struct acc_queue_info *queue_info)
{
	queue_info->vf_id = ring_data.vf_id;
	queue_info->qgrp_id = ...
}

static void
vrb2_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
		struct acc_queue_info *queue_info)
{

}

The acc_queue_info struct can also be used in struct acc_queue, so we
use the same format everywhere.

I think it will be less verbose, and quicker to add new variants without
the risk of forgetting to add "else if (d->device_variant == VRBx_VARIANT)"
somewhere.
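
A minimal sketch of this idea follows, under stated assumptions. The
sketch_ring_data union stands in for union acc_info_ring_data and its field
widths are invented for the example, as are the sketch_variant values, the
ring_to_queue_info[] table and the queue_info_equal() helper. Only
struct acc_queue_info and the two callback names come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative variant identifiers; the numeric values are assumptions. */
enum sketch_variant { SKETCH_VRB1 = 0, SKETCH_VRB2 = 1, SKETCH_VARIANT_MAX };

/*
 * Stand-in for union acc_info_ring_data. The field widths are assumptions
 * chosen for the example, not the real hardware layout.
 */
union sketch_ring_data {
	uint32_t val;
	struct { unsigned int aq_id:4, qg_id:4, vf_id:6, pad:18; } vrb1;
	struct { unsigned int aq_id:6, qg_id:5, vf_id:6, pad:15; } vrb2;
};

/* Variant-independent representation, as proposed above. */
struct acc_queue_info {
	uint16_t vf_id;
	uint8_t qgrp_id;
	uint16_t aq_id;
};

typedef void (*ring_to_queue_info_t)(union sketch_ring_data rd,
		struct acc_queue_info *qi);

static void
vrb1_ring_data_to_queue_info(union sketch_ring_data rd, struct acc_queue_info *qi)
{
	assert(qi != NULL);
	qi->vf_id = rd.vrb1.vf_id;
	qi->qgrp_id = rd.vrb1.qg_id;
	qi->aq_id = rd.vrb1.aq_id;
}

static void
vrb2_ring_data_to_queue_info(union sketch_ring_data rd, struct acc_queue_info *qi)
{
	assert(qi != NULL);
	qi->vf_id = rd.vrb2.vf_id;
	qi->qgrp_id = rd.vrb2.qg_id;
	qi->aq_id = rd.vrb2.aq_id;
}

/* One entry per variant: adding a variant means adding one row here. */
static const ring_to_queue_info_t ring_to_queue_info[SKETCH_VARIANT_MAX] = {
	[SKETCH_VRB1] = vrb1_ring_data_to_queue_info,
	[SKETCH_VRB2] = vrb2_ring_data_to_queue_info,
};

/* Compare two queue descriptions field by field; returns 1 on match. */
static int
queue_info_equal(const struct acc_queue_info *a, const struct acc_queue_info *b)
{
	return a->vf_id == b->vf_id && a->qgrp_id == b->qgrp_id &&
			a->aq_id == b->aq_id;
}
```

With such a table, the lookup in get_queue_id_from_ring_info() could convert
the ring entry once and then compare it against each queue, so the variant
test would live in a single place.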

What do you think?

> 
>>>    {
>>>    	uint16_t queue_id;
>>> +	struct acc_queue *acc_q;
>>>
>>>    	for (queue_id = 0; queue_id < data->num_queues; ++queue_id) {
>>> -		struct acc_queue *acc_q =
>>> -				data->queues[queue_id].queue_private;
>>> -		if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id &&
>>> -				acc_q->qgrp_id == ring_data.qg_id &&
>>> -				acc_q->vf_id == ring_data.vf_id)
>>> +		acc_q = data->queues[queue_id].queue_private;
>>> +
>>> +		if (acc_q != NULL && acc_q->aq_id ==
>> aq_from_ring(ring_data, device_variant) &&
>>> +				acc_q->qgrp_id == qg_from_ring(ring_data,
>> device_variant) &&
>>> +				acc_q->vf_id == vf_from_ring(ring_data,
>> device_variant))
>>>    			return queue_id;
>>>    	}
>>>
>>> @@ -1438,4 +1567,11 @@ get_num_cbs_in_tb_ldpc_enc(struct
>> rte_bbdev_op_ldpc_enc *ldpc_enc)
>>>    	return cbs_in_tb;
>>>    }
>>>
>>> +static inline void
>>> +acc_reg_fast_write(struct acc_device *d, uint32_t offset, uint32_t
>>> +value) {
>>> +	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
>>> +	mmio_write(reg_addr, value);
>>> +}
>>> +
>>>    #endif /* _ACC_COMMON_H_ */
>>> diff --git a/drivers/baseband/acc/rte_acc100_pmd.c
>>> b/drivers/baseband/acc/rte_acc100_pmd.c
>>> index 5362d39c30..7f8d05b5a9 100644
>>> --- a/drivers/baseband/acc/rte_acc100_pmd.c
>>> +++ b/drivers/baseband/acc/rte_acc100_pmd.c
>>> @@ -294,7 +294,7 @@ acc100_pf_interrupt_handler(struct rte_bbdev *dev)
>>>    		case ACC100_PF_INT_DMA_UL5G_DESC_IRQ:
>>>    		case ACC100_PF_INT_DMA_DL5G_DESC_IRQ:
>>>    			deq_intr_det.queue_id =
>> get_queue_id_from_ring_info(
>>> -					dev->data, *ring_data);
>>> +					dev->data, *ring_data, acc100_dev-
>>> device_variant);
>>>    			if (deq_intr_det.queue_id == UINT16_MAX) {
>>>    				rte_bbdev_log(ERR,
>>>    						"Couldn't find queue: aq_id:
>> %u, qg_id: %u, vf_id: %u", @@
>>> -348,7 +348,7 @@ acc100_vf_interrupt_handler(struct rte_bbdev *dev)
>>>    			 */
>>>    			ring_data->vf_id = 0;
>>>    			deq_intr_det.queue_id =
>> get_queue_id_from_ring_info(
>>> -					dev->data, *ring_data);
>>> +					dev->data, *ring_data, acc100_dev-
>>> device_variant);
>>>    			if (deq_intr_det.queue_id == UINT16_MAX) {
>>>    				rte_bbdev_log(ERR,
>>>    						"Couldn't find queue: aq_id:
>> %u, qg_id: %u", diff --git
>>> a/drivers/baseband/acc/rte_vrb_pmd.c
>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>> index a1de012b40..c89c26c59a 100644
>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>> @@ -341,17 +341,18 @@ static inline void
>>>    vrb_check_ir(struct acc_device *acc_dev)
>>>    {
>>>    	volatile union acc_info_ring_data *ring_data;
>>> -	uint16_t info_ring_head = acc_dev->info_ring_head;
>>> +	uint16_t info_ring_head = acc_dev->info_ring_head, int_nb;
>>>    	if (unlikely(acc_dev->info_ring == NULL))
>>>    		return;
>>>
>>>    	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
>>> ACC_INFO_RING_MASK);
>>>
>>>    	while (ring_data->valid) {
>>> -		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
>>> -				ring_data->int_nb >
>> ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
>>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
>>> +		if ((int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
>>> +				int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ))
>> {
>>>    			rte_bbdev_log(WARNING, "InfoRing: ITR:%d
>> Info:0x%x",
>>> -					ring_data->int_nb, ring_data-
>>> detailed_info);
>>> +					int_nb, ring_data->detailed_info);
>>>    			/* Initialize Info Ring entry and move forward. */
>>>    			ring_data->val = 0;
>>>    		}
>>> @@ -368,16 +369,21 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>    	struct acc_device *acc_dev = dev->data->dev_private;
>>>    	volatile union acc_info_ring_data *ring_data;
>>>    	struct acc_deq_intr_details deq_intr_det;
>>> +	uint16_t vf_id, aq_id, qg_id, int_nb;
>>>
>>>    	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
>>> ACC_INFO_RING_MASK);
>>>
>>>    	while (ring_data->valid) {
>>> +		vf_id = vf_from_ring(*ring_data, acc_dev->device_variant);
>>> +		aq_id = aq_from_ring(*ring_data, acc_dev->device_variant);
>>> +		qg_id = qg_from_ring(*ring_data, acc_dev->device_variant);
>>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
>>>    		if (acc_dev->pf_device) {
>>>    			rte_bbdev_log_debug(
>>> -					"VRB1 PF Interrupt received, Info Ring
>> data: 0x%x -> %d",
>>> -					ring_data->val, ring_data->int_nb);
>>> +					"PF Interrupt received, Info Ring data:
>> 0x%x -> %d",
>>> +					ring_data->val, int_nb);
>>>
>>> -			switch (ring_data->int_nb) {
>>> +			switch (int_nb) {
>>>    			case ACC_PF_INT_DMA_DL_DESC_IRQ:
>>>    			case ACC_PF_INT_DMA_UL_DESC_IRQ:
>>>    			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
>>> @@ -385,13 +391,11 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>    			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
>>>    			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
>>>    				deq_intr_det.queue_id =
>> get_queue_id_from_ring_info(
>>> -						dev->data, *ring_data);
>>> +						dev->data, *ring_data,
>> acc_dev->device_variant);
>>>    				if (deq_intr_det.queue_id == UINT16_MAX) {
>>>    					rte_bbdev_log(ERR,
>>>    							"Couldn't find queue:
>> aq_id: %u, qg_id: %u, vf_id: %u",
>>> -							ring_data->aq_id,
>>> -							ring_data->qg_id,
>>> -							ring_data->vf_id);
>>> +							aq_id, qg_id, vf_id);
>>>    					return;
>>>    				}
>>>    				rte_bbdev_pmd_callback_process(dev,
>>> @@ -403,9 +407,9 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>    			}
>>>    		} else {
>>>    			rte_bbdev_log_debug(
>>> -					"VRB1 VF Interrupt received, Info Ring
>> data: 0x%x\n",
>>> +					"VRB VF Interrupt received, Info Ring
>> data: 0x%x\n",
>>>    					ring_data->val);
>>> -			switch (ring_data->int_nb) {
>>> +			switch (int_nb) {
>>>    			case ACC_VF_INT_DMA_DL_DESC_IRQ:
>>>    			case ACC_VF_INT_DMA_UL_DESC_IRQ:
>>>    			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
>>> @@ -413,14 +417,13 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>    			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
>>>    			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
>>>    				/* VFs are not aware of their vf_id - it's set to
>> 0.  */
>>> -				ring_data->vf_id = 0;
>>> +				set_vf_in_ring(ring_data, acc_dev-
>>> device_variant, 0);
>>>    				deq_intr_det.queue_id =
>> get_queue_id_from_ring_info(
>>> -						dev->data, *ring_data);
>>> +						dev->data, *ring_data,
>> acc_dev->device_variant);
>>>    				if (deq_intr_det.queue_id == UINT16_MAX) {
>>>    					rte_bbdev_log(ERR,
>>>    							"Couldn't find queue:
>> aq_id: %u, qg_id: %u",
>>> -							ring_data->aq_id,
>>> -							ring_data->qg_id);
>>> +							aq_id, qg_id);
>>>    					return;
>>>    				}
>>>    				rte_bbdev_pmd_callback_process(dev,
>>> @@ -435,8 +438,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>    		/* Initialize Info Ring entry and move forward. */
>>>    		ring_data->val = 0;
>>>    		++acc_dev->info_ring_head;
>>> -		ring_data = acc_dev->info_ring +
>>> -				(acc_dev->info_ring_head &
>> ACC_INFO_RING_MASK);
>>> +		ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
>>> +ACC_INFO_RING_MASK);
>>>    	}
>>>    }
>>>
>>> @@ -556,8 +558,7 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t
>>> num_queues, int socket_id)
>>>
>>>    	/* Configure tail pointer for use when SDONE enabled. */
>>>    	if (d->tail_ptrs == NULL)
>>> -		d->tail_ptrs = rte_zmalloc_socket(
>>> -				dev->device->driver->name,
>>> +		d->tail_ptrs = rte_zmalloc_socket(dev->device->driver->name,
>>>    				VRB_MAX_QGRPS * VRB_MAX_AQS *
>> sizeof(uint32_t),
>>>    				RTE_CACHE_LINE_SIZE, socket_id);
>>>    	if (d->tail_ptrs == NULL) {
>>> @@ -783,7 +784,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
>>>    			/* Mark the Queue as assigned. */
>>>    			d->q_assigned_bit_map[group_idx] |= (1ULL <<
>> aq_idx);
>>>    			/* Report the AQ Index. */
>>> -			return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
>>> +			return queue_index(group_idx, aq_idx, d-
>>> device_variant);
>>>    		}
>>>    	}
>>>    	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u",
>>> @@ -922,9 +923,10 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t
>> queue_id,
>>>    		}
>>>    	}
>>>
>>> -	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
>>> -	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
>>> -	q->aq_id = q_idx & 0xF;
>>> +	q->qgrp_id = qg_from_q(q_idx, d->device_variant);
>>> +	q->vf_id = vf_from_q(q_idx, d->device_variant);
>>> +	q->aq_id = aq_from_q(q_idx, d->device_variant);
>>> +
>>>    	q->aq_depth = 0;
>>>    	if (conf->op_type ==  RTE_BBDEV_OP_TURBO_DEC)
>>>    		q->aq_depth = (1 << d->acc_conf.q_ul_4g.aq_depth_log2);
>>> @@ -1311,7 +1313,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op
>> *op, struct acc_fcw_td *fcw)
>>>    		fcw->bypass_teq = 0;
>>>    	}
>>>
>>> -	fcw->code_block_mode = 1; /* FIXME */
>>> +	fcw->code_block_mode = 1;
>>
>> Could you remind me what was the issue?
> 
> Historically there was the intention to use a different format option in the fcw to help with the TB mode, but that is not considered anymore.

Ok.

> 
>>
>>>    	fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags,
>>>    			RTE_BBDEV_TURBO_CRC_TYPE_24B);
>>>
>>> @@ -1471,8 +1473,8 @@ vrb_dma_desc_td_fill(struct rte_bbdev_dec_op
>> *op,
>>>    	if (op->turbo_dec.code_block_mode ==
>> RTE_BBDEV_TRANSPORT_BLOCK) {
>>>    		k = op->turbo_dec.tb_params.k_pos;
>>>    		e = (r < op->turbo_dec.tb_params.cab)
>>> -			? op->turbo_dec.tb_params.ea
>>> -			: op->turbo_dec.tb_params.eb;
>>> +				? op->turbo_dec.tb_params.ea
>>> +				: op->turbo_dec.tb_params.eb;
>>>    	} else {
>>>    		k = op->turbo_dec.cb_params.k;
>>>    		e = op->turbo_dec.cb_params.e;
>>> @@ -1726,7 +1728,7 @@ vrb_dma_desc_ld_update(struct
>> rte_bbdev_dec_op *op,
>>>    	desc->op_addr = op;
>>>    }
>>>
>>> -/* Enqueue one encode operations for device in CB mode */
>>> +/* Enqueue one encode operations for device in CB mode. */
>>>    static inline int
>>>    enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op
>> *op,
>>>    		uint16_t total_enqueued_cbs)
>>> @@ -2263,7 +2265,7 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct
>> acc_queue *q, struct rte_bbdev_dec_op *op,
>>>    	return current_enqueued_cbs;
>>>    }
>>>
>>> -/* Enqueue one decode operations for device in TB mode */
>>> +/* Enqueue one decode operations for device in TB mode. */
>>>    static inline int
>>>    enqueue_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op
>> *op,
>>>    		uint16_t total_enqueued_cbs, uint8_t cbs_in_tb)
> 



* Re: [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD
  2023-10-03 19:06     ` Chautru, Nicolas
@ 2023-10-04  7:55       ` Maxime Coquelin
  0 siblings, 0 replies; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-04  7:55 UTC (permalink / raw)
  To: Chautru, Nicolas, dev, hemant.agrawal; +Cc: david.marchand, Vargas, Hernan



On 10/3/23 21:06, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, October 3, 2023 4:52 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 02/12] baseband/acc: add FFT window width in the
>> VRB PMD
>>
>>
>>
>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>> This allows to expose the FFT window width being introduced in
>>> previous commit based on what is configured dynamically on the device
>>> platform.
>>>
>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>> ---
>>>    drivers/baseband/acc/acc_common.h  |  3 +++
>>>    drivers/baseband/acc/rte_vrb_pmd.c | 29
>> +++++++++++++++++++++++++++++
>>>    2 files changed, 32 insertions(+)
>>>
>>> diff --git a/drivers/baseband/acc/acc_common.h
>>> b/drivers/baseband/acc/acc_common.h
>>> index 5bb00746c3..7d24c644c0 100644
>>> --- a/drivers/baseband/acc/acc_common.h
>>> +++ b/drivers/baseband/acc/acc_common.h
>>> @@ -512,6 +512,8 @@ struct acc_deq_intr_details {
>>>    enum {
>>>    	ACC_VF2PF_STATUS_REQUEST = 1,
>>>    	ACC_VF2PF_USING_VF = 2,
>>> +	ACC_VF2PF_LUT_VER_REQUEST = 3,
>>> +	ACC_VF2PF_FFT_WIN_REQUEST = 4,
>>>    };
>>>
>>>
>>> @@ -558,6 +560,7 @@ struct acc_device {
>>>    	queue_offset_fun_t queue_offset;  /* Device specific queue offset */
>>>    	uint16_t num_qgroups;
>>>    	uint16_t num_aqs;
>>> +	uint16_t fft_window_width[RTE_BBDEV_MAX_FFT_WIN]; /* FFT
>> windowing
>>> +width. */
>>>    };
>>>
>>>    /* Structure associated with each queue. */ diff --git
>>> a/drivers/baseband/acc/rte_vrb_pmd.c
>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>> index 9e5a73c9c7..c5a74bae11 100644
>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>> @@ -298,6 +298,34 @@ vrb_device_status(struct rte_bbdev *dev)
>>>    	return reg;
>>>    }
>>>
>>> +/* Request device FFT windowing information. */
>>> +static inline void
>>> +vrb_device_fft_win(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>>> +{
>>> +	struct acc_device *d = dev->data->dev_private;
>>> +	uint32_t reg, time_out = 0, win;
>>> +
>>> +	if (d->pf_device)
>>> +		return;
>>> +
>>> +	/* Check from the device the first time. */
>>> +	if (d->fft_window_width[0] == 0) {
>>
>> 0 is not a possible value? Might not be a big deal, as it would just perform
>> a read of 16 x 2 registers every time .info_get() is called, just wondering.
> 
> It is impossible for this to be null. It would mean forcing a zero output all the time, which fundamentally cannot happen.

Ack.

> 
>>
>>> +		for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++) {
>>> +			vrb_vf2pf(d, ACC_VF2PF_FFT_WIN_REQUEST | win);
>>
>> That looks broken, as extending RTE_BBDEV_MAX_FFT_WIN to support other
>> devices may break this implementation.
> 
> I don’t believe so. 16 window shapes is fairly large; it already takes a lot of memory to store all this.

Maybe, but the issue here is that you rely on a generic BBDEV define as
an offset to access HW registers in your device. That's wrong IMO, as the
define may evolve in the future. At least you should define what the
maximum number of FFT windows is for your device, and use the minimum of
the two values.

But the suggestion you make below is better.

> 
>>
>> To me, it tends to show how this fft_window_width array is more device
>> specific than generic.
> 
> I don't really see why you say this. This is fundamentally what windowing is: a given section of the FFT output where you apply a point-wise multiplication based on a given window shape, hence the size is scaled up and down based on the FFT size.
> This width information is required to estimate how much noise is removed by applying such windowing, hence it is enumerated during device enumeration.
> The number of windows available is then a discrete number, as mentioned above, based on how many of these are programmed on the device (they need to be stored in HW memory).
> 
> Would you prefer to expose an additional parameter for the number of windows in the capability (i.e. the size of that array), then a pointer to the actual array? That is okay with me and probably better. Please kindly confirm.
> Also, Hernan, feel free to chime in.
> 
> I.e.:
> 		{
> 			.type	= RTE_BBDEV_OP_FFT,
> 			.cap.fft = {
> 				.capability_flags = (...),
> 				.num_windows = 16,
> 			}
> 		},

That would be better IMO.
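
A minimal sketch of the bounded copy discussed above, under stated
assumptions: SKETCH_GENERIC_MAX_FFT_WIN stands in for RTE_BBDEV_MAX_FFT_WIN,
and copy_fft_window_widths() with its parameters is a hypothetical helper, not
part of the patch. The point is only that the loop bound is the minimum of the
device window count and the size of the destination array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generic bbdev array size (an assumption for this sketch). */
#define SKETCH_GENERIC_MAX_FFT_WIN 16

/*
 * Copy up to out_len window widths from the device table into the generic
 * array. The count is clamped to the smaller of the two sizes, unused
 * destination entries are zeroed, and the number of valid entries is returned.
 */
static uint16_t
copy_fft_window_widths(const uint16_t *dev_widths, uint16_t dev_num_win,
		uint16_t *out, uint16_t out_len)
{
	uint16_t num, win;

	assert(dev_widths != NULL && out != NULL);

	num = dev_num_win < out_len ? dev_num_win : out_len;
	for (win = 0; win < num; win++)
		out[win] = dev_widths[win];
	for (; win < out_len; win++)
		out[win] = 0;

	return num;
}
```

The returned count is what a num_windows capability field could report, so
that neither side depends on the other's array size.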

>>
>>> +			reg = acc_reg_read(d, d->reg_addr->pf2vf_doorbell);
>>> +			while ((time_out < ACC_STATUS_TO) && (reg ==
>> RTE_BBDEV_DEV_NOSTATUS)) {
>>> +				usleep(ACC_STATUS_WAIT); /*< Wait or VF-
>>> PF->VF Comms */
>>> +				reg = acc_reg_read(d, d->reg_addr-
>>> pf2vf_doorbell);
>>> +				time_out++;
>>> +			}
>>> +			d->fft_window_width[win] = reg;
>>> +		}
>>> +	}
>>> +
>>> +	for (win = 0; win < RTE_BBDEV_MAX_FFT_WIN; win++)
>>> +		dev_info->fft_window_width[win] = d-
>>> fft_window_width[win]; }
>>> +
>>>    /* Checks PF Info Ring to find the interrupt cause and handles it
>> accordingly. */
>>>    static inline void
>>>    vrb_check_ir(struct acc_device *acc_dev) @@ -1100,6 +1128,7 @@
>>> vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info
>> *dev_info)
>>>    	fetch_acc_config(dev);
>>>    	/* Check the status of device. */
>>>    	dev_info->device_status = vrb_device_status(dev);
>>> +	vrb_device_fft_win(dev, dev_info);
>>>
>>>    	/* Exposed number of queues. */
>>>    	dev_info->num_queues[RTE_BBDEV_OP_NONE] = 0;
> 



* RE: [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant
  2023-10-03 14:28   ` Maxime Coquelin
@ 2023-10-04 21:11     ` Chautru, Nicolas
  2023-10-05 14:36       ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-04 21:11 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, October 3, 2023 7:28 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2
> variant
> 
> 
> 
> On 9/29/23 18:35, Nicolas Chautru wrote:
> > New implementation for some of the FEC features specific to the VRB2
> > variant.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >   drivers/baseband/acc/rte_vrb_pmd.c | 567
> ++++++++++++++++++++++++++++-
> >   1 file changed, 548 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
> > b/drivers/baseband/acc/rte_vrb_pmd.c
> > index 48e779ce77..93add82947 100644
> > --- a/drivers/baseband/acc/rte_vrb_pmd.c
> > +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> > @@ -1235,6 +1235,94 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct
> rte_bbdev_driver_info *dev_info)
> >   	};
> >
> >   	static const struct rte_bbdev_op_cap vrb2_bbdev_capabilities[] = {
> > +		{
> > +			.type = RTE_BBDEV_OP_TURBO_DEC,
> > +			.cap.turbo_dec = {
> > +				.capability_flags =
> > +
> 	RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE |
> > +					RTE_BBDEV_TURBO_CRC_TYPE_24B |
> > +
> 	RTE_BBDEV_TURBO_DEC_CRC_24B_DROP |
> > +					RTE_BBDEV_TURBO_EQUALIZER |
> > +
> 	RTE_BBDEV_TURBO_SOFT_OUT_SATURATE |
> > +
> 	RTE_BBDEV_TURBO_HALF_ITERATION_EVEN |
> > +
> 	RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH |
> > +					RTE_BBDEV_TURBO_SOFT_OUTPUT |
> > +
> 	RTE_BBDEV_TURBO_EARLY_TERMINATION |
> > +
> 	RTE_BBDEV_TURBO_DEC_INTERRUPTS |
> > +
> 	RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN |
> > +
> 	RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT |
> > +					RTE_BBDEV_TURBO_MAP_DEC |
> > +
> 	RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP |
> > +
> 	RTE_BBDEV_TURBO_DEC_SCATTER_GATHER,
> > +				.max_llr_modulus = INT8_MAX,
> > +				.num_buffers_src =
> > +
> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> > +				.num_buffers_hard_out =
> > +
> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> > +				.num_buffers_soft_out =
> > +
> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> > +			}
> > +		},
> > +		{
> > +			.type = RTE_BBDEV_OP_TURBO_ENC,
> > +			.cap.turbo_enc = {
> > +				.capability_flags =
> > +
> 	RTE_BBDEV_TURBO_CRC_24B_ATTACH |
> > +
> 	RTE_BBDEV_TURBO_RV_INDEX_BYPASS |
> > +					RTE_BBDEV_TURBO_RATE_MATCH |
> > +
> 	RTE_BBDEV_TURBO_ENC_INTERRUPTS |
> > +
> 	RTE_BBDEV_TURBO_ENC_SCATTER_GATHER,
> > +				.num_buffers_src =
> > +
> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> > +				.num_buffers_dst =
> > +
> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
> > +			}
> > +		},
> > +		{
> > +			.type   = RTE_BBDEV_OP_LDPC_ENC,
> > +			.cap.ldpc_enc = {
> > +				.capability_flags =
> > +					RTE_BBDEV_LDPC_RATE_MATCH |
> > +					RTE_BBDEV_LDPC_CRC_24B_ATTACH
> |
> > +
> 	RTE_BBDEV_LDPC_INTERLEAVER_BYPASS |
> > +					RTE_BBDEV_LDPC_ENC_INTERRUPTS
> |
> > +
> 	RTE_BBDEV_LDPC_ENC_SCATTER_GATHER |
> > +
> 	RTE_BBDEV_LDPC_ENC_CONCATENATION,
> > +				.num_buffers_src =
> > +
> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> > +				.num_buffers_dst =
> > +
> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> > +			}
> > +		},
> > +		{
> > +			.type   = RTE_BBDEV_OP_LDPC_DEC,
> > +			.cap.ldpc_dec = {
> > +			.capability_flags =
> > +				RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK |
> > +				RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP |
> > +				RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK |
> > +				RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK |
> > +				RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE
> |
> > +
> 	RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE |
> > +				RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE
> |
> > +				RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS |
> > +				RTE_BBDEV_LDPC_DEC_SCATTER_GATHER |
> > +
> 	RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION |
> > +
> 	RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION |
> > +				RTE_BBDEV_LDPC_LLR_COMPRESSION |
> > +				RTE_BBDEV_LDPC_SOFT_OUT_ENABLE |
> > +				RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS |
> > +
> 	RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS |
> > +				RTE_BBDEV_LDPC_DEC_INTERRUPTS,
> > +			.llr_size = 8,
> > +			.llr_decimals = 2,
> > +			.num_buffers_src =
> > +
> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> > +			.num_buffers_hard_out =
> > +
> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
> > +			.num_buffers_soft_out = 0,
> > +			}
> > +		},
> >   		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
> >   	};
> >
> > @@ -1774,6 +1862,141 @@ vrb1_dma_desc_ld_fill(struct rte_bbdev_dec_op
> *op,
> >   	return 0;
> >   }
> >
> > +/* Fill in a frame control word for LDPC decoding. */
> > +static inline void
> > +vrb2_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw,
> > +		union acc_harq_layout_data *harq_layout)
> > +{
> > +	uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset;
> > +	uint32_t harq_index;
> > +	uint32_t l;
> 
> 
> This is so similar with vrb1_fcw_ld_fill() that it does not make sense
> to duplicate so much code.
> 
> Do you confirm there are no other difference than the SOFT_OUT stuff,
> and reusing vrb2_fcw_ld_fill on VRB1 would just work as the op_flags are
> checked (and they should not be set if capability is not advertized)?

There are quite a lot of differences in the fundamental underlying IP: the decoder IP is different, with different tuning points, and the SO and HARQ support are different.
Still, I believe we can support both in the same function without it being too much of a problem moving forward. Doing this in v4.
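
A minimal sketch of what a single fill function could look like, under stated
assumptions. All names here (sketch_fcw_ld, the SKETCH_* flags,
sketch_fcw_ld_fill()) are hypothetical, and treating 4-bit HARQ compression as
a VRB2-only option is an assumption made purely for illustration. The mode
values 1 and 4 mirror the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative variant identifiers. */
enum sketch_variant { SKETCH_VRB1, SKETCH_VRB2 };

/* Hypothetical op flags standing in for the RTE_BBDEV_LDPC_* bits. */
#define SKETCH_SOFT_OUT_ENABLE (1u << 0)
#define SKETCH_HARQ_6BIT_COMP  (1u << 1)
#define SKETCH_HARQ_4BIT_COMP  (1u << 2)

/* Reduced stand-in for struct acc_fcw_ld with only the fields used here. */
struct sketch_fcw_ld {
	uint8_t so_en;
	uint8_t hcin_decomp_mode;
	uint8_t hcout_comp_mode;
};

static int
check_flag(uint32_t flags, uint32_t bit)
{
	return (flags & bit) != 0;
}

/* Fill the FCW for either variant; variant-specific options are guarded. */
static void
sketch_fcw_ld_fill(uint32_t op_flags, enum sketch_variant variant,
		struct sketch_fcw_ld *fcw)
{
	uint8_t mode = 0;

	assert(fcw != NULL);

	/* Common path: driven by op flags only. */
	fcw->so_en = check_flag(op_flags, SKETCH_SOFT_OUT_ENABLE);
	if (check_flag(op_flags, SKETCH_HARQ_6BIT_COMP))
		mode = 1;
	/* Assumed VRB2-only option: ignored on other variants. */
	else if (variant == SKETCH_VRB2 &&
			check_flag(op_flags, SKETCH_HARQ_4BIT_COMP))
		mode = 4;

	fcw->hcin_decomp_mode = mode;
	fcw->hcout_comp_mode = mode;
}
```

If op flags are already validated against the advertised capabilities, the
variant guard is redundant and the function can be driven by flags alone.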


> 
> > +	fcw->qm = op->ldpc_dec.q_m;
> > +	fcw->nfiller = op->ldpc_dec.n_filler;
> > +	fcw->BG = (op->ldpc_dec.basegraph - 1);
> > +	fcw->Zc = op->ldpc_dec.z_c;
> > +	fcw->ncb = op->ldpc_dec.n_cb;
> > +	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_dec.basegraph,
> > +			op->ldpc_dec.rv_index);
> > +	if (op->ldpc_dec.code_block_mode == RTE_BBDEV_CODE_BLOCK)
> > +		fcw->rm_e = op->ldpc_dec.cb_params.e;
> > +	else
> > +		fcw->rm_e = (op->ldpc_dec.tb_params.r <
> > +				op->ldpc_dec.tb_params.cab) ?
> > +						op->ldpc_dec.tb_params.ea :
> > +						op->ldpc_dec.tb_params.eb;
> > +
> > +	if (unlikely(check_bit(op->ldpc_dec.op_flags,
> > +			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE) &&
> > +			(op->ldpc_dec.harq_combined_input.length == 0))) {
> > +		rte_bbdev_log(WARNING, "Null HARQ input size provided");
> > +		/* Disable HARQ input in that case to carry forward. */
> > +		op->ldpc_dec.op_flags ^=
> RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE;
> > +	}
> > +	if (unlikely(fcw->rm_e == 0)) {
> > +		rte_bbdev_log(WARNING, "Null E input provided");
> > +		fcw->rm_e = 2;
> > +	}
> > +
> > +	fcw->hcin_en = check_bit(op->ldpc_dec.op_flags,
> > +			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE);
> > +	fcw->hcout_en = check_bit(op->ldpc_dec.op_flags,
> > +			RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE);
> > +	fcw->crc_select = check_bit(op->ldpc_dec.op_flags,
> > +			RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK);
> > +	fcw->so_en = check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_SOFT_OUT_ENABLE);
> > +	fcw->so_bypass_intlv = check_bit(op->ldpc_dec.op_flags,
> > +
> 	RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS);
> > +	fcw->so_bypass_rm = check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS);
> > +	fcw->bypass_dec = 0;
> > +	fcw->bypass_intlv = check_bit(op->ldpc_dec.op_flags,
> > +			RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS);
> > +	if (op->ldpc_dec.q_m == 1) {
> > +		fcw->bypass_intlv = 1;
> > +		fcw->qm = 2;
> > +	}
> > +	if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION)) {
> > +		fcw->hcin_decomp_mode = 1;
> > +		fcw->hcout_comp_mode = 1;
> > +	} else if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION)) {
> > +		fcw->hcin_decomp_mode = 4;
> > +		fcw->hcout_comp_mode = 4;
> > +	} else {
> > +		fcw->hcin_decomp_mode = 0;
> > +		fcw->hcout_comp_mode = 0;
> > +	}
> > +
> > +	fcw->llr_pack_mode = check_bit(op->ldpc_dec.op_flags,
> > +			RTE_BBDEV_LDPC_LLR_COMPRESSION);
> > +	harq_index = hq_index(op->ldpc_dec.harq_combined_output.offset);
> > +	if (fcw->hcin_en > 0) {
> > +		harq_in_length = op->ldpc_dec.harq_combined_input.length;
> > +		if (fcw->hcin_decomp_mode == 1)
> > +			harq_in_length = harq_in_length * 8 / 6;
> > +		else if (fcw->hcin_decomp_mode == 4)
> > +			harq_in_length = harq_in_length * 2;
> > +		harq_in_length = RTE_MIN(harq_in_length, op->ldpc_dec.n_cb
> > +				- op->ldpc_dec.n_filler);
> > +		harq_in_length = RTE_ALIGN_CEIL(harq_in_length, 64);
> > +		fcw->hcin_size0 = harq_in_length;
> > +		fcw->hcin_offset = 0;
> > +		fcw->hcin_size1 = 0;
> > +	} else {
> > +		fcw->hcin_size0 = 0;
> > +		fcw->hcin_offset = 0;
> > +		fcw->hcin_size1 = 0;
> > +	}
> > +
> > +	fcw->itmax = op->ldpc_dec.iter_max;
> > +	fcw->so_it = op->ldpc_dec.iter_max;
> > +	fcw->itstop = check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE);
> > +	fcw->cnu_algo = ACC_ALGO_MSA;
> > +	fcw->synd_precoder = fcw->itstop;
> > +
> > +	fcw->minsum_offset = 1;
> > +	fcw->dec_llrclip   = 2;
> > +
> > +	/*
> > +	 * These are all implicitly set
> > +	 * fcw->synd_post = 0;
> > +	 * fcw->dec_convllr = 0;
> > +	 * fcw->hcout_convllr = 0;
> > +	 * fcw->hcout_size1 = 0;
> > +	 * fcw->hcout_offset = 0;
> > +	 * fcw->negstop_th = 0;
> > +	 * fcw->negstop_it = 0;
> > +	 * fcw->negstop_en = 0;
> > +	 * fcw->gain_i = 1;
> > +	 * fcw->gain_h = 1;
> > +	 */
> > +	if (fcw->hcout_en > 0) {
> > +		parity_offset = (op->ldpc_dec.basegraph == 1 ? 20 : 8)
> > +			* op->ldpc_dec.z_c - op->ldpc_dec.n_filler;
> > +		k0_p = (fcw->k0 > parity_offset) ?
> > +				fcw->k0 - op->ldpc_dec.n_filler : fcw->k0;
> > +		ncb_p = fcw->ncb - op->ldpc_dec.n_filler;
> > +		l = k0_p + fcw->rm_e;
> > +		harq_out_length = (uint16_t) fcw->hcin_size0;
> > +		harq_out_length = RTE_MIN(RTE_MAX(harq_out_length, l),
> ncb_p);
> > +		harq_out_length = RTE_ALIGN_CEIL(harq_out_length, 64);
> > +		fcw->hcout_size0 = harq_out_length;
> > +		fcw->hcout_size1 = 0;
> > +		fcw->hcout_offset = 0;
> > +		harq_layout[harq_index].offset = fcw->hcout_offset;
> > +		harq_layout[harq_index].size0 = fcw->hcout_size0;
> > +	} else {
> > +		fcw->hcout_size0 = 0;
> > +		fcw->hcout_size1 = 0;
> > +		fcw->hcout_offset = 0;
> > +	}
> > +
> > +	fcw->tb_crc_select = 0;
> > +	if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
> > +		fcw->tb_crc_select = 2;
> > +	if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK))
> > +		fcw->tb_crc_select = 1;
> > +}
> > +
> >   static inline void
> >   vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
> >   		struct acc_dma_req_desc *desc,
> > @@ -1817,6 +2040,139 @@ vrb_dma_desc_ld_update(struct
> rte_bbdev_dec_op *op,
> >   	desc->op_addr = op;
> >   }
> >
> > +static inline int
> > +vrb2_dma_desc_ld_fill(struct rte_bbdev_dec_op *op,
> > +		struct acc_dma_req_desc *desc,
> > +		struct rte_mbuf **input, struct rte_mbuf *h_output,
> > +		uint32_t *in_offset, uint32_t *h_out_offset,
> > +		uint32_t *h_out_length, uint32_t *mbuf_total_left,
> > +		uint32_t *seg_total_left, struct acc_fcw_ld *fcw)
> > +{
> Same here.
> 
> I compared with vrb1_dma_desc_ld_fill(), and I don't see why we need two
> functions.
> 
> The only differences are either backed by capability checks, and vrb1
> already sets fcw->hcin_decomp_mode, so this code should work as-is on
> vrb1 if I'm not mistaken.

Yes, fair enough, doing this in v4.

> 
> > +	struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec;
> > +	int next_triplet = 1; /* FCW already done. */
> > +	uint32_t input_length;
> > +	uint16_t output_length, crc24_overlap = 0;
> > +	uint16_t sys_cols, K, h_p_size, h_np_size;
> > +
> > +	acc_header_init(desc);
> > +
> > +	if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP))
> > +		crc24_overlap = 24;
> > +
> > +	/* Compute some LDPC BG lengths. */
> > +	input_length = fcw->rm_e;
> > +	if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_LLR_COMPRESSION))
> > +		input_length = (input_length * 3 + 3) / 4;
> > +	sys_cols = (dec->basegraph == 1) ? 22 : 10;
> > +	K = sys_cols * dec->z_c;
> > +	output_length = K - dec->n_filler - crc24_overlap;
> > +
> > +	if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left <
> input_length))) {
> > +		rte_bbdev_log(ERR,
> > +				"Mismatch between mbuf length and included
> CB sizes: mbuf len %u, cb len %u",
> > +				*mbuf_total_left, input_length);
> > +		return -1;
> > +	}
> > +
> > +	next_triplet = acc_dma_fill_blk_type_in(desc, input,
> > +			in_offset, input_length,
> > +			seg_total_left, next_triplet,
> > +			check_bit(op->ldpc_dec.op_flags,
> > +			RTE_BBDEV_LDPC_DEC_SCATTER_GATHER));
> > +
> > +	if (unlikely(next_triplet < 0)) {
> > +		rte_bbdev_log(ERR,
> > +				"Mismatch between data to process and mbuf
> data length in bbdev_op: %p",
> > +				op);
> > +		return -1;
> > +	}
> > +
> > +	if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) {
> > +		if (op->ldpc_dec.harq_combined_input.data == 0) {
> > +			rte_bbdev_log(ERR, "HARQ input is not defined");
> > +			return -1;
> > +		}
> > +		h_p_size = fcw->hcin_size0 + fcw->hcin_size1;
> > +		if (fcw->hcin_decomp_mode == 1)
> > +			h_p_size = (h_p_size * 3 + 3) / 4;
> > +		else if (fcw->hcin_decomp_mode == 4)
> > +			h_p_size = h_p_size / 2;
> > +		if (op->ldpc_dec.harq_combined_input.data == 0) {
> > +			rte_bbdev_log(ERR, "HARQ input is not defined");
> > +			return -1;
> > +		}
> > +		acc_dma_fill_blk_type(
> > +				desc,
> > +				op->ldpc_dec.harq_combined_input.data,
> > +				op->ldpc_dec.harq_combined_input.offset,
> > +				h_p_size,
> > +				next_triplet,
> > +				ACC_DMA_BLKID_IN_HARQ);
> > +		next_triplet++;
> > +	}
> > +
> > +	desc->data_ptrs[next_triplet - 1].last = 1;
> > +	desc->m2dlen = next_triplet;
> > +	*mbuf_total_left -= input_length;
> > +
> > +	next_triplet = acc_dma_fill_blk_type(desc, h_output,
> > +			*h_out_offset, output_length >> 3, next_triplet,
> > +			ACC_DMA_BLKID_OUT_HARD);
> > +
> > +	if (check_bit(op->ldpc_dec.op_flags,
> RTE_BBDEV_LDPC_SOFT_OUT_ENABLE)) {
> > +		if (op->ldpc_dec.soft_output.data == 0) {
> > +			rte_bbdev_log(ERR, "Soft output is not defined");
> > +			return -1;
> > +		}
> > +		dec->soft_output.length = fcw->rm_e;
> > +		acc_dma_fill_blk_type(desc, dec->soft_output.data, dec->soft_output.offset,
> > +				fcw->rm_e, next_triplet, ACC_DMA_BLKID_OUT_SOFT);
> > +		next_triplet++;
> > +	}
> > +
> > +	if (check_bit(op->ldpc_dec.op_flags,
> > +
> 	RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) {
> > +		if (op->ldpc_dec.harq_combined_output.data == 0) {
> > +			rte_bbdev_log(ERR, "HARQ output is not defined");
> > +			return -1;
> > +		}
> > +
> > +		/* Pruned size of the HARQ */
> > +		h_p_size = fcw->hcout_size0 + fcw->hcout_size1;
> > +		/* Non-Pruned size of the HARQ */
> > +		h_np_size = fcw->hcout_offset > 0 ?
> > +				fcw->hcout_offset + fcw->hcout_size1 :
> > +				h_p_size;
> > +		if (fcw->hcin_decomp_mode == 1) {
> > +			h_np_size = (h_np_size * 3 + 3) / 4;
> > +			h_p_size = (h_p_size * 3 + 3) / 4;
> > +		} else if (fcw->hcin_decomp_mode == 4) {
> > +			h_np_size = h_np_size / 2;
> > +			h_p_size = h_p_size / 2;
> > +		}
> > +		dec->harq_combined_output.length = h_np_size;
> > +		acc_dma_fill_blk_type(
> > +				desc,
> > +				dec->harq_combined_output.data,
> > +				dec->harq_combined_output.offset,
> > +				h_p_size,
> > +				next_triplet,
> > +				ACC_DMA_BLKID_OUT_HARQ);
> > +
> > +		next_triplet++;
> > +	}
> > +
> > +	*h_out_length = output_length >> 3;
> > +	dec->hard_output.length += *h_out_length;
> > +	*h_out_offset += *h_out_length;
> > +	desc->data_ptrs[next_triplet - 1].last = 1;
> > +	desc->d2mlen = next_triplet - desc->m2dlen;
> > +
> > +	desc->op_addr = op;
> > +
> > +	return 0;
> > +}
> > +
> >   /* Enqueue one encode operations for device in CB mode. */
> >   static inline int
> >   enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op
> *op,
> > @@ -1877,6 +2233,7 @@ enqueue_ldpc_enc_n_op_cb(struct acc_queue *q,
> struct rte_bbdev_enc_op **ops,
> >   	/** This could be done at polling. */
> >   	acc_header_init(&desc->req);
> >   	desc->req.numCBs = num;
> > +	desc->req.dltb = 0;
> >
> >   	in_length_in_bytes = ops[0]->ldpc_enc.input.data->data_len;
> >   	out_length = (enc->cb_params.e + 7) >> 3;
> > @@ -2102,6 +2459,105 @@ vrb1_enqueue_ldpc_enc_one_op_tb(struct
> acc_queue *q, struct rte_bbdev_enc_op *op
> >   	return return_descs;
> >   }
> >
> > +/* Fill in a frame control word for LDPC encoding. */
> > +static inline void
> > +vrb2_fcw_letb_fill(const struct rte_bbdev_enc_op *op, struct acc_fcw_le
> *fcw)
> > +{
> > +	fcw->qm = op->ldpc_enc.q_m;
> > +	fcw->nfiller = op->ldpc_enc.n_filler;
> > +	fcw->BG = (op->ldpc_enc.basegraph - 1);
> > +	fcw->Zc = op->ldpc_enc.z_c;
> > +	fcw->ncb = op->ldpc_enc.n_cb;
> > +	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_enc.basegraph,
> > +			op->ldpc_enc.rv_index);
> > +	fcw->rm_e = op->ldpc_enc.tb_params.ea;
> > +	fcw->rm_e_b = op->ldpc_enc.tb_params.eb;
> > +	fcw->crc_select = check_bit(op->ldpc_enc.op_flags,
> > +			RTE_BBDEV_LDPC_CRC_24B_ATTACH);
> > +	fcw->bypass_intlv = 0;
> > +	if (op->ldpc_enc.tb_params.c > 1) {
> > +		fcw->mcb_count = 0;
> > +		fcw->C = op->ldpc_enc.tb_params.c;
> > +		fcw->Cab = op->ldpc_enc.tb_params.cab;
> > +	} else {
> > +		fcw->mcb_count = 1;
> > +		fcw->C = 0;
> > +	}
> > +}
> > +
> > +/* Enqueue one encode operations for device in TB mode.
> > + * returns the number of descs used.
> > + */
> > +static inline int
> > +vrb2_enqueue_ldpc_enc_one_op_tb(struct acc_queue *q, struct
> rte_bbdev_enc_op *op,
> > +		uint16_t enq_descs)
> > +{
> > +	union acc_dma_desc *desc = NULL;
> > +	uint32_t in_offset, out_offset, out_length, seg_total_left;
> > +	struct rte_mbuf *input, *output_head, *output;
> > +
> > +	uint16_t desc_idx = ((q->sw_ring_head + enq_descs) & q->sw_ring_wrap_mask);
> > +	desc = q->ring_addr + desc_idx;
> 
> Use acc_desc()?

thanks
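
For reference, the helper suggested above amounts to a masked ring lookup. A minimal sketch, assuming simplified stand-in types (toy_desc, toy_queue and toy_desc_at are hypothetical names, not the driver's structures) and a power-of-two ring depth:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's descriptor and queue types. */
struct toy_desc {
	uint32_t word0;
};

struct toy_queue {
	struct toy_desc *ring_addr;  /* base of the descriptor ring */
	uint32_t sw_ring_head;       /* free-running head counter */
	uint32_t sw_ring_wrap_mask;  /* ring depth - 1, depth is a power of two */
};

/* Return the descriptor 'i' slots past the head, wrapping around the ring. */
static inline struct toy_desc *
toy_desc_at(const struct toy_queue *q, uint16_t i)
{
	return q->ring_addr + ((q->sw_ring_head + i) & q->sw_ring_wrap_mask);
}
```

Going through one helper keeps the wrap arithmetic in a single place instead of repeating the mask expression in every enqueue path.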

> 
> > +	vrb2_fcw_letb_fill(op, &desc->req.fcw_le);
> > +	struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc;
> > +	int next_triplet = 1; /* FCW already done */
> > +	uint32_t in_length_in_bytes;
> > +	uint16_t K, in_length_in_bits;
> > +
> > +	input = enc->input.data;
> > +	output_head = output = enc->output.data;
> > +	in_offset = enc->input.offset;
> > +	out_offset = enc->output.offset;
> > +	seg_total_left = rte_pktmbuf_data_len(enc->input.data) - in_offset;
> > +
> > +	acc_header_init(&desc->req);
> > +	K = (enc->basegraph == 1 ? 22 : 10) * enc->z_c;
> > +	in_length_in_bits = K - enc->n_filler;
> > +	if ((enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) ||
> > +			(enc->op_flags &
> RTE_BBDEV_LDPC_CRC_24B_ATTACH))
> > +		in_length_in_bits -= 24;
> > +	in_length_in_bytes = (in_length_in_bits >> 3) * enc->tb_params.c;
> > +
> > +	next_triplet = acc_dma_fill_blk_type_in(&desc->req, &input,
> &in_offset,
> > +			in_length_in_bytes, &seg_total_left, next_triplet,
> > +			check_bit(enc->op_flags,
> RTE_BBDEV_LDPC_ENC_SCATTER_GATHER));
> > +	if (unlikely(next_triplet < 0)) {
> > +		rte_bbdev_log(ERR,
> > +				"Mismatch between data to process and mbuf
> data length in bbdev_op: %p",
> > +				op);
> > +		return -1;
> > +	}
> > +	desc->req.data_ptrs[next_triplet - 1].last = 1;
> > +	desc->req.m2dlen = next_triplet;
> > +
> > +	/* Set output length */
> > +	/* Integer round up division by 8 */
> > +	out_length = (enc->tb_params.ea * enc->tb_params.cab +
> > +			enc->tb_params.eb * (enc->tb_params.c - enc->tb_params.cab) + 7) >> 3;
> > +
> > +	next_triplet = acc_dma_fill_blk_type(&desc->req, output, out_offset,
> > +			out_length, next_triplet, ACC_DMA_BLKID_OUT_ENC);
> > +	enc->output.length = out_length;
> > +	out_offset += out_length;
> > +	desc->req.data_ptrs[next_triplet - 1].last = 1;
> > +	desc->req.data_ptrs[next_triplet - 1].dma_ext = 0;
> > +	desc->req.d2mlen = next_triplet - desc->req.m2dlen;
> > +	desc->req.numCBs = enc->tb_params.c;
> > +	if (desc->req.numCBs > 1)
> > +		desc->req.dltb = 1;
> > +	desc->req.op_addr = op;
> > +
> > +	if (out_length < ACC_MAX_E_MBUF)
> > +		mbuf_append(output_head, output, out_length);
> > +
> > +#ifdef RTE_LIBRTE_BBDEV_DEBUG
> > +	rte_memdump(stderr, "FCW", &desc->req.fcw_le, sizeof(desc->req.fcw_le));
> > +	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
> > +#endif
> > +	/* One CB (one op) was successfully prepared to enqueue */
> > +	return 1;
> 
> This function is quite different from the VRB1 variant.
> Is the underlying hardware completely different, or just a different
> implementation?

The underlying HW is different in this mode of operation, notably because it
supports RTE_BBDEV_LDPC_ENC_CONCATENATION, which makes it closer to a true TB
implementation.
The two functions are kept separate on purpose.
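
To illustrate what the single-descriptor TB path computes, here is a minimal sketch of the input and output length arithmetic from the quoted function. The function names and the flattened parameter lists are hypothetical; only the formulas are taken from the patch:

```c
#include <stdint.h>

/*
 * Input bytes for a whole TB: per-CB payload bits (K minus filler bits,
 * minus 24 CRC bits when the device attaches the CRC), times C.
 */
static inline uint32_t
tb_enc_in_bytes(uint8_t basegraph, uint16_t z_c, uint16_t n_filler,
		int crc_attached, uint8_t c)
{
	uint16_t k = (basegraph == 1 ? 22 : 10) * z_c;
	uint16_t in_bits = k - n_filler;

	if (crc_attached)
		in_bits -= 24;
	return (uint32_t)(in_bits >> 3) * c;
}

/*
 * Output bytes for a whole TB: cab code blocks of ea bits, the remaining
 * (c - cab) code blocks of eb bits, rounded up to a whole byte.
 */
static inline uint32_t
tb_enc_out_bytes(uint32_t ea, uint32_t eb, uint8_t c, uint8_t cab)
{
	return (ea * cab + eb * (uint32_t)(c - cab) + 7) >> 3;
}
```

Because the whole TB is described by one input and one output length, a single descriptor suffices, which is why the VRB2 branch of the enqueue loop only checks for one available descriptor.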

> 
> > +}
> > +
> >   /** Enqueue one decode operations for device in CB mode. */
> >   static inline int
> >   enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op
> *op,
> > @@ -2215,10 +2671,16 @@ vrb_enqueue_ldpc_dec_one_op_cb(struct
> acc_queue *q, struct rte_bbdev_dec_op *op,
> >   		else
> >   			seg_total_left = fcw->rm_e;
> >
> > -		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input, h_output,
> > -				&in_offset, &h_out_offset,
> > -				&h_out_length, &mbuf_total_left,
> > -				&seg_total_left, fcw);
> > +		if (q->d->device_variant == VRB1_VARIANT)
> > +			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
> h_output,
> > +					&in_offset, &h_out_offset,
> > +					&h_out_length, &mbuf_total_left,
> > +					&seg_total_left, fcw);
> > +		else
> > +			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input,
> h_output,
> > +					&in_offset, &h_out_offset,
> > +					&h_out_length, &mbuf_total_left,
> > +					&seg_total_left, fcw);
> >   		if (unlikely(ret < 0))
> >   			return ret;
> >   	}
> > @@ -2308,11 +2770,18 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct
> acc_queue *q, struct rte_bbdev_dec_op *op,
> >   		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld,
> ACC_FCW_LD_BLEN);
> >   		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
> >
> > -		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
> > -				h_output, &in_offset, &h_out_offset,
> > -				&h_out_length,
> > -				&mbuf_total_left, &seg_total_left,
> > -				&desc->req.fcw_ld);
> > +		if (q->d->device_variant == VRB1_VARIANT)
> > +			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
> > +					h_output, &in_offset, &h_out_offset,
> > +					&h_out_length,
> > +					&mbuf_total_left, &seg_total_left,
> > +					&desc->req.fcw_ld);
> > +		else
> > +			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input,
> > +					h_output, &in_offset, &h_out_offset,
> > +					&h_out_length,
> > +					&mbuf_total_left, &seg_total_left,
> > +					&desc->req.fcw_ld);
> >
> >   		if (unlikely(ret < 0))
> >   			return ret;
> > @@ -2576,14 +3045,22 @@ vrb_enqueue_ldpc_enc_tb(struct
> rte_bbdev_queue_data *q_data,
> >   	int descs_used;
> >
> >   	for (i = 0; i < num; ++i) {
> > -		cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
> > -		/* Check if there are available space for further processing. */
> > -		if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
> > -			acc_enqueue_ring_full(q_data);
> > -			break;
> > +		if (q->d->device_variant == VRB1_VARIANT) {
> > +			cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
> > +			/* Check if there are available space for further
> processing. */
> > +			if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
> > +				acc_enqueue_ring_full(q_data);
> > +				break;
> > +			}
> > +			descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q,
> ops[i],
> > +					enqueued_descs, cbs_in_tb);
> > +		} else {
> > +			if (unlikely(avail < 1)) {
> > +				acc_enqueue_ring_full(q_data);
> > +				break;
> > +			}
> > +			descs_used = vrb2_enqueue_ldpc_enc_one_op_tb(q,
> ops[i], enqueued_descs);
> >   		}
> > -
> > -		descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q, ops[i],
> enqueued_descs, cbs_in_tb);
> >   		if (descs_used < 0) {
> >   			acc_enqueue_invalid(q_data);
> >   			break;
> > @@ -2865,6 +3342,52 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue
> *q, struct rte_bbdev_enc_op **ref_op,
> >   	return desc->req.numCBs;
> >   }
> >
> > +/* Dequeue one LDPC encode operations from VRB2 device in TB mode. */
> > +static inline int
> > +vrb2_dequeue_ldpc_enc_one_op_tb(struct acc_queue *q, struct
> rte_bbdev_enc_op **ref_op,
> > +		uint16_t *dequeued_ops, uint32_t *aq_dequeued,
> > +		uint16_t *dequeued_descs)
> > +{
> > +	union acc_dma_desc *desc, atom_desc;
> > +	union acc_dma_rsp_desc rsp;
> > +	struct rte_bbdev_enc_op *op;
> > +	int desc_idx = ((q->sw_ring_tail + *dequeued_descs) & q->sw_ring_wrap_mask);
> > +
> > +	desc = q->ring_addr + desc_idx;
> > +	atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc,
> __ATOMIC_RELAXED);
> > +
> > +	/* Check fdone bit. */
> > +	if (!(atom_desc.rsp.val & ACC_FDONE))
> > +		return -1;
> > +
> > +	rsp.val = atom_desc.rsp.val;
> > +	rte_bbdev_log_debug("Resp. desc %p: %x", desc, rsp.val);
> > +
> > +	/* Dequeue. */
> > +	op = desc->req.op_addr;
> > +
> > +	/* Clearing status, it will be set based on response. */
> > +	op->status = 0;
> > +	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
> > +	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
> > +	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
> > +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
> > +
> > +	if (desc->req.last_desc_in_batch) {
> > +		(*aq_dequeued)++;
> > +		desc->req.last_desc_in_batch = 0;
> > +	}
> > +	desc->rsp.val = ACC_DMA_DESC_TYPE;
> > +	desc->rsp.add_info_0 = 0; /* Reserved bits. */
> > +	desc->rsp.add_info_1 = 0; /* Reserved bits. */
> > +
> > +	/* One op was successfully dequeued */
> > +	ref_op[0] = op;
> > +	(*dequeued_descs)++;
> > +	(*dequeued_ops)++;
> > +	return 1;
> > +}
> > +
> >   /* Dequeue one LDPC encode operations from device in TB mode.
> >    * That operation may cover multiple descriptors.
> >    */
> > @@ -3189,9 +3712,14 @@ vrb_dequeue_ldpc_enc(struct
> rte_bbdev_queue_data *q_data,
> >
> >   	for (i = 0; i < avail; i++) {
> >   		if (cbm == RTE_BBDEV_TRANSPORT_BLOCK)
> > -			ret = vrb_dequeue_enc_one_op_tb(q,
> &ops[dequeued_ops],
> > -					&dequeued_ops, &aq_dequeued,
> > -					&dequeued_descs, num);
> > +			if (q->d->device_variant == VRB1_VARIANT)
> > +				ret = vrb_dequeue_enc_one_op_tb(q,
> &ops[dequeued_ops],
> > +						&dequeued_ops,
> &aq_dequeued,
> > +						&dequeued_descs, num);
> > +			else
> > +				ret = vrb2_dequeue_ldpc_enc_one_op_tb(q,
> &ops[dequeued_ops],
> > +						&dequeued_ops,
> &aq_dequeued,
> > +						&dequeued_descs);
> >   		else
> >   			ret = vrb_dequeue_enc_one_op_cb(q,
> &ops[dequeued_ops],
> >   					&dequeued_ops, &aq_dequeued,
> > @@ -3536,6 +4064,7 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct
> rte_pci_driver *drv)
> >   	} else {
> >   		d->device_variant = VRB2_VARIANT;
> >   		d->queue_offset = vrb2_queue_offset;
> > +		d->fcw_ld_fill = vrb2_fcw_ld_fill;
> >   		d->num_qgroups = VRB2_NUM_QGRPS;
> >   		d->num_aqs = VRB2_NUM_AQS;
> >   		if (d->pf_device)
> 
> 
> It looks like most (60%+) of the code in this patch could be removed if
> duplication was avoided.
> 
> Thanks,
> Maxime



* RE: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-10-04  7:11       ` Maxime Coquelin
@ 2023-10-04 21:18         ` Chautru, Nicolas
  2023-10-05 14:34           ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-04 21:18 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, October 4, 2023 12:11 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
> 
> 
> 
> On 10/3/23 20:20, Chautru, Nicolas wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, October 3, 2023 7:37 AM
> >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> Hernan
> >> <hernan.vargas@intel.com>
> >> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
> >> variant
> >>
> >>
> >>
> >> On 9/29/23 18:35, Nicolas Chautru wrote:
> >>> Support for the FFT the processing specific to the
> >>> VRB2 variant.
> >>>
> >>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> >>> ---
> >>>    drivers/baseband/acc/rte_vrb_pmd.c | 132
> >> ++++++++++++++++++++++++++++-
> >>>    1 file changed, 128 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
> >>> b/drivers/baseband/acc/rte_vrb_pmd.c
> >>> index 93add82947..ce4b90d8e7 100644
> >>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> >>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> >>> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t
> >> queue_id,
> >>>    			ACC_FCW_LD_BLEN : (conf->op_type ==
> >> RTE_BBDEV_OP_FFT ?
> >>>    			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
> >>>
> >>> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type ==
> >> RTE_BBDEV_OP_FFT))
> >>> +		fcw_len = ACC_FCW_FFT_BLEN_3;
> >>> +
> >>>    	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
> >>>    		desc = q->ring_addr + desc_idx;
> >>>    		desc->req.word0 = ACC_DMA_DESC_TYPE;
> >>> @@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
> >>>    			.num_buffers_soft_out = 0,
> >>>    			}
> >>>    		},
> >>> +		{
> >>> +			.type	= RTE_BBDEV_OP_FFT,
> >>> +			.cap.fft = {
> >>> +				.capability_flags =
> >>> +						RTE_BBDEV_FFT_WINDOWING |
> >>> +						RTE_BBDEV_FFT_CS_ADJUSTMENT |
> >>> +						RTE_BBDEV_FFT_DFT_BYPASS |
> >>> +						RTE_BBDEV_FFT_IDFT_BYPASS |
> >>> +						RTE_BBDEV_FFT_FP16_INPUT |
> >>> +						RTE_BBDEV_FFT_FP16_OUTPUT |
> >>> +						RTE_BBDEV_FFT_POWER_MEAS |
> >>> +						RTE_BBDEV_FFT_WINDOWING_BYPASS,
> >>> +				.num_buffers_src =
> >>> +						1,
> >>> +				.num_buffers_dst =
> >>> +						1,
> >>> +			}
> >>> +		},
> >>>    		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
> >>>    	};
> >>>
> >>> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op
> >>> *op,
> >> struct acc_fcw_fft *fcw)
> >>>    		fcw->bypass = 0;
> >>>    }
> >>>
> >>> +/* Fill in a frame control word for FFT processing. */
> >>> +static inline void
> >>> +vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
> >>> +{
> >>> +	fcw->in_frame_size = op->fft.input_sequence_size;
> >>> +	fcw->leading_pad_size = op->fft.input_leading_padding;
> >>> +	fcw->out_frame_size = op->fft.output_sequence_size;
> >>> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
> >>> +	fcw->cs_window_sel = op->fft.window_index[0] +
> >>> +			(op->fft.window_index[1] << 8) +
> >>> +			(op->fft.window_index[2] << 16) +
> >>> +			(op->fft.window_index[3] << 24);
> >>> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
> >>> +			(op->fft.window_index[5] << 8);
> >>> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
> >>> +	fcw->num_antennas = op->fft.num_antennas_log2;
> >>> +	fcw->idft_size = op->fft.idft_log2;
> >>> +	fcw->dft_size = op->fft.dft_log2;
> >>> +	fcw->cs_offset = op->fft.cs_time_adjustment;
> >>> +	fcw->idft_shift = op->fft.idft_shift;
> >>> +	fcw->dft_shift = op->fft.dft_shift;
> >>> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
> >>> +	fcw->power_shift = op->fft.power_shift;
> >>> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
> >>> +	fcw->fp16_in = check_bit(op->fft.op_flags,
> >> RTE_BBDEV_FFT_FP16_INPUT);
> >>> +	fcw->fp16_out = check_bit(op->fft.op_flags,
> >> RTE_BBDEV_FFT_FP16_OUTPUT);
> >>> +	fcw->power_en = check_bit(op->fft.op_flags,
> >> RTE_BBDEV_FFT_POWER_MEAS);
> >>> +	if (check_bit(op->fft.op_flags,
> >>> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
> >>> +		if (check_bit(op->fft.op_flags,
> >>> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
> >>> +			fcw->bypass = 2;
> >>> +		else
> >>> +			fcw->bypass = 1;
> >>> +	} else if (check_bit(op->fft.op_flags,
> >>> +			RTE_BBDEV_FFT_DFT_BYPASS))
> >>> +		fcw->bypass = 3;
> >>> +	else
> >>> +		fcw->bypass = 0;
> >>
> >> The only difference I see with VRB1 are backed by corresponding
> >> op_flags (POWER & FP16), is that correct? If so, it does not make
> >> sense to me to have a specific function for VRB2.
> >
> > There are more changes but these are only formally enabled in the next
> > stepping hence some of the related code is not included yet. More generally
> the FCW and IP is different from VRB1 implementation.
> 
> Currently, the code is almost identical so vrb1 implementation should be
> reused. If some later changes makes the two implementations diverge, then we
> can consider having a dedicated function for VRB2 at that time.

If I may, I believe this is best left as-is, notably for patches and support.
The functions are fairly small (little code overlap in absolute terms) and the
underlying IP is different (with more differences that we can enable over time).
I don't think it would really help anyone to make them coexist for a short
period of time.
Does that sound fair?
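
To make the size argument concrete, the two non-trivial pieces of the FCW fill quoted below can be sketched in isolation. This is an illustration only: the flag constants and function names are hypothetical, the explicit uint32_t casts are added so the shift by 24 is not done on a promoted int, and masking each index to 8 bits is an assumption about the field width rather than something the patch states.

```c
#include <stdint.h>

/* Hypothetical flag values standing in for the RTE_BBDEV_FFT_* op flags. */
enum {
	TOY_FFT_IDFT_BYPASS = 1u << 0,
	TOY_FFT_DFT_BYPASS = 1u << 1,
	TOY_FFT_WINDOWING_BYPASS = 1u << 2,
};

/* Pack four window indexes into one 32-bit word, index 0 in the low byte. */
static inline uint32_t
toy_pack_window_sel(const uint16_t idx[4])
{
	return (uint32_t)(idx[0] & 0xFF) |
		((uint32_t)(idx[1] & 0xFF) << 8) |
		((uint32_t)(idx[2] & 0xFF) << 16) |
		((uint32_t)(idx[3] & 0xFF) << 24);
}

/* Map the bypass flags to the 2-bit bypass field, as in the quoted code. */
static inline uint8_t
toy_bypass_mode(uint32_t flags)
{
	if (flags & TOY_FFT_IDFT_BYPASS)
		return (flags & TOY_FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (flags & TOY_FFT_DFT_BYPASS)
		return 3;
	return 0;
}
```

The bypass mapping is the same on both variants, while the wider window selection is specific to the VRB2 FCW layout.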


> 
> >>
> >>> +}
> >>> +
> >>>    static inline int
> >>>    vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>>    		struct acc_dma_req_desc *desc,
> >>> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct
> >>> rte_bbdev_fft_op
> >> *op,
> >>>    	return 0;
> >>>    }
> >>>
> >>> +static inline int
> >>> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>> +		struct acc_dma_req_desc *desc,
> >>> +		struct rte_mbuf *input, struct rte_mbuf *output, struct
> >> rte_mbuf *win_input,
> >>> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t
> >> *out_offset,
> >>> +		uint32_t *win_offset, uint32_t *pwr_offset) {
> >>> +	bool pwr_en = check_bit(op->fft.op_flags,
> >> RTE_BBDEV_FFT_POWER_MEAS);
> >>> +	bool win_en = check_bit(op->fft.op_flags,
> >> RTE_BBDEV_FFT_DEWINDOWING);
> >>> +	int num_cs = 0, i, bd_idx = 1;
> >>> +
> >>> +	/* FCW already done */
> >>> +	acc_header_init(desc);
> >>> +
> >>> +	RTE_SET_USED(win_input);
> >>> +	RTE_SET_USED(win_offset);
> >>> +
> >>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input,
> >> *in_offset);
> >>> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size *
> >> ACC_IQ_SIZE;
> >>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
> >>> +	desc->data_ptrs[bd_idx].last = 1;
> >>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> >>> +	bd_idx++;
> >>> +
> >>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output,
> >> *out_offset);
> >>> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size *
> >> ACC_IQ_SIZE;
> >>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
> >>> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
> >>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> >>> +	desc->m2dlen = win_en ? 3 : 2;
> >>> +	desc->d2mlen = pwr_en ? 2 : 1;
> >>> +	desc->ib_ant_offset = op->fft.input_sequence_size;
> >>> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
> >>> +
> >>> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
> >>> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
> >>> +			num_cs++;
> >>> +	desc->num_cs = num_cs;
> >>> +
> >>> +	if (pwr_en && pwr) {
> >>> +		bd_idx++;
> >>> +		desc->data_ptrs[bd_idx].address =
> >> rte_pktmbuf_iova_offset(pwr, *pwr_offset);
> >>> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op->fft.num_antennas_log2) * 4;
> >>> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
> >>> +		desc->data_ptrs[bd_idx].last = 1;
> >>> +		desc->data_ptrs[bd_idx].dma_ext = 0;
> >>> +	}
> >>> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
> >>> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
> >>> +	desc->op_addr = op;
> >>> +	return 0;
> >>> +}
> >>>
> >>>    /** Enqueue one FFT operation for device. */
> >>>    static inline int
> >>> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue
> *q,
> >> struct rte_bbdev_fft_op *op,
> >>>    		uint16_t total_enqueued_cbs)
> >>>    {
> >>>    	union acc_dma_desc *desc;
> >>> -	struct rte_mbuf *input, *output;
> >>> -	uint32_t in_offset, out_offset;
> >>> +	struct rte_mbuf *input, *output, *pwr, *win;
> >>> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
> >>>    	struct acc_fcw_fft *fcw;
> >>>
> >>>    	desc = acc_desc(q, total_enqueued_cbs);
> >>>    	input = op->fft.base_input.data;
> >>>    	output = op->fft.base_output.data;
> >>> +	pwr = op->fft.power_meas_output.data;
> >>> +	win = op->fft.dewindowing_input.data;
> >>>    	in_offset = op->fft.base_input.offset;
> >>>    	out_offset = op->fft.base_output.offset;
> >>> +	pwr_offset = op->fft.power_meas_output.offset;
> >>> +	win_offset = op->fft.dewindowing_input.offset;
> >>>
> >>>    	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
> >>>    			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
> >>>    			* ACC_MAX_FCW_SIZE);
> >>>
> >>> -	vrb1_fcw_fft_fill(op, fcw);
> >>> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset,
> >> &out_offset);
> >>> +	if (q->d->device_variant == VRB1_VARIANT) {
> >>> +		vrb1_fcw_fft_fill(op, fcw);
> >>> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output,
> >> &in_offset, &out_offset);
> >>> +	} else {
> >>> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
> >>> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win,
> >> pwr,
> >>> +				&in_offset, &out_offset, &win_offset,
> >> &pwr_offset);
> >>> +	}
> >>>    #ifdef RTE_LIBRTE_BBDEV_DEBUG
> >>>    	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
> >>>    			sizeof(desc->req.fcw_fft));
> >



* RE: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension
  2023-10-04  7:35       ` Maxime Coquelin
@ 2023-10-04 21:28         ` Chautru, Nicolas
  2023-10-05 14:31           ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-04 21:28 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, October 4, 2023 12:36 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver
> extension
> 
> 
> 
> On 10/3/23 20:54, Chautru, Nicolas wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, October 3, 2023 6:15 AM
> >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> Hernan
> >> <hernan.vargas@intel.com>
> >> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified
> >> driver extension
> >>
> >> Thanks for doing the split, that will ease review.
> >>
> >> On 9/29/23 18:35, Nicolas Chautru wrote:
> >>> Adding a few functions and common code prior to extending the VRB
> >>> driver.
> >>>
> >>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> >>> ---
> >>>    drivers/baseband/acc/acc_common.h     | 164
> +++++++++++++++++++++++-
> >> --
> >>>    drivers/baseband/acc/rte_acc100_pmd.c |   4 +-
> >>>    drivers/baseband/acc/rte_vrb_pmd.c    |  62 +++++-----
> >>>    3 files changed, 184 insertions(+), 46 deletions(-)
> >>>
> >>> diff --git a/drivers/baseband/acc/acc_common.h
> >>> b/drivers/baseband/acc/acc_common.h
> >>> index 788abf1a3c..89893eae43 100644
> >>> --- a/drivers/baseband/acc/acc_common.h
> >>> +++ b/drivers/baseband/acc/acc_common.h
> >>> @@ -18,6 +18,7 @@
> >>>    #define ACC_DMA_BLKID_OUT_HARQ      3
> >>>    #define ACC_DMA_BLKID_IN_HARQ       3
> >>>    #define ACC_DMA_BLKID_IN_MLD_R      3
> >>> +#define ACC_DMA_BLKID_DEWIN_IN      3
> >>>
> >>>    /* Values used in filling in decode FCWs */
> >>>    #define ACC_FCW_TD_VER              1
> >>> @@ -103,6 +104,9 @@
> >>>    #define ACC_MAX_NUM_QGRPS              32
> >>>    #define ACC_RING_SIZE_GRANULARITY      64
> >>>    #define ACC_MAX_FCW_SIZE              128
> >>> +#define ACC_IQ_SIZE                    4
> >>> +
> >>> +#define ACC_FCW_FFT_BLEN_3             28
> >>>
> >>>    /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
> >>>    #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */
> >>> @@ -132,6 +136,17 @@
> >>>    #define ACC_LIM_21 14 /* 0.21 */
> >>>    #define ACC_LIM_31 20 /* 0.31 */
> >>>    #define ACC_MAX_E (128 * 1024 - 2)
> >>> +#define ACC_MAX_CS 12
> >>> +
> >>> +#define ACC100_VARIANT          0
> >>> +#define VRB1_VARIANT		2
> >>> +#define VRB2_VARIANT		3
> >>> +
> >>> +/* Queue Index Hierarchy */
> >>> +#define VRB1_GRP_ID_SHIFT    10
> >>> +#define VRB1_VF_ID_SHIFT     4
> >>> +#define VRB2_GRP_ID_SHIFT    12
> >>> +#define VRB2_VF_ID_SHIFT     6
> >>>
> >>>    /* Helper macro for logging */
> >>>    #define rte_acc_log(level, fmt, ...) \
> >>> @@ -332,6 +347,37 @@ struct __rte_packed acc_fcw_fft {
> >>>    		res:19;
> >>>    };
> >>>
> >>> +/* FFT Frame Control Word. */
> >>> +struct __rte_packed acc_fcw_fft_3 {
> >>> +	uint32_t in_frame_size:16,
> >>> +		leading_pad_size:16;
> >>> +	uint32_t out_frame_size:16,
> >>> +		leading_depad_size:16;
> >>> +	uint32_t cs_window_sel;
> >>> +	uint32_t cs_window_sel2:16,
> >>> +		cs_enable_bmap:16;
> >>> +	uint32_t num_antennas:8,
> >>> +		idft_size:8,
> >>> +		dft_size:8,
> >>> +		cs_offset:8;
> >>> +	uint32_t idft_shift:8,
> >>> +		dft_shift:8,
> >>> +		cs_multiplier:16;
> >>> +	uint32_t bypass:2,
> >>> +		fp16_in:1,
> >>> +		fp16_out:1,
> >>> +		exp_adj:4,
> >>> +		power_shift:4,
> >>> +		power_en:1,
> >>> +		enable_dewin:1,
> >>> +		freq_resample_mode:2,
> >>> +		depad_output_size:16;
> >>> +	uint16_t cs_theta_0[ACC_MAX_CS];
> >>> +	uint32_t cs_theta_d[ACC_MAX_CS];
> >>> +	int8_t cs_time_offset[ACC_MAX_CS];
> >>> +};
> >>> +
> >>> +
> >>>    /* MLD-TS Frame Control Word */
> >>>    struct __rte_packed acc_fcw_mldts {
> >>>    	uint32_t fcw_version:4,
> >>> @@ -473,14 +519,14 @@ union acc_info_ring_data {
> >>>    		uint16_t valid: 1;
> >>>    	};
> >>>    	struct {
> >>> -		uint32_t aq_id_3: 6;
> >>> -		uint32_t qg_id_3: 5;
> >>> -		uint32_t vf_id_3: 6;
> >>> -		uint32_t int_nb_3: 6;
> >>> -		uint32_t msi_0_3: 1;
> >>> -		uint32_t vf2pf_3: 6;
> >>> -		uint32_t loop_3: 1;
> >>> -		uint32_t valid_3: 1;
> >>> +		uint32_t aq_id_vrb2: 6;
> >>> +		uint32_t qg_id_vrb2: 5;
> >>> +		uint32_t vf_id_vrb2: 6;
> >>> +		uint32_t int_nb_vrb2: 6;
> >>> +		uint32_t msi_0_vrb2: 1;
> >>> +		uint32_t vf2pf_vrb2: 6;
> >>> +		uint32_t loop_vrb2: 1;
> >>> +		uint32_t valid_vrb2: 1;
> >>>    	};
> >>>    } __rte_packed;
> >>>
> >>> @@ -761,22 +807,105 @@ alloc_sw_rings_min_mem(struct rte_bbdev
> *dev,
> >> struct acc_device *d,
> >>>    	free_base_addresses(base_addrs, i);
> >>>    }
> >>>
> >>> +/* Wrapper to provide VF index from ring data. */ static inline
> >>> +uint16_t vf_from_ring(const union acc_info_ring_data ring_data,
> >>> +uint16_t device_variant) {
> >>
> >> curly braces on a new line.
> >
> > Thanks.
> >
> >>
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return ring_data.vf_id_vrb2;
> >>> +	else
> >>> +		return ring_data.vf_id;
> >>> +}
> >>> +
> >>> +/* Wrapper to provide QG index from ring data. */ static inline
> >>> +uint16_t qg_from_ring(const union acc_info_ring_data ring_data,
> >>> +uint16_t device_variant) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return ring_data.qg_id_vrb2;
> >>> +	else
> >>> +		return ring_data.qg_id;
> >>> +}
> >>> +
> >>> +/* Wrapper to provide AQ index from ring data. */ static inline
> >>> +uint16_t aq_from_ring(const union acc_info_ring_data ring_data,
> >>> +uint16_t device_variant) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return ring_data.aq_id_vrb2;
> >>> +	else
> >>> +		return ring_data.aq_id;
> >>> +}
> >>> +
> >>> +/* Wrapper to provide int index from ring data. */ static inline
> >>> +uint16_t int_from_ring(const union acc_info_ring_data ring_data,
> >>> +uint16_t device_variant) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return ring_data.int_nb_vrb2;
> >>> +	else
> >>> +		return ring_data.int_nb;
> >>> +}
> >>> +
> >>> +/* Wrapper to provide queue index from group and aq index. */
> >>> +static inline int queue_index(uint16_t group_idx, uint16_t aq_idx,
> >>> +uint16_t
> >>> +device_variant) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
> >>> +	else
> >>> +		return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx; }
> >>> +
> >>> +/* Wrapper to provide queue group from queue index. */ static
> >>> +inline int qg_from_q(uint32_t q_idx, uint16_t device_variant) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
> >>> +	else
> >>> +		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF; }
> >>> +
> >>> +/* Wrapper to provide vf from queue index. */ static inline int32_t
> >>> +vf_from_q(uint32_t q_idx, uint16_t device_variant) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return (q_idx >> VRB2_VF_ID_SHIFT)  & 0x3F;
> >>> +	else
> >>> +		return (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F; }
> >>> +
> >>> +/* Wrapper to provide aq index from queue index. */ static inline
> >>> +int32_t aq_from_q(uint32_t q_idx, uint16_t device_variant) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return q_idx & 0x3F;
> >>> +	else
> >>> +		return q_idx & 0xF;
> >>> +}
> >>> +
> >>> +/* Wrapper to set VF index in ring data. */ static inline int32_t
> >>> +set_vf_in_ring(volatile union acc_info_ring_data *ring_data,
> >>> +		uint16_t device_variant, uint16_t value) {
> >>> +	if (device_variant == VRB2_VARIANT)
> >>> +		return ring_data->vf_id_vrb2 = value;
> >>> +	else
> >>> +		return ring_data->vf_id = value;
> >>> +}
> >>> +
> >>>    /*
> >>>     * Find queue_id of a device queue based on details from the Info Ring.
> >>>     * If a queue isn't found UINT16_MAX is returned.
> >>>     */
> >>>    static inline uint16_t
> >>>    get_queue_id_from_ring_info(struct rte_bbdev_data *data,
> >>> -		const union acc_info_ring_data ring_data)
> >>> +		const union acc_info_ring_data ring_data, uint16_t
> >> device_variant)
> >>
> >> As I suggested on v2:
> >>
> >> get_queue_id_from_ring_info(struct rte_bbdev_data *data,
> >> 	const union acc_info_ring_data ring_data) {
> >> 	struct acc_device *d = data->dev_private;
> >>
> >> 	...
> >>
> >> 	if (acc_q != NULL && acc_q->aq_id == aq_from_ring(d, ring_data) &&
> >> ...
> >>
> >> }
> >>
> >> with
> >>
> >> /* Wrapper to provide AQ index from ring data. */
> >> static inline uint16_t
> >> aq_from_ring(struct acc_device *d, const union acc_info_ring_data ring_data)
> >> {
> >> 	if (d->device_variant == VRB2_VARIANT)
> >> 		return ring_data.aq_id_vrb2;
> >> 	else
> >> 		return ring_data.aq_id;
> >> }
> >>
> >
> > I will change get_queue_id_from_ring_info() to have a smaller
> > prototype, but I don’t plan on changing the other new underlying
> > functions to take dev instead of the variant in their prototypes. I
> > don’t see a reason to, as these only need this one member.
> 
> IMHO, the reason is that it costs nothing and is more future-proof.

Thanks. In this particular case I believe the prototype is cleaner with the device variant. I don’t see a future-proofing concern.

> 
> Also, my initial idea was to have an intermediate representation, like:
> 
> struct acc_queue_info { // Not sure about the name
> 	uint16_t vf_id;
> 	uint8_t qgrp_id;
> 	uint16_t aq_id;
> };
> 
> Then we have a single callback for each variant
> 
> static void
> vrb1_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
> 		struct acc_queue_info *queue_info)
> {
> 	queue_info->vf_id = ring_data.vf_id;
> 	queue_info->qgrp_id = ...
> }
> 
> static void
> vrb2_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
> 		struct acc_queue_info *queue_info)
> {
> 
> }
> 
> The acc_queue_info struct can also be used in struct acc_queue, so we use
> same format everywhere.
> 
> I think it will be less verbose, and quicker to add new variants without the
> risk of forgetting to add "else if (d->device_variant == VRBx_VARIANT)"
> somewhere.
> 
> What do you think?

I think both would work. The intermediate structure may be a bit artificial, and it would need different members depending on whether the info comes from the queue or from the ring (e.g. the interrupt index). There is also no reciprocal function: we only set the VF in the ring. And there is one location where we need only one piece of information, not all of the members.
Again, both are okay with me and I have no strong preference, so for now I would suggest keeping it as is.
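
For comparison, here is a rough sketch of how the intermediate-representation option could look. It is only an illustration, not code from the patch: struct demo_device, the ring_to_queue_info_t typedef and the use of a raw uint32_t in place of union acc_info_ring_data are made up for the example. The VRB2 field widths (6/5/6) come from the patch; the VRB1 positions (4/4/6) and the assumption that fields are packed from bit 0 are mine.

```c
#include <stdint.h>

/* Hypothetical variant-neutral view of the queue coordinates. */
struct acc_queue_info {
	uint16_t vf_id;
	uint8_t qgrp_id;
	uint16_t aq_id;
};

/* Decoder signature shared by all variants (hypothetical). */
typedef void (*ring_to_queue_info_t)(uint32_t ring_val,
		struct acc_queue_info *queue_info);

/* Assumed VRB1 layout: aq_id in bits 0-3, qg_id in bits 4-7, vf_id in bits 8-13. */
static void
vrb1_ring_data_to_queue_info(uint32_t ring_val,
		struct acc_queue_info *queue_info)
{
	queue_info->aq_id = ring_val & 0xF;
	queue_info->qgrp_id = (ring_val >> 4) & 0xF;
	queue_info->vf_id = (ring_val >> 8) & 0x3F;
}

/* VRB2 widths from the patch: aq_id 6 bits, qg_id 5 bits, vf_id 6 bits. */
static void
vrb2_ring_data_to_queue_info(uint32_t ring_val,
		struct acc_queue_info *queue_info)
{
	queue_info->aq_id = ring_val & 0x3F;
	queue_info->qgrp_id = (ring_val >> 6) & 0x1F;
	queue_info->vf_id = (ring_val >> 11) & 0x3F;
}

/* Hypothetical device stub: the decoder would be selected once at probe time. */
struct demo_device {
	ring_to_queue_info_t ring_to_queue_info;
};
```

With this shape the variant check happens once when the callback is assigned, but the interrupt index and the VF write-back into the ring would still need their own handling, which is the asymmetry mentioned above.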

> 
> >
> >>>    {
> >>>    	uint16_t queue_id;
> >>> +	struct acc_queue *acc_q;
> >>>
> >>>    	for (queue_id = 0; queue_id < data->num_queues; ++queue_id) {
> >>> -		struct acc_queue *acc_q =
> >>> -				data->queues[queue_id].queue_private;
> >>> -		if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id &&
> >>> -				acc_q->qgrp_id == ring_data.qg_id &&
> >>> -				acc_q->vf_id == ring_data.vf_id)
> >>> +		acc_q = data->queues[queue_id].queue_private;
> >>> +
> >>> +		if (acc_q != NULL && acc_q->aq_id ==
> >> aq_from_ring(ring_data, device_variant) &&
> >>> +				acc_q->qgrp_id == qg_from_ring(ring_data,
> >> device_variant) &&
> >>> +				acc_q->vf_id == vf_from_ring(ring_data,
> >> device_variant))
> >>>    			return queue_id;
> >>>    	}
> >>>
> >>> @@ -1438,4 +1567,11 @@ get_num_cbs_in_tb_ldpc_enc(struct
> >> rte_bbdev_op_ldpc_enc *ldpc_enc)
> >>>    	return cbs_in_tb;
> >>>    }
> >>>
> >>> +static inline void
> >>> +acc_reg_fast_write(struct acc_device *d, uint32_t offset, uint32_t
> >>> +value) {
> >>> +	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
> >>> +	mmio_write(reg_addr, value);
> >>> +}
> >>> +
> >>>    #endif /* _ACC_COMMON_H_ */
> >>> diff --git a/drivers/baseband/acc/rte_acc100_pmd.c
> >>> b/drivers/baseband/acc/rte_acc100_pmd.c
> >>> index 5362d39c30..7f8d05b5a9 100644
> >>> --- a/drivers/baseband/acc/rte_acc100_pmd.c
> >>> +++ b/drivers/baseband/acc/rte_acc100_pmd.c
> >>> @@ -294,7 +294,7 @@ acc100_pf_interrupt_handler(struct rte_bbdev
> *dev)
> >>>    		case ACC100_PF_INT_DMA_UL5G_DESC_IRQ:
> >>>    		case ACC100_PF_INT_DMA_DL5G_DESC_IRQ:
> >>>    			deq_intr_det.queue_id =
> >> get_queue_id_from_ring_info(
> >>> -					dev->data, *ring_data);
> >>> +					dev->data, *ring_data, acc100_dev-
> >>> device_variant);
> >>>    			if (deq_intr_det.queue_id == UINT16_MAX) {
> >>>    				rte_bbdev_log(ERR,
> >>>    						"Couldn't find queue: aq_id:
> >> %u, qg_id: %u, vf_id: %u", @@
> >>> -348,7 +348,7 @@ acc100_vf_interrupt_handler(struct rte_bbdev *dev)
> >>>    			 */
> >>>    			ring_data->vf_id = 0;
> >>>    			deq_intr_det.queue_id =
> >> get_queue_id_from_ring_info(
> >>> -					dev->data, *ring_data);
> >>> +					dev->data, *ring_data, acc100_dev-
> >>> device_variant);
> >>>    			if (deq_intr_det.queue_id == UINT16_MAX) {
> >>>    				rte_bbdev_log(ERR,
> >>>    						"Couldn't find queue: aq_id:
> >> %u, qg_id: %u", diff --git
> >>> a/drivers/baseband/acc/rte_vrb_pmd.c
> >>> b/drivers/baseband/acc/rte_vrb_pmd.c
> >>> index a1de012b40..c89c26c59a 100644
> >>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> >>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> >>> @@ -341,17 +341,18 @@ static inline void
> >>>    vrb_check_ir(struct acc_device *acc_dev)
> >>>    {
> >>>    	volatile union acc_info_ring_data *ring_data;
> >>> -	uint16_t info_ring_head = acc_dev->info_ring_head;
> >>> +	uint16_t info_ring_head = acc_dev->info_ring_head, int_nb;
> >>>    	if (unlikely(acc_dev->info_ring == NULL))
> >>>    		return;
> >>>
> >>>    	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> >>> ACC_INFO_RING_MASK);
> >>>
> >>>    	while (ring_data->valid) {
> >>> -		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> >>> -				ring_data->int_nb >
> >> ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
> >>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
> >>> +		if ((int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> >>> +				int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ))
> >> {
> >>>    			rte_bbdev_log(WARNING, "InfoRing: ITR:%d
> >> Info:0x%x",
> >>> -					ring_data->int_nb, ring_data-
> >>> detailed_info);
> >>> +					int_nb, ring_data->detailed_info);
> >>>    			/* Initialize Info Ring entry and move forward. */
> >>>    			ring_data->val = 0;
> >>>    		}
> >>> @@ -368,16 +369,21 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>    	struct acc_device *acc_dev = dev->data->dev_private;
> >>>    	volatile union acc_info_ring_data *ring_data;
> >>>    	struct acc_deq_intr_details deq_intr_det;
> >>> +	uint16_t vf_id, aq_id, qg_id, int_nb;
> >>>
> >>>    	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> >>> ACC_INFO_RING_MASK);
> >>>
> >>>    	while (ring_data->valid) {
> >>> +		vf_id = vf_from_ring(*ring_data, acc_dev->device_variant);
> >>> +		aq_id = aq_from_ring(*ring_data, acc_dev->device_variant);
> >>> +		qg_id = qg_from_ring(*ring_data, acc_dev->device_variant);
> >>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
> >>>    		if (acc_dev->pf_device) {
> >>>    			rte_bbdev_log_debug(
> >>> -					"VRB1 PF Interrupt received, Info Ring
> >> data: 0x%x -> %d",
> >>> -					ring_data->val, ring_data->int_nb);
> >>> +					"PF Interrupt received, Info Ring data:
> >> 0x%x -> %d",
> >>> +					ring_data->val, int_nb);
> >>>
> >>> -			switch (ring_data->int_nb) {
> >>> +			switch (int_nb) {
> >>>    			case ACC_PF_INT_DMA_DL_DESC_IRQ:
> >>>    			case ACC_PF_INT_DMA_UL_DESC_IRQ:
> >>>    			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
> >>> @@ -385,13 +391,11 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>    			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
> >>>    			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
> >>>    				deq_intr_det.queue_id =
> >> get_queue_id_from_ring_info(
> >>> -						dev->data, *ring_data);
> >>> +						dev->data, *ring_data,
> >> acc_dev->device_variant);
> >>>    				if (deq_intr_det.queue_id == UINT16_MAX) {
> >>>    					rte_bbdev_log(ERR,
> >>>    							"Couldn't find queue:
> >> aq_id: %u, qg_id: %u, vf_id: %u",
> >>> -							ring_data->aq_id,
> >>> -							ring_data->qg_id,
> >>> -							ring_data->vf_id);
> >>> +							aq_id, qg_id, vf_id);
> >>>    					return;
> >>>    				}
> >>>    				rte_bbdev_pmd_callback_process(dev,
> >>> @@ -403,9 +407,9 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>    			}
> >>>    		} else {
> >>>    			rte_bbdev_log_debug(
> >>> -					"VRB1 VF Interrupt received, Info Ring
> >> data: 0x%x\n",
> >>> +					"VRB VF Interrupt received, Info Ring
> >> data: 0x%x\n",
> >>>    					ring_data->val);
> >>> -			switch (ring_data->int_nb) {
> >>> +			switch (int_nb) {
> >>>    			case ACC_VF_INT_DMA_DL_DESC_IRQ:
> >>>    			case ACC_VF_INT_DMA_UL_DESC_IRQ:
> >>>    			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
> >>> @@ -413,14 +417,13 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>    			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
> >>>    			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
> >>>    				/* VFs are not aware of their vf_id - it's set to
> >> 0.  */
> >>> -				ring_data->vf_id = 0;
> >>> +				set_vf_in_ring(ring_data, acc_dev-
> >>> device_variant, 0);
> >>>    				deq_intr_det.queue_id =
> >> get_queue_id_from_ring_info(
> >>> -						dev->data, *ring_data);
> >>> +						dev->data, *ring_data,
> >> acc_dev->device_variant);
> >>>    				if (deq_intr_det.queue_id == UINT16_MAX) {
> >>>    					rte_bbdev_log(ERR,
> >>>    							"Couldn't find queue:
> >> aq_id: %u, qg_id: %u",
> >>> -							ring_data->aq_id,
> >>> -							ring_data->qg_id);
> >>> +							aq_id, qg_id);
> >>>    					return;
> >>>    				}
> >>>    				rte_bbdev_pmd_callback_process(dev,
> >>> @@ -435,8 +438,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>    		/* Initialize Info Ring entry and move forward. */
> >>>    		ring_data->val = 0;
> >>>    		++acc_dev->info_ring_head;
> >>> -		ring_data = acc_dev->info_ring +
> >>> -				(acc_dev->info_ring_head &
> >> ACC_INFO_RING_MASK);
> >>> +		ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> >>> +ACC_INFO_RING_MASK);
> >>>    	}
> >>>    }
> >>>
> >>> @@ -556,8 +558,7 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t
> >>> num_queues, int socket_id)
> >>>
> >>>    	/* Configure tail pointer for use when SDONE enabled. */
> >>>    	if (d->tail_ptrs == NULL)
> >>> -		d->tail_ptrs = rte_zmalloc_socket(
> >>> -				dev->device->driver->name,
> >>> +		d->tail_ptrs = rte_zmalloc_socket(dev->device->driver->name,
> >>>    				VRB_MAX_QGRPS * VRB_MAX_AQS *
> >> sizeof(uint32_t),
> >>>    				RTE_CACHE_LINE_SIZE, socket_id);
> >>>    	if (d->tail_ptrs == NULL) {
> >>> @@ -783,7 +784,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
> >>>    			/* Mark the Queue as assigned. */
> >>>    			d->q_assigned_bit_map[group_idx] |= (1ULL <<
> >> aq_idx);
> >>>    			/* Report the AQ Index. */
> >>> -			return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
> >>> +			return queue_index(group_idx, aq_idx, d-
> >>> device_variant);
> >>>    		}
> >>>    	}
> >>>    	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority
> >>> %u", @@ -922,9 +923,10 @@ vrb_queue_setup(struct rte_bbdev *dev,
> >>> uint16_t
> >> queue_id,
> >>>    		}
> >>>    	}
> >>>
> >>> -	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
> >>> -	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
> >>> -	q->aq_id = q_idx & 0xF;
> >>> +	q->qgrp_id = qg_from_q(q_idx, d->device_variant);
> >>> +	q->vf_id = vf_from_q(q_idx, d->device_variant);
> >>> +	q->aq_id = aq_from_q(q_idx, d->device_variant);
> >>> +
> >>>    	q->aq_depth = 0;
> >>>    	if (conf->op_type ==  RTE_BBDEV_OP_TURBO_DEC)
> >>>    		q->aq_depth = (1 << d->acc_conf.q_ul_4g.aq_depth_log2);
> >>> @@ -1311,7 +1313,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op
> >> *op, struct acc_fcw_td *fcw)
> >>>    		fcw->bypass_teq = 0;
> >>>    	}
> >>>
> >>> -	fcw->code_block_mode = 1; /* FIXME */
> >>> +	fcw->code_block_mode = 1;
> >>
> >> Could you remind me what was the issue?
> >
> > Historically there was the intention to use a different format option in the
> FCW to help with the TB mode, but that is no longer being considered.
> 
> Ok.
> 
> >
> >>
> >>>    	fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags,
> >>>    			RTE_BBDEV_TURBO_CRC_TYPE_24B);
> >>>
> >>> @@ -1471,8 +1473,8 @@ vrb_dma_desc_td_fill(struct rte_bbdev_dec_op
> >> *op,
> >>>    	if (op->turbo_dec.code_block_mode ==
> >> RTE_BBDEV_TRANSPORT_BLOCK) {
> >>>    		k = op->turbo_dec.tb_params.k_pos;
> >>>    		e = (r < op->turbo_dec.tb_params.cab)
> >>> -			? op->turbo_dec.tb_params.ea
> >>> -			: op->turbo_dec.tb_params.eb;
> >>> +				? op->turbo_dec.tb_params.ea
> >>> +				: op->turbo_dec.tb_params.eb;
> >>>    	} else {
> >>>    		k = op->turbo_dec.cb_params.k;
> >>>    		e = op->turbo_dec.cb_params.e;
> >>> @@ -1726,7 +1728,7 @@ vrb_dma_desc_ld_update(struct
> >> rte_bbdev_dec_op *op,
> >>>    	desc->op_addr = op;
> >>>    }
> >>>
> >>> -/* Enqueue one encode operations for device in CB mode */
> >>> +/* Enqueue one encode operations for device in CB mode. */
> >>>    static inline int
> >>>    enqueue_enc_one_op_cb(struct acc_queue *q, struct
> >>> rte_bbdev_enc_op
> >> *op,
> >>>    		uint16_t total_enqueued_cbs)
> >>> @@ -2263,7 +2265,7 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct
> >> acc_queue *q, struct rte_bbdev_dec_op *op,
> >>>    	return current_enqueued_cbs;
> >>>    }
> >>>
> >>> -/* Enqueue one decode operations for device in TB mode */
> >>> +/* Enqueue one decode operations for device in TB mode. */
> >>>    static inline int
> >>>    enqueue_dec_one_op_tb(struct acc_queue *q, struct
> >>> rte_bbdev_dec_op
> >> *op,
> >>>    		uint16_t total_enqueued_cbs, uint8_t cbs_in_tb)
> >



* Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension
  2023-10-04 21:28         ` Chautru, Nicolas
@ 2023-10-05 14:31           ` Maxime Coquelin
  2023-10-05 15:00             ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-05 14:31 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan



On 10/4/23 23:28, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Wednesday, October 4, 2023 12:36 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver
>> extension
>>
>>
>>
>> On 10/3/23 20:54, Chautru, Nicolas wrote:
>>> Hi Maxime,
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Tuesday, October 3, 2023 6:15 AM
>>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
>> Hernan
>>>> <hernan.vargas@intel.com>
>>>> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified
>>>> driver extension
>>>>
>>>> Thanks for doing the split, that will ease review.
>>>>
>>>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>>>> Adding a few functions and common code prior to extending the VRB
>>>>> driver.
>>>>>
>>>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>>>> ---
>>>>>     drivers/baseband/acc/acc_common.h     | 164
>> +++++++++++++++++++++++-
>>>> --
>>>>>     drivers/baseband/acc/rte_acc100_pmd.c |   4 +-
>>>>>     drivers/baseband/acc/rte_vrb_pmd.c    |  62 +++++-----
>>>>>     3 files changed, 184 insertions(+), 46 deletions(-)
>>>>>
>>>>> diff --git a/drivers/baseband/acc/acc_common.h
>>>>> b/drivers/baseband/acc/acc_common.h
>>>>> index 788abf1a3c..89893eae43 100644
>>>>> --- a/drivers/baseband/acc/acc_common.h
>>>>> +++ b/drivers/baseband/acc/acc_common.h
>>>>> @@ -18,6 +18,7 @@
>>>>>     #define ACC_DMA_BLKID_OUT_HARQ      3
>>>>>     #define ACC_DMA_BLKID_IN_HARQ       3
>>>>>     #define ACC_DMA_BLKID_IN_MLD_R      3
>>>>> +#define ACC_DMA_BLKID_DEWIN_IN      3
>>>>>
>>>>>     /* Values used in filling in decode FCWs */
>>>>>     #define ACC_FCW_TD_VER              1
>>>>> @@ -103,6 +104,9 @@
>>>>>     #define ACC_MAX_NUM_QGRPS              32
>>>>>     #define ACC_RING_SIZE_GRANULARITY      64
>>>>>     #define ACC_MAX_FCW_SIZE              128
>>>>> +#define ACC_IQ_SIZE                    4
>>>>> +
>>>>> +#define ACC_FCW_FFT_BLEN_3             28
>>>>>
>>>>>     /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2 */
>>>>>     #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */ @@ -132,6 +136,17 @@
>>>>>     #define ACC_LIM_21 14 /* 0.21 */
>>>>>     #define ACC_LIM_31 20 /* 0.31 */
>>>>>     #define ACC_MAX_E (128 * 1024 - 2)
>>>>> +#define ACC_MAX_CS 12
>>>>> +
>>>>> +#define ACC100_VARIANT          0
>>>>> +#define VRB1_VARIANT		2
>>>>> +#define VRB2_VARIANT		3
>>>>> +
>>>>> +/* Queue Index Hierarchy */
>>>>> +#define VRB1_GRP_ID_SHIFT    10
>>>>> +#define VRB1_VF_ID_SHIFT     4
>>>>> +#define VRB2_GRP_ID_SHIFT    12
>>>>> +#define VRB2_VF_ID_SHIFT     6
>>>>>
>>>>>     /* Helper macro for logging */
>>>>>     #define rte_acc_log(level, fmt, ...) \ @@ -332,6 +347,37 @@
>>>>> struct __rte_packed acc_fcw_fft {
>>>>>     		res:19;
>>>>>     };
>>>>>
>>>>> +/* FFT Frame Control Word. */
>>>>> +struct __rte_packed acc_fcw_fft_3 {
>>>>> +	uint32_t in_frame_size:16,
>>>>> +		leading_pad_size:16;
>>>>> +	uint32_t out_frame_size:16,
>>>>> +		leading_depad_size:16;
>>>>> +	uint32_t cs_window_sel;
>>>>> +	uint32_t cs_window_sel2:16,
>>>>> +		cs_enable_bmap:16;
>>>>> +	uint32_t num_antennas:8,
>>>>> +		idft_size:8,
>>>>> +		dft_size:8,
>>>>> +		cs_offset:8;
>>>>> +	uint32_t idft_shift:8,
>>>>> +		dft_shift:8,
>>>>> +		cs_multiplier:16;
>>>>> +	uint32_t bypass:2,
>>>>> +		fp16_in:1,
>>>>> +		fp16_out:1,
>>>>> +		exp_adj:4,
>>>>> +		power_shift:4,
>>>>> +		power_en:1,
>>>>> +		enable_dewin:1,
>>>>> +		freq_resample_mode:2,
>>>>> +		depad_output_size:16;
>>>>> +	uint16_t cs_theta_0[ACC_MAX_CS];
>>>>> +	uint32_t cs_theta_d[ACC_MAX_CS];
>>>>> +	int8_t cs_time_offset[ACC_MAX_CS]; };
>>>>> +
>>>>> +
>>>>>     /* MLD-TS Frame Control Word */
>>>>>     struct __rte_packed acc_fcw_mldts {
>>>>>     	uint32_t fcw_version:4,
>>>>> @@ -473,14 +519,14 @@ union acc_info_ring_data {
>>>>>     		uint16_t valid: 1;
>>>>>     	};
>>>>>     	struct {
>>>>> -		uint32_t aq_id_3: 6;
>>>>> -		uint32_t qg_id_3: 5;
>>>>> -		uint32_t vf_id_3: 6;
>>>>> -		uint32_t int_nb_3: 6;
>>>>> -		uint32_t msi_0_3: 1;
>>>>> -		uint32_t vf2pf_3: 6;
>>>>> -		uint32_t loop_3: 1;
>>>>> -		uint32_t valid_3: 1;
>>>>> +		uint32_t aq_id_vrb2: 6;
>>>>> +		uint32_t qg_id_vrb2: 5;
>>>>> +		uint32_t vf_id_vrb2: 6;
>>>>> +		uint32_t int_nb_vrb2: 6;
>>>>> +		uint32_t msi_0_vrb2: 1;
>>>>> +		uint32_t vf2pf_vrb2: 6;
>>>>> +		uint32_t loop_vrb2: 1;
>>>>> +		uint32_t valid_vrb2: 1;
>>>>>     	};
>>>>>     } __rte_packed;
>>>>>
>>>>> @@ -761,22 +807,105 @@ alloc_sw_rings_min_mem(struct rte_bbdev
>> *dev,
>>>> struct acc_device *d,
>>>>>     	free_base_addresses(base_addrs, i);
>>>>>     }
>>>>>
>>>>> +/* Wrapper to provide VF index from ring data. */ static inline
>>>>> +uint16_t vf_from_ring(const union acc_info_ring_data ring_data,
>>>>> +uint16_t device_variant) {
>>>>
>>>> curly braces on a new line.
>>>
>>> Thanks.
>>>
>>>>
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return ring_data.vf_id_vrb2;
>>>>> +	else
>>>>> +		return ring_data.vf_id;
>>>>> +}
>>>>> +
>>>>> +/* Wrapper to provide QG index from ring data. */ static inline
>>>>> +uint16_t qg_from_ring(const union acc_info_ring_data ring_data,
>>>>> +uint16_t device_variant) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return ring_data.qg_id_vrb2;
>>>>> +	else
>>>>> +		return ring_data.qg_id;
>>>>> +}
>>>>> +
>>>>> +/* Wrapper to provide AQ index from ring data. */ static inline
>>>>> +uint16_t aq_from_ring(const union acc_info_ring_data ring_data,
>>>>> +uint16_t device_variant) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return ring_data.aq_id_vrb2;
>>>>> +	else
>>>>> +		return ring_data.aq_id;
>>>>> +}
>>>>> +
>>>>> +/* Wrapper to provide int index from ring data. */ static inline
>>>>> +uint16_t int_from_ring(const union acc_info_ring_data ring_data,
>>>>> +uint16_t device_variant) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return ring_data.int_nb_vrb2;
>>>>> +	else
>>>>> +		return ring_data.int_nb;
>>>>> +}
>>>>> +
>>>>> +/* Wrapper to provide queue index from group and aq index. */
>>>>> +static inline int queue_index(uint16_t group_idx, uint16_t aq_idx,
>>>>> +uint16_t
>>>>> +device_variant) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
>>>>> +	else
>>>>> +		return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx; }
>>>>> +
>>>>> +/* Wrapper to provide queue group from queue index. */ static
>>>>> +inline int qg_from_q(uint32_t q_idx, uint16_t device_variant) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
>>>>> +	else
>>>>> +		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF; }
>>>>> +
>>>>> +/* Wrapper to provide vf from queue index. */ static inline int32_t
>>>>> +vf_from_q(uint32_t q_idx, uint16_t device_variant) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return (q_idx >> VRB2_VF_ID_SHIFT)  & 0x3F;
>>>>> +	else
>>>>> +		return (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F; }
>>>>> +
>>>>> +/* Wrapper to provide aq index from queue index. */ static inline
>>>>> +int32_t aq_from_q(uint32_t q_idx, uint16_t device_variant) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return q_idx & 0x3F;
>>>>> +	else
>>>>> +		return q_idx & 0xF;
>>>>> +}
>>>>> +
>>>>> +/* Wrapper to set VF index in ring data. */ static inline int32_t
>>>>> +set_vf_in_ring(volatile union acc_info_ring_data *ring_data,
>>>>> +		uint16_t device_variant, uint16_t value) {
>>>>> +	if (device_variant == VRB2_VARIANT)
>>>>> +		return ring_data->vf_id_vrb2 = value;
>>>>> +	else
>>>>> +		return ring_data->vf_id = value;
>>>>> +}
>>>>> +
>>>>>     /*
>>>>>      * Find queue_id of a device queue based on details from the Info Ring.
>>>>>      * If a queue isn't found UINT16_MAX is returned.
>>>>>      */
>>>>>     static inline uint16_t
>>>>>     get_queue_id_from_ring_info(struct rte_bbdev_data *data,
>>>>> -		const union acc_info_ring_data ring_data)
>>>>> +		const union acc_info_ring_data ring_data, uint16_t
>>>> device_variant)
>>>>
>>>> As I suggested on v2:
>>>>
>>>> get_queue_id_from_ring_info(struct rte_bbdev_data *data,
>>>> 	const union acc_info_ring_data ring_data) {
>>>> 	struct acc_device *d = data->dev_private;
>>>>
>>>> 	...
>>>>
>>>> 	if (acc_q != NULL && acc_q->aq_id == aq_from_ring(d, ring_data) &&
>>>> ...
>>>>
>>>> }
>>>>
>>>> with
>>>>
>>>> /* Wrapper to provide AQ index from ring data. */
>>>> static inline uint16_t
>>>> aq_from_ring(struct acc_device *d, const union acc_info_ring_data ring_data)
>>>> {
>>>> 	if (d->device_variant == VRB2_VARIANT)
>>>> 		return ring_data.aq_id_vrb2;
>>>> 	else
>>>> 		return ring_data.aq_id;
>>>> }
>>>>
>>>
>>> I will change get_queue_id_from_ring_info() to have a smaller
>>> prototype, but I don’t plan on changing the other new underlying
>>> functions to take dev instead of the variant in their prototypes. I
>>> don’t see a reason to, as these only need this one member.
>>
>> IMHO, the reason is that it costs nothing and is more future-proof.
> 
> Thanks. In this particular case I believe the prototype is cleaner with the device variant. I don’t see a future-proofing concern.
> 
>>
>> Also, my initial idea was to have an intermediate representation, like:
>>
>> struct acc_queue_info { // Not sure about the name
>> 	uint16_t vf_id;
>> 	uint8_t qgrp_id;
>> 	uint16_t aq_id;
>> };
>>
>> Then we have a single callback for each variant
>>
>> static void
>> vrb1_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
>> 		struct acc_queue_info *queue_info)
>> {
>> 	queue_info->vf_id = ring_data.vf_id;
>> 	queue_info->qgrp_id = ...
>> }
>>
>> static void
>> vrb2_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
>> 		struct acc_queue_info *queue_info)
>> {
>>
>> }
>>
>> The acc_queue_info struct can also be used in struct acc_queue, so we use
>> same format everywhere.
>>
>> I think it will be less verbose, and quicker to add new variants without the
>> risk of forgetting to add "else if (d->device_variant == VRBx_VARIANT)"
>> somewhere.
>>
>> What do you think?
> 
> I think both would work. The intermediate structure may be a bit artificial, and it would need different members depending on whether the info comes from the queue or from the ring (e.g. the interrupt index). There is also no reciprocal function: we only set the VF in the ring. And there is one location where we need only one piece of information, not all of the members.
> Again, both are okay with me and I have no strong preference, so for now I would suggest keeping it as is.

Ok, but please pass dev and not variant directly in the helpers.
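
A minimal sketch of what I mean, using the queue index helpers from this patch as an example. struct acc_device is reduced to a one-member stub here and the return types are tightened for the example, so treat those as assumptions; the shifts and masks are the ones from the patch.

```c
#include <stdint.h>

#define VRB1_VARIANT		2
#define VRB2_VARIANT		3

/* Queue Index Hierarchy (values from the patch). */
#define VRB1_GRP_ID_SHIFT	10
#define VRB1_VF_ID_SHIFT	4
#define VRB2_GRP_ID_SHIFT	12
#define VRB2_VF_ID_SHIFT	6

/* Stub standing in for the driver's struct acc_device. */
struct acc_device {
	uint16_t device_variant;
};

/* Wrapper to provide queue index from group and aq index. */
static inline uint32_t
queue_index(const struct acc_device *d, uint16_t group_idx, uint16_t aq_idx)
{
	if (d->device_variant == VRB2_VARIANT)
		return ((uint32_t)group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
	else
		return ((uint32_t)group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
}

/* Wrapper to provide queue group from queue index. */
static inline uint16_t
qg_from_q(const struct acc_device *d, uint32_t q_idx)
{
	if (d->device_variant == VRB2_VARIANT)
		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
	else
		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
}

/* Wrapper to provide vf from queue index. */
static inline uint16_t
vf_from_q(const struct acc_device *d, uint32_t q_idx)
{
	if (d->device_variant == VRB2_VARIANT)
		return (q_idx >> VRB2_VF_ID_SHIFT) & 0x3F;
	else
		return (q_idx >> VRB1_VF_ID_SHIFT) & 0x3F;
}

/* Wrapper to provide aq index from queue index. */
static inline uint16_t
aq_from_q(const struct acc_device *d, uint32_t q_idx)
{
	if (d->device_variant == VRB2_VARIANT)
		return q_idx & 0x3F;
	else
		return q_idx & 0xF;
}
```

Callers then pass d everywhere, so if a helper later needs another device member its prototype and call sites do not have to change.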

>>
>>>
>>>>>     {
>>>>>     	uint16_t queue_id;
>>>>> +	struct acc_queue *acc_q;
>>>>>
>>>>>     	for (queue_id = 0; queue_id < data->num_queues; ++queue_id) {
>>>>> -		struct acc_queue *acc_q =
>>>>> -				data->queues[queue_id].queue_private;
>>>>> -		if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id &&
>>>>> -				acc_q->qgrp_id == ring_data.qg_id &&
>>>>> -				acc_q->vf_id == ring_data.vf_id)
>>>>> +		acc_q = data->queues[queue_id].queue_private;
>>>>> +
>>>>> +		if (acc_q != NULL && acc_q->aq_id ==
>>>> aq_from_ring(ring_data, device_variant) &&
>>>>> +				acc_q->qgrp_id == qg_from_ring(ring_data,
>>>> device_variant) &&
>>>>> +				acc_q->vf_id == vf_from_ring(ring_data,
>>>> device_variant))
>>>>>     			return queue_id;
>>>>>     	}
>>>>>
>>>>> @@ -1438,4 +1567,11 @@ get_num_cbs_in_tb_ldpc_enc(struct
>>>> rte_bbdev_op_ldpc_enc *ldpc_enc)
>>>>>     	return cbs_in_tb;
>>>>>     }
>>>>>
>>>>> +static inline void
>>>>> +acc_reg_fast_write(struct acc_device *d, uint32_t offset, uint32_t
>>>>> +value) {
>>>>> +	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
>>>>> +	mmio_write(reg_addr, value);
>>>>> +}
>>>>> +
>>>>>     #endif /* _ACC_COMMON_H_ */
>>>>> diff --git a/drivers/baseband/acc/rte_acc100_pmd.c
>>>>> b/drivers/baseband/acc/rte_acc100_pmd.c
>>>>> index 5362d39c30..7f8d05b5a9 100644
>>>>> --- a/drivers/baseband/acc/rte_acc100_pmd.c
>>>>> +++ b/drivers/baseband/acc/rte_acc100_pmd.c
>>>>> @@ -294,7 +294,7 @@ acc100_pf_interrupt_handler(struct rte_bbdev
>> *dev)
>>>>>     		case ACC100_PF_INT_DMA_UL5G_DESC_IRQ:
>>>>>     		case ACC100_PF_INT_DMA_DL5G_DESC_IRQ:
>>>>>     			deq_intr_det.queue_id =
>>>> get_queue_id_from_ring_info(
>>>>> -					dev->data, *ring_data);
>>>>> +					dev->data, *ring_data, acc100_dev-
>>>>> device_variant);
>>>>>     			if (deq_intr_det.queue_id == UINT16_MAX) {
>>>>>     				rte_bbdev_log(ERR,
>>>>>     						"Couldn't find queue: aq_id:
>>>> %u, qg_id: %u, vf_id: %u", @@
>>>>> -348,7 +348,7 @@ acc100_vf_interrupt_handler(struct rte_bbdev *dev)
>>>>>     			 */
>>>>>     			ring_data->vf_id = 0;
>>>>>     			deq_intr_det.queue_id =
>>>> get_queue_id_from_ring_info(
>>>>> -					dev->data, *ring_data);
>>>>> +					dev->data, *ring_data, acc100_dev-
>>>>> device_variant);
>>>>>     			if (deq_intr_det.queue_id == UINT16_MAX) {
>>>>>     				rte_bbdev_log(ERR,
>>>>>     						"Couldn't find queue: aq_id:
>>>> %u, qg_id: %u", diff --git
>>>>> a/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> index a1de012b40..c89c26c59a 100644
>>>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> @@ -341,17 +341,18 @@ static inline void
>>>>>     vrb_check_ir(struct acc_device *acc_dev)
>>>>>     {
>>>>>     	volatile union acc_info_ring_data *ring_data;
>>>>> -	uint16_t info_ring_head = acc_dev->info_ring_head;
>>>>> +	uint16_t info_ring_head = acc_dev->info_ring_head, int_nb;
>>>>>     	if (unlikely(acc_dev->info_ring == NULL))
>>>>>     		return;
>>>>>
>>>>>     	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
>>>>> ACC_INFO_RING_MASK);
>>>>>
>>>>>     	while (ring_data->valid) {
>>>>> -		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
>>>>> -				ring_data->int_nb >
>>>> ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
>>>>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
>>>>> +		if ((int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
>>>>> +				int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ))
>>>> {
>>>>>     			rte_bbdev_log(WARNING, "InfoRing: ITR:%d
>>>> Info:0x%x",
>>>>> -					ring_data->int_nb, ring_data-
>>>>> detailed_info);
>>>>> +					int_nb, ring_data->detailed_info);
>>>>>     			/* Initialize Info Ring entry and move forward. */
>>>>>     			ring_data->val = 0;
>>>>>     		}
>>>>> @@ -368,16 +369,21 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>>>     	struct acc_device *acc_dev = dev->data->dev_private;
>>>>>     	volatile union acc_info_ring_data *ring_data;
>>>>>     	struct acc_deq_intr_details deq_intr_det;
>>>>> +	uint16_t vf_id, aq_id, qg_id, int_nb;
>>>>>
>>>>>     	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
>>>>> ACC_INFO_RING_MASK);
>>>>>
>>>>>     	while (ring_data->valid) {
>>>>> +		vf_id = vf_from_ring(*ring_data, acc_dev->device_variant);
>>>>> +		aq_id = aq_from_ring(*ring_data, acc_dev->device_variant);
>>>>> +		qg_id = qg_from_ring(*ring_data, acc_dev->device_variant);
>>>>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
>>>>>     		if (acc_dev->pf_device) {
>>>>>     			rte_bbdev_log_debug(
>>>>> -					"VRB1 PF Interrupt received, Info Ring
>>>> data: 0x%x -> %d",
>>>>> -					ring_data->val, ring_data->int_nb);
>>>>> +					"PF Interrupt received, Info Ring data:
>>>> 0x%x -> %d",
>>>>> +					ring_data->val, int_nb);
>>>>>
>>>>> -			switch (ring_data->int_nb) {
>>>>> +			switch (int_nb) {
>>>>>     			case ACC_PF_INT_DMA_DL_DESC_IRQ:
>>>>>     			case ACC_PF_INT_DMA_UL_DESC_IRQ:
>>>>>     			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
>>>>> @@ -385,13 +391,11 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>>>     			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
>>>>>     			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
>>>>>     				deq_intr_det.queue_id =
>>>> get_queue_id_from_ring_info(
>>>>> -						dev->data, *ring_data);
>>>>> +						dev->data, *ring_data,
>>>> acc_dev->device_variant);
>>>>>     				if (deq_intr_det.queue_id == UINT16_MAX) {
>>>>>     					rte_bbdev_log(ERR,
>>>>>     							"Couldn't find queue:
>>>> aq_id: %u, qg_id: %u, vf_id: %u",
>>>>> -							ring_data->aq_id,
>>>>> -							ring_data->qg_id,
>>>>> -							ring_data->vf_id);
>>>>> +							aq_id, qg_id, vf_id);
>>>>>     					return;
>>>>>     				}
>>>>>     				rte_bbdev_pmd_callback_process(dev,
>>>>> @@ -403,9 +407,9 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>>>     			}
>>>>>     		} else {
>>>>>     			rte_bbdev_log_debug(
>>>>> -					"VRB1 VF Interrupt received, Info Ring
>>>> data: 0x%x\n",
>>>>> +					"VRB VF Interrupt received, Info Ring
>>>> data: 0x%x\n",
>>>>>     					ring_data->val);
>>>>> -			switch (ring_data->int_nb) {
>>>>> +			switch (int_nb) {
>>>>>     			case ACC_VF_INT_DMA_DL_DESC_IRQ:
>>>>>     			case ACC_VF_INT_DMA_UL_DESC_IRQ:
>>>>>     			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
>>>>> @@ -413,14 +417,13 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>>>     			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
>>>>>     			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
>>>>>     				/* VFs are not aware of their vf_id - it's set to
>>>> 0.  */
>>>>> -				ring_data->vf_id = 0;
>>>>> +				set_vf_in_ring(ring_data, acc_dev->device_variant, 0);
>>>>>     				deq_intr_det.queue_id =
>>>> get_queue_id_from_ring_info(
>>>>> -						dev->data, *ring_data);
>>>>> +						dev->data, *ring_data,
>>>> acc_dev->device_variant);
>>>>>     				if (deq_intr_det.queue_id == UINT16_MAX) {
>>>>>     					rte_bbdev_log(ERR,
>>>>>     							"Couldn't find queue:
>>>> aq_id: %u, qg_id: %u",
>>>>> -							ring_data->aq_id,
>>>>> -							ring_data->qg_id);
>>>>> +							aq_id, qg_id);
>>>>>     					return;
>>>>>     				}
>>>>>     				rte_bbdev_pmd_callback_process(dev,
>>>>> @@ -435,8 +438,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
>>>>>     		/* Initialize Info Ring entry and move forward. */
>>>>>     		ring_data->val = 0;
>>>>>     		++acc_dev->info_ring_head;
>>>>> -		ring_data = acc_dev->info_ring +
>>>>> -				(acc_dev->info_ring_head &
>>>> ACC_INFO_RING_MASK);
>>>>> +		ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
>>>>> +ACC_INFO_RING_MASK);
>>>>>     	}
>>>>>     }
>>>>>
>>>>> @@ -556,8 +558,7 @@ vrb_setup_queues(struct rte_bbdev *dev, uint16_t
>>>>> num_queues, int socket_id)
>>>>>
>>>>>     	/* Configure tail pointer for use when SDONE enabled. */
>>>>>     	if (d->tail_ptrs == NULL)
>>>>> -		d->tail_ptrs = rte_zmalloc_socket(
>>>>> -				dev->device->driver->name,
>>>>> +		d->tail_ptrs = rte_zmalloc_socket(dev->device->driver->name,
>>>>>     				VRB_MAX_QGRPS * VRB_MAX_AQS *
>>>> sizeof(uint32_t),
>>>>>     				RTE_CACHE_LINE_SIZE, socket_id);
>>>>>     	if (d->tail_ptrs == NULL) {
>>>>> @@ -783,7 +784,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
>>>>>     			/* Mark the Queue as assigned. */
>>>>>     			d->q_assigned_bit_map[group_idx] |= (1ULL <<
>>>> aq_idx);
>>>>>     			/* Report the AQ Index. */
>>>>> -			return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
>>>>> +			return queue_index(group_idx, aq_idx, d->device_variant);
>>>>>     		}
>>>>>     	}
>>>>>     	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u",
>>>>> @@ -922,9 +923,10 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
>>>>>     		}
>>>>>     	}
>>>>>
>>>>> -	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
>>>>> -	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
>>>>> -	q->aq_id = q_idx & 0xF;
>>>>> +	q->qgrp_id = qg_from_q(q_idx, d->device_variant);
>>>>> +	q->vf_id = vf_from_q(q_idx, d->device_variant);
>>>>> +	q->aq_id = aq_from_q(q_idx, d->device_variant);
>>>>> +
>>>>>     	q->aq_depth = 0;
>>>>>     	if (conf->op_type ==  RTE_BBDEV_OP_TURBO_DEC)
>>>>>     		q->aq_depth = (1 << d->acc_conf.q_ul_4g.aq_depth_log2);
>>>>> @@ -1311,7 +1313,7 @@ vrb_fcw_td_fill(const struct rte_bbdev_dec_op
>>>> *op, struct acc_fcw_td *fcw)
>>>>>     		fcw->bypass_teq = 0;
>>>>>     	}
>>>>>
>>>>> -	fcw->code_block_mode = 1; /* FIXME */
>>>>> +	fcw->code_block_mode = 1;
>>>>
>>>> Could you remind me what was the issue?
>>>
>>> Historically, the intention was to use a different format option in the
>>> FCW to help with TB mode, but that is no longer being considered.
>>
>> Ok.
>>
>>>
>>>>
>>>>>     	fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags,
>>>>>     			RTE_BBDEV_TURBO_CRC_TYPE_24B);
>>>>>
>>>>> @@ -1471,8 +1473,8 @@ vrb_dma_desc_td_fill(struct rte_bbdev_dec_op
>>>> *op,
>>>>>     	if (op->turbo_dec.code_block_mode ==
>>>> RTE_BBDEV_TRANSPORT_BLOCK) {
>>>>>     		k = op->turbo_dec.tb_params.k_pos;
>>>>>     		e = (r < op->turbo_dec.tb_params.cab)
>>>>> -			? op->turbo_dec.tb_params.ea
>>>>> -			: op->turbo_dec.tb_params.eb;
>>>>> +				? op->turbo_dec.tb_params.ea
>>>>> +				: op->turbo_dec.tb_params.eb;
>>>>>     	} else {
>>>>>     		k = op->turbo_dec.cb_params.k;
>>>>>     		e = op->turbo_dec.cb_params.e;
>>>>> @@ -1726,7 +1728,7 @@ vrb_dma_desc_ld_update(struct
>>>> rte_bbdev_dec_op *op,
>>>>>     	desc->op_addr = op;
>>>>>     }
>>>>>
>>>>> -/* Enqueue one encode operations for device in CB mode */
>>>>> +/* Enqueue one encode operations for device in CB mode. */
>>>>>     static inline int
>>>>>     enqueue_enc_one_op_cb(struct acc_queue *q, struct
>>>>> rte_bbdev_enc_op
>>>> *op,
>>>>>     		uint16_t total_enqueued_cbs)
>>>>> @@ -2263,7 +2265,7 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct
>>>> acc_queue *q, struct rte_bbdev_dec_op *op,
>>>>>     	return current_enqueued_cbs;
>>>>>     }
>>>>>
>>>>> -/* Enqueue one decode operations for device in TB mode */
>>>>> +/* Enqueue one decode operations for device in TB mode. */
>>>>>     static inline int
>>>>>     enqueue_dec_one_op_tb(struct acc_queue *q, struct
>>>>> rte_bbdev_dec_op
>>>> *op,
>>>>>     		uint16_t total_enqueued_cbs, uint8_t cbs_in_tb)
>>>
> 



* Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-10-04 21:18         ` Chautru, Nicolas
@ 2023-10-05 14:34           ` Maxime Coquelin
  2023-10-05 17:59             ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-05 14:34 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan



On 10/4/23 23:18, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Wednesday, October 4, 2023 12:11 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
>>
>>
>>
>> On 10/3/23 20:20, Chautru, Nicolas wrote:
>>> Hi Maxime,
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Tuesday, October 3, 2023 7:37 AM
>>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
>> Hernan
>>>> <hernan.vargas@intel.com>
>>>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
>>>> variant
>>>>
>>>>
>>>>
>>>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>>>> Support for the FFT processing specific to the
>>>>> VRB2 variant.
>>>>>
>>>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>>>> ---
>>>>>     drivers/baseband/acc/rte_vrb_pmd.c | 132
>>>> ++++++++++++++++++++++++++++-
>>>>>     1 file changed, 128 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> index 93add82947..ce4b90d8e7 100644
>>>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>>>> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t
>>>> queue_id,
>>>>>     			ACC_FCW_LD_BLEN : (conf->op_type ==
>>>> RTE_BBDEV_OP_FFT ?
>>>>>     			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
>>>>>
>>>>> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type ==
>>>> RTE_BBDEV_OP_FFT))
>>>>> +		fcw_len = ACC_FCW_FFT_BLEN_3;
>>>>> +
>>>>>     	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
>>>>>     		desc = q->ring_addr + desc_idx;
>>>>>     		desc->req.word0 = ACC_DMA_DESC_TYPE;
>>>>> @@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>>>>>     			.num_buffers_soft_out = 0,
>>>>>     			}
>>>>>     		},
>>>>> +		{
>>>>> +			.type	= RTE_BBDEV_OP_FFT,
>>>>> +			.cap.fft = {
>>>>> +				.capability_flags =
>>>>> +
>>>> 	RTE_BBDEV_FFT_WINDOWING |
>>>>> +
>>>> 	RTE_BBDEV_FFT_CS_ADJUSTMENT |
>>>>> +
>>>> 	RTE_BBDEV_FFT_DFT_BYPASS |
>>>>> +
>>>> 	RTE_BBDEV_FFT_IDFT_BYPASS |
>>>>> +						RTE_BBDEV_FFT_FP16_INPUT
>>>> |
>>>>> +
>>>> 	RTE_BBDEV_FFT_FP16_OUTPUT |
>>>>> +
>>>> 	RTE_BBDEV_FFT_POWER_MEAS |
>>>>> +
>>>> 	RTE_BBDEV_FFT_WINDOWING_BYPASS,
>>>>> +				.num_buffers_src =
>>>>> +						1,
>>>>> +				.num_buffers_dst =
>>>>> +						1,
>>>>> +			}
>>>>> +		},
>>>>>     		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>>>>>     	};
>>>>>
>>>>> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op
>>>>> *op,
>>>> struct acc_fcw_fft *fcw)
>>>>>     		fcw->bypass = 0;
>>>>>     }
>>>>>
>>>>> +/* Fill in a frame control word for FFT processing. */
>>>>> +static inline void
>>>>> +vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
>>>>> +{
>>>>> +	fcw->in_frame_size = op->fft.input_sequence_size;
>>>>> +	fcw->leading_pad_size = op->fft.input_leading_padding;
>>>>> +	fcw->out_frame_size = op->fft.output_sequence_size;
>>>>> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
>>>>> +	fcw->cs_window_sel = op->fft.window_index[0] +
>>>>> +			(op->fft.window_index[1] << 8) +
>>>>> +			(op->fft.window_index[2] << 16) +
>>>>> +			(op->fft.window_index[3] << 24);
>>>>> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
>>>>> +			(op->fft.window_index[5] << 8);
>>>>> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
>>>>> +	fcw->num_antennas = op->fft.num_antennas_log2;
>>>>> +	fcw->idft_size = op->fft.idft_log2;
>>>>> +	fcw->dft_size = op->fft.dft_log2;
>>>>> +	fcw->cs_offset = op->fft.cs_time_adjustment;
>>>>> +	fcw->idft_shift = op->fft.idft_shift;
>>>>> +	fcw->dft_shift = op->fft.dft_shift;
>>>>> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
>>>>> +	fcw->power_shift = op->fft.power_shift;
>>>>> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
>>>>> +	fcw->fp16_in = check_bit(op->fft.op_flags,
>>>> RTE_BBDEV_FFT_FP16_INPUT);
>>>>> +	fcw->fp16_out = check_bit(op->fft.op_flags,
>>>> RTE_BBDEV_FFT_FP16_OUTPUT);
>>>>> +	fcw->power_en = check_bit(op->fft.op_flags,
>>>> RTE_BBDEV_FFT_POWER_MEAS);
>>>>> +	if (check_bit(op->fft.op_flags,
>>>>> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
>>>>> +		if (check_bit(op->fft.op_flags,
>>>>> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
>>>>> +			fcw->bypass = 2;
>>>>> +		else
>>>>> +			fcw->bypass = 1;
>>>>> +	} else if (check_bit(op->fft.op_flags,
>>>>> +			RTE_BBDEV_FFT_DFT_BYPASS))
>>>>> +		fcw->bypass = 3;
>>>>> +	else
>>>>> +		fcw->bypass = 0;
>>>>
>>>> The only differences I see with VRB1 are backed by the corresponding
>>>> op_flags (POWER & FP16); is that correct? If so, it does not make
>>>> sense to me to have a specific function for VRB2.
>>>
>>> There are more changes, but these are only formally enabled in the next
>>> stepping, hence some of the related code is not included yet. More generally,
>>> the FCW and IP are different from the VRB1 implementation.
>>
>> Currently, the code is almost identical, so the vrb1 implementation should be
>> reused. If later changes make the two implementations diverge, then we
>> can consider having a dedicated function for VRB2 at that time.
> 
> If I may, I believe this is best left as-is, notably for patches and support.
> The functions are fairly small (not much code overlap quantitatively) and the underlying IP is different
> (with more differences that we can enable over time). I don’t think it would really help anyone to make them
> coexist for a short period of time.
> Does that sound fair?

I disagree. As I explained, the code is currently almost identical, so
just share it.

The two can diverge later, if *really* necessary, once it makes more sense to
have two separate functions. For now that is not the case, in my opinion.
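
For illustration only, here is a minimal sketch of what sharing could look like for the bypass selection, which depends on nothing but op_flags. It mirrors the decision tree in the quoted vrb2_fcw_fft_fill(). The SK_* names and their bit values are hypothetical stand-ins chosen so the snippet compiles on its own; they are not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only; the real
 * RTE_BBDEV_FFT_* flags come from the bbdev headers.
 */
#define SK_FFT_WINDOWING_BYPASS (1u << 0)
#define SK_FFT_IDFT_BYPASS      (1u << 1)
#define SK_FFT_DFT_BYPASS       (1u << 2)

/* Stand-in for the driver's check_bit() helper. */
static inline bool
sk_check_bit(uint32_t bitmap, uint32_t mask)
{
	return (bitmap & mask) != 0;
}

/* Derive the FCW bypass field from op_flags alone, so one helper
 * could be called from either variant's fill function.
 * IDFT bypass takes precedence over DFT bypass, as in the quoted code.
 */
static inline uint8_t
sk_fft_bypass_mode(uint32_t op_flags)
{
	if (sk_check_bit(op_flags, SK_FFT_IDFT_BYPASS))
		return sk_check_bit(op_flags, SK_FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (sk_check_bit(op_flags, SK_FFT_DFT_BYPASS))
		return 3;
	return 0;
}
```

A variant-specific fill function would then only need to set the fields that really differ and call the shared helper for the rest.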

Thanks,
Maxime

> 
> 
>>
>>>>
>>>>> +}
>>>>> +
>>>>>     static inline int
>>>>>     vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>>>     		struct acc_dma_req_desc *desc,
>>>>> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct
>>>>> rte_bbdev_fft_op
>>>> *op,
>>>>>     	return 0;
>>>>>     }
>>>>>
>>>>> +static inline int
>>>>> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>>> +		struct acc_dma_req_desc *desc,
>>>>> +		struct rte_mbuf *input, struct rte_mbuf *output, struct
>>>> rte_mbuf *win_input,
>>>>> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t
>>>> *out_offset,
>>>>> +		uint32_t *win_offset, uint32_t *pwr_offset) {
>>>>> +	bool pwr_en = check_bit(op->fft.op_flags,
>>>> RTE_BBDEV_FFT_POWER_MEAS);
>>>>> +	bool win_en = check_bit(op->fft.op_flags,
>>>> RTE_BBDEV_FFT_DEWINDOWING);
>>>>> +	int num_cs = 0, i, bd_idx = 1;
>>>>> +
>>>>> +	/* FCW already done */
>>>>> +	acc_header_init(desc);
>>>>> +
>>>>> +	RTE_SET_USED(win_input);
>>>>> +	RTE_SET_USED(win_offset);
>>>>> +
>>>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input,
>>>> *in_offset);
>>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size *
>>>> ACC_IQ_SIZE;
>>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
>>>>> +	desc->data_ptrs[bd_idx].last = 1;
>>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>>>> +	bd_idx++;
>>>>> +
>>>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output,
>>>> *out_offset);
>>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size *
>>>> ACC_IQ_SIZE;
>>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
>>>>> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
>>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>>>> +	desc->m2dlen = win_en ? 3 : 2;
>>>>> +	desc->d2mlen = pwr_en ? 2 : 1;
>>>>> +	desc->ib_ant_offset = op->fft.input_sequence_size;
>>>>> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
>>>>> +
>>>>> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
>>>>> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
>>>>> +			num_cs++;
>>>>> +	desc->num_cs = num_cs;
>>>>> +
>>>>> +	if (pwr_en && pwr) {
>>>>> +		bd_idx++;
>>>>> +		desc->data_ptrs[bd_idx].address =
>>>> rte_pktmbuf_iova_offset(pwr, *pwr_offset);
>>>>> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op->fft.num_antennas_log2) * 4;
>>>>> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
>>>>> +		desc->data_ptrs[bd_idx].last = 1;
>>>>> +		desc->data_ptrs[bd_idx].dma_ext = 0;
>>>>> +	}
>>>>> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
>>>>> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
>>>>> +	desc->op_addr = op;
>>>>> +	return 0;
>>>>> +}
>>>>>
>>>>>     /** Enqueue one FFT operation for device. */
>>>>>     static inline int
>>>>> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue
>> *q,
>>>> struct rte_bbdev_fft_op *op,
>>>>>     		uint16_t total_enqueued_cbs)
>>>>>     {
>>>>>     	union acc_dma_desc *desc;
>>>>> -	struct rte_mbuf *input, *output;
>>>>> -	uint32_t in_offset, out_offset;
>>>>> +	struct rte_mbuf *input, *output, *pwr, *win;
>>>>> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
>>>>>     	struct acc_fcw_fft *fcw;
>>>>>
>>>>>     	desc = acc_desc(q, total_enqueued_cbs);
>>>>>     	input = op->fft.base_input.data;
>>>>>     	output = op->fft.base_output.data;
>>>>> +	pwr = op->fft.power_meas_output.data;
>>>>> +	win = op->fft.dewindowing_input.data;
>>>>>     	in_offset = op->fft.base_input.offset;
>>>>>     	out_offset = op->fft.base_output.offset;
>>>>> +	pwr_offset = op->fft.power_meas_output.offset;
>>>>> +	win_offset = op->fft.dewindowing_input.offset;
>>>>>
>>>>>     	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
>>>>>     			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
>>>>>     			* ACC_MAX_FCW_SIZE);
>>>>>
>>>>> -	vrb1_fcw_fft_fill(op, fcw);
>>>>> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset,
>>>> &out_offset);
>>>>> +	if (q->d->device_variant == VRB1_VARIANT) {
>>>>> +		vrb1_fcw_fft_fill(op, fcw);
>>>>> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output,
>>>> &in_offset, &out_offset);
>>>>> +	} else {
>>>>> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
>>>>> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win,
>>>> pwr,
>>>>> +				&in_offset, &out_offset, &win_offset,
>>>> &pwr_offset);
>>>>> +	}
>>>>>     #ifdef RTE_LIBRTE_BBDEV_DEBUG
>>>>>     	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
>>>>>     			sizeof(desc->req.fcw_fft));
>>>
> 



* Re: [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant
  2023-10-04 21:11     ` Chautru, Nicolas
@ 2023-10-05 14:36       ` Maxime Coquelin
  0 siblings, 0 replies; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-05 14:36 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan



On 10/4/23 23:11, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, October 3, 2023 7:28 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2
>> variant
>>
>>
>>
>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>> New implementation for some of the FEC features specific to the VRB2
>>> variant.
>>>
>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>> ---
>>>    drivers/baseband/acc/rte_vrb_pmd.c | 567
>> ++++++++++++++++++++++++++++-
>>>    1 file changed, 548 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>> index 48e779ce77..93add82947 100644
>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>> @@ -1235,6 +1235,94 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct
>> rte_bbdev_driver_info *dev_info)
>>>    	};
>>>
>>>    	static const struct rte_bbdev_op_cap vrb2_bbdev_capabilities[] = {
>>> +		{
>>> +			.type = RTE_BBDEV_OP_TURBO_DEC,
>>> +			.cap.turbo_dec = {
>>> +				.capability_flags =
>>> +
>> 	RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE |
>>> +					RTE_BBDEV_TURBO_CRC_TYPE_24B |
>>> +
>> 	RTE_BBDEV_TURBO_DEC_CRC_24B_DROP |
>>> +					RTE_BBDEV_TURBO_EQUALIZER |
>>> +
>> 	RTE_BBDEV_TURBO_SOFT_OUT_SATURATE |
>>> +
>> 	RTE_BBDEV_TURBO_HALF_ITERATION_EVEN |
>>> +
>> 	RTE_BBDEV_TURBO_CONTINUE_CRC_MATCH |
>>> +					RTE_BBDEV_TURBO_SOFT_OUTPUT |
>>> +
>> 	RTE_BBDEV_TURBO_EARLY_TERMINATION |
>>> +
>> 	RTE_BBDEV_TURBO_DEC_INTERRUPTS |
>>> +
>> 	RTE_BBDEV_TURBO_NEG_LLR_1_BIT_IN |
>>> +
>> 	RTE_BBDEV_TURBO_NEG_LLR_1_BIT_SOFT_OUT |
>>> +					RTE_BBDEV_TURBO_MAP_DEC |
>>> +
>> 	RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP |
>>> +
>> 	RTE_BBDEV_TURBO_DEC_SCATTER_GATHER,
>>> +				.max_llr_modulus = INT8_MAX,
>>> +				.num_buffers_src =
>>> +
>> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
>>> +				.num_buffers_hard_out =
>>> +
>> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
>>> +				.num_buffers_soft_out =
>>> +
>> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
>>> +			}
>>> +		},
>>> +		{
>>> +			.type = RTE_BBDEV_OP_TURBO_ENC,
>>> +			.cap.turbo_enc = {
>>> +				.capability_flags =
>>> +
>> 	RTE_BBDEV_TURBO_CRC_24B_ATTACH |
>>> +
>> 	RTE_BBDEV_TURBO_RV_INDEX_BYPASS |
>>> +					RTE_BBDEV_TURBO_RATE_MATCH |
>>> +
>> 	RTE_BBDEV_TURBO_ENC_INTERRUPTS |
>>> +
>> 	RTE_BBDEV_TURBO_ENC_SCATTER_GATHER,
>>> +				.num_buffers_src =
>>> +
>> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
>>> +				.num_buffers_dst =
>>> +
>> 	RTE_BBDEV_TURBO_MAX_CODE_BLOCKS,
>>> +			}
>>> +		},
>>> +		{
>>> +			.type   = RTE_BBDEV_OP_LDPC_ENC,
>>> +			.cap.ldpc_enc = {
>>> +				.capability_flags =
>>> +					RTE_BBDEV_LDPC_RATE_MATCH |
>>> +					RTE_BBDEV_LDPC_CRC_24B_ATTACH
>> |
>>> +
>> 	RTE_BBDEV_LDPC_INTERLEAVER_BYPASS |
>>> +					RTE_BBDEV_LDPC_ENC_INTERRUPTS
>> |
>>> +
>> 	RTE_BBDEV_LDPC_ENC_SCATTER_GATHER |
>>> +
>> 	RTE_BBDEV_LDPC_ENC_CONCATENATION,
>>> +				.num_buffers_src =
>>> +
>> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
>>> +				.num_buffers_dst =
>>> +
>> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
>>> +			}
>>> +		},
>>> +		{
>>> +			.type   = RTE_BBDEV_OP_LDPC_DEC,
>>> +			.cap.ldpc_dec = {
>>> +			.capability_flags =
>>> +				RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK |
>>> +				RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP |
>>> +				RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK |
>>> +				RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK |
>>> +				RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE
>> |
>>> +
>> 	RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE |
>>> +				RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE
>> |
>>> +				RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS |
>>> +				RTE_BBDEV_LDPC_DEC_SCATTER_GATHER |
>>> +
>> 	RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION |
>>> +
>> 	RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION |
>>> +				RTE_BBDEV_LDPC_LLR_COMPRESSION |
>>> +				RTE_BBDEV_LDPC_SOFT_OUT_ENABLE |
>>> +				RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS |
>>> +
>> 	RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS |
>>> +				RTE_BBDEV_LDPC_DEC_INTERRUPTS,
>>> +			.llr_size = 8,
>>> +			.llr_decimals = 2,
>>> +			.num_buffers_src =
>>> +
>> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
>>> +			.num_buffers_hard_out =
>>> +
>> 	RTE_BBDEV_LDPC_MAX_CODE_BLOCKS,
>>> +			.num_buffers_soft_out = 0,
>>> +			}
>>> +		},
>>>    		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>>>    	};
>>>
>>> @@ -1774,6 +1862,141 @@ vrb1_dma_desc_ld_fill(struct rte_bbdev_dec_op
>> *op,
>>>    	return 0;
>>>    }
>>>
>>> +/* Fill in a frame control word for LDPC decoding. */
>>> +static inline void
>>> +vrb2_fcw_ld_fill(struct rte_bbdev_dec_op *op, struct acc_fcw_ld *fcw,
>>> +		union acc_harq_layout_data *harq_layout)
>>> +{
>>> +	uint16_t harq_out_length, harq_in_length, ncb_p, k0_p, parity_offset;
>>> +	uint32_t harq_index;
>>> +	uint32_t l;
>>
>>
>> This is so similar to vrb1_fcw_ld_fill() that it does not make sense
>> to duplicate so much code.
>>
>> Can you confirm there are no differences other than the SOFT_OUT handling,
>> and that reusing vrb2_fcw_ld_fill on VRB1 would just work, since the op_flags are
>> checked (and they should not be set if the capability is not advertised)?
> 
> There are quite a lot of differences in the underlying IP: the decoder is different, with different tuning points, and the SO and HARQ support are different.
> Still, I believe we can support both in the same function without it being too much of a problem moving forward. Doing this in v4.

Thanks,
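
As a rough sketch of the idea, assuming the soft-output fields are the only part gated by capability: a single fill helper can derive them from op_flags and reject anything the device does not advertise (unsupported SO requests are treated as errors in v3). The SK_* flags, struct sk_fcw_ld and sk_fcw_ld_fill_so() are hypothetical names for illustration, not the driver's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RTE_BBDEV_LDPC_SOFT_OUT_* flags. */
#define SK_LDPC_SOFT_OUT_ENABLE               (1u << 0)
#define SK_LDPC_SOFT_OUT_RM_BYPASS            (1u << 1)
#define SK_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS (1u << 2)

/* Reduced, hypothetical view of the soft-output part of the LDPC FCW. */
struct sk_fcw_ld {
	uint8_t so_en;
	uint8_t so_bypass_intlv;
	uint8_t so_bypass_rm;
};

/* Fill the soft-output fields for any variant.
 * dev_cap_flags is what the device advertises; an op asking for a
 * flag outside that set is rejected instead of silently ignored.
 * Returns 0 on success, -1 on an unsupported request.
 */
static inline int
sk_fcw_ld_fill_so(uint32_t op_flags, uint32_t dev_cap_flags,
		struct sk_fcw_ld *fcw)
{
	if ((op_flags & ~dev_cap_flags) != 0)
		return -1;

	fcw->so_en = (op_flags & SK_LDPC_SOFT_OUT_ENABLE) != 0;
	fcw->so_bypass_intlv =
		(op_flags & SK_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS) != 0;
	fcw->so_bypass_rm = (op_flags & SK_LDPC_SOFT_OUT_RM_BYPASS) != 0;
	return 0;
}
```

On a device that advertises none of these flags, a valid op leaves all three fields at zero, which is how the same function could serve a variant without soft output.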


> 
> 
>>
>>> +	fcw->qm = op->ldpc_dec.q_m;
>>> +	fcw->nfiller = op->ldpc_dec.n_filler;
>>> +	fcw->BG = (op->ldpc_dec.basegraph - 1);
>>> +	fcw->Zc = op->ldpc_dec.z_c;
>>> +	fcw->ncb = op->ldpc_dec.n_cb;
>>> +	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_dec.basegraph,
>>> +			op->ldpc_dec.rv_index);
>>> +	if (op->ldpc_dec.code_block_mode == RTE_BBDEV_CODE_BLOCK)
>>> +		fcw->rm_e = op->ldpc_dec.cb_params.e;
>>> +	else
>>> +		fcw->rm_e = (op->ldpc_dec.tb_params.r <
>>> +				op->ldpc_dec.tb_params.cab) ?
>>> +						op->ldpc_dec.tb_params.ea :
>>> +						op->ldpc_dec.tb_params.eb;
>>> +
>>> +	if (unlikely(check_bit(op->ldpc_dec.op_flags,
>>> +			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE) &&
>>> +			(op->ldpc_dec.harq_combined_input.length == 0))) {
>>> +		rte_bbdev_log(WARNING, "Null HARQ input size provided");
>>> +		/* Disable HARQ input in that case to carry forward. */
>>> +		op->ldpc_dec.op_flags ^=
>> RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE;
>>> +	}
>>> +	if (unlikely(fcw->rm_e == 0)) {
>>> +		rte_bbdev_log(WARNING, "Null E input provided");
>>> +		fcw->rm_e = 2;
>>> +	}
>>> +
>>> +	fcw->hcin_en = check_bit(op->ldpc_dec.op_flags,
>>> +			RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE);
>>> +	fcw->hcout_en = check_bit(op->ldpc_dec.op_flags,
>>> +			RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE);
>>> +	fcw->crc_select = check_bit(op->ldpc_dec.op_flags,
>>> +			RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK);
>>> +	fcw->so_en = check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_SOFT_OUT_ENABLE);
>>> +	fcw->so_bypass_intlv = check_bit(op->ldpc_dec.op_flags,
>>> +
>> 	RTE_BBDEV_LDPC_SOFT_OUT_DEINTERLEAVER_BYPASS);
>>> +	fcw->so_bypass_rm = check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_SOFT_OUT_RM_BYPASS);
>>> +	fcw->bypass_dec = 0;
>>> +	fcw->bypass_intlv = check_bit(op->ldpc_dec.op_flags,
>>> +			RTE_BBDEV_LDPC_DEINTERLEAVER_BYPASS);
>>> +	if (op->ldpc_dec.q_m == 1) {
>>> +		fcw->bypass_intlv = 1;
>>> +		fcw->qm = 2;
>>> +	}
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_HARQ_6BIT_COMPRESSION)) {
>>> +		fcw->hcin_decomp_mode = 1;
>>> +		fcw->hcout_comp_mode = 1;
>>> +	} else if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_HARQ_4BIT_COMPRESSION)) {
>>> +		fcw->hcin_decomp_mode = 4;
>>> +		fcw->hcout_comp_mode = 4;
>>> +	} else {
>>> +		fcw->hcin_decomp_mode = 0;
>>> +		fcw->hcout_comp_mode = 0;
>>> +	}
>>> +
>>> +	fcw->llr_pack_mode = check_bit(op->ldpc_dec.op_flags,
>>> +			RTE_BBDEV_LDPC_LLR_COMPRESSION);
>>> +	harq_index = hq_index(op->ldpc_dec.harq_combined_output.offset);
>>> +	if (fcw->hcin_en > 0) {
>>> +		harq_in_length = op->ldpc_dec.harq_combined_input.length;
>>> +		if (fcw->hcin_decomp_mode == 1)
>>> +			harq_in_length = harq_in_length * 8 / 6;
>>> +		else if (fcw->hcin_decomp_mode == 4)
>>> +			harq_in_length = harq_in_length * 2;
>>> +		harq_in_length = RTE_MIN(harq_in_length, op->ldpc_dec.n_cb
>>> +				- op->ldpc_dec.n_filler);
>>> +		harq_in_length = RTE_ALIGN_CEIL(harq_in_length, 64);
>>> +		fcw->hcin_size0 = harq_in_length;
>>> +		fcw->hcin_offset = 0;
>>> +		fcw->hcin_size1 = 0;
>>> +	} else {
>>> +		fcw->hcin_size0 = 0;
>>> +		fcw->hcin_offset = 0;
>>> +		fcw->hcin_size1 = 0;
>>> +	}
>>> +
>>> +	fcw->itmax = op->ldpc_dec.iter_max;
>>> +	fcw->so_it = op->ldpc_dec.iter_max;
>>> +	fcw->itstop = check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE);
>>> +	fcw->cnu_algo = ACC_ALGO_MSA;
>>> +	fcw->synd_precoder = fcw->itstop;
>>> +
>>> +	fcw->minsum_offset = 1;
>>> +	fcw->dec_llrclip   = 2;
>>> +
>>> +	/*
>>> +	 * These are all implicitly set
>>> +	 * fcw->synd_post = 0;
>>> +	 * fcw->dec_convllr = 0;
>>> +	 * fcw->hcout_convllr = 0;
>>> +	 * fcw->hcout_size1 = 0;
>>> +	 * fcw->hcout_offset = 0;
>>> +	 * fcw->negstop_th = 0;
>>> +	 * fcw->negstop_it = 0;
>>> +	 * fcw->negstop_en = 0;
>>> +	 * fcw->gain_i = 1;
>>> +	 * fcw->gain_h = 1;
>>> +	 */
>>> +	if (fcw->hcout_en > 0) {
>>> +		parity_offset = (op->ldpc_dec.basegraph == 1 ? 20 : 8)
>>> +			* op->ldpc_dec.z_c - op->ldpc_dec.n_filler;
>>> +		k0_p = (fcw->k0 > parity_offset) ?
>>> +				fcw->k0 - op->ldpc_dec.n_filler : fcw->k0;
>>> +		ncb_p = fcw->ncb - op->ldpc_dec.n_filler;
>>> +		l = k0_p + fcw->rm_e;
>>> +		harq_out_length = (uint16_t) fcw->hcin_size0;
>>> +		harq_out_length = RTE_MIN(RTE_MAX(harq_out_length, l),
>> ncb_p);
>>> +		harq_out_length = RTE_ALIGN_CEIL(harq_out_length, 64);
>>> +		fcw->hcout_size0 = harq_out_length;
>>> +		fcw->hcout_size1 = 0;
>>> +		fcw->hcout_offset = 0;
>>> +		harq_layout[harq_index].offset = fcw->hcout_offset;
>>> +		harq_layout[harq_index].size0 = fcw->hcout_size0;
>>> +	} else {
>>> +		fcw->hcout_size0 = 0;
>>> +		fcw->hcout_size1 = 0;
>>> +		fcw->hcout_offset = 0;
>>> +	}
>>> +
>>> +	fcw->tb_crc_select = 0;
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK))
>>> +		fcw->tb_crc_select = 2;
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_CRC_TYPE_16_CHECK))
>>> +		fcw->tb_crc_select = 1;
>>> +}
>>> +
>>>    static inline void
>>>    vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
>>>    		struct acc_dma_req_desc *desc,
>>> @@ -1817,6 +2040,139 @@ vrb_dma_desc_ld_update(struct
>> rte_bbdev_dec_op *op,
>>>    	desc->op_addr = op;
>>>    }
>>>
>>> +static inline int
>>> +vrb2_dma_desc_ld_fill(struct rte_bbdev_dec_op *op,
>>> +		struct acc_dma_req_desc *desc,
>>> +		struct rte_mbuf **input, struct rte_mbuf *h_output,
>>> +		uint32_t *in_offset, uint32_t *h_out_offset,
>>> +		uint32_t *h_out_length, uint32_t *mbuf_total_left,
>>> +		uint32_t *seg_total_left, struct acc_fcw_ld *fcw)
>>> +{
>> Same here.
>>
>> I compared with vrb1_dma_desc_ld_fill(), and I don't see why we need two
>> functions.
>>
>> The only differences are either backed by capability checks, and vrb1
>> already sets fcw->hcin_decomp_mode, so this code should work as-is on
>> vrb1 if I'm not mistaken.
> 
> Yes fair enough, doing this in v3.

Thanks.
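
For reference, the HARQ size adjustment in the quoted vrb2_dma_desc_ld_fill() depends only on the compression mode stored in the FCW, which is part of why a single function looks feasible. A minimal sketch follows, assuming mode 1 means 6-bit compression and mode 4 means 4-bit compression as in the quoted code; sk_harq_dma_len() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an uncompressed HARQ size into the DMA length for the
 * selected compression mode, following the quoted code:
 *   mode 1 (6-bit): 3/4 of the size, rounded up
 *   mode 4 (4-bit): half of the size
 *   any other mode: size unchanged
 */
static inline uint16_t
sk_harq_dma_len(uint16_t h_size, uint8_t comp_mode)
{
	if (comp_mode == 1)
		return (uint16_t)((h_size * 3 + 3) / 4);
	if (comp_mode == 4)
		return (uint16_t)(h_size / 2);
	return h_size;
}
```

The same helper could be applied to both the pruned and non-pruned sizes before they are written to the descriptor.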

> 
>>
>>> +	struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec;
>>> +	int next_triplet = 1; /* FCW already done. */
>>> +	uint32_t input_length;
>>> +	uint16_t output_length, crc24_overlap = 0;
>>> +	uint16_t sys_cols, K, h_p_size, h_np_size;
>>> +
>>> +	acc_header_init(desc);
>>> +
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP))
>>> +		crc24_overlap = 24;
>>> +
>>> +	/* Compute some LDPC BG lengths. */
>>> +	input_length = fcw->rm_e;
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_LLR_COMPRESSION))
>>> +		input_length = (input_length * 3 + 3) / 4;
>>> +	sys_cols = (dec->basegraph == 1) ? 22 : 10;
>>> +	K = sys_cols * dec->z_c;
>>> +	output_length = K - dec->n_filler - crc24_overlap;
>>> +
>>> +	if (unlikely((*mbuf_total_left == 0) || (*mbuf_total_left <
>> input_length))) {
>>> +		rte_bbdev_log(ERR,
>>> +				"Mismatch between mbuf length and included
>> CB sizes: mbuf len %u, cb len %u",
>>> +				*mbuf_total_left, input_length);
>>> +		return -1;
>>> +	}
>>> +
>>> +	next_triplet = acc_dma_fill_blk_type_in(desc, input,
>>> +			in_offset, input_length,
>>> +			seg_total_left, next_triplet,
>>> +			check_bit(op->ldpc_dec.op_flags,
>>> +			RTE_BBDEV_LDPC_DEC_SCATTER_GATHER));
>>> +
>>> +	if (unlikely(next_triplet < 0)) {
>>> +		rte_bbdev_log(ERR,
>>> +				"Mismatch between data to process and mbuf
>> data length in bbdev_op: %p",
>>> +				op);
>>> +		return -1;
>>> +	}
>>> +
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) {
>>> +		if (op->ldpc_dec.harq_combined_input.data == 0) {
>>> +			rte_bbdev_log(ERR, "HARQ input is not defined");
>>> +			return -1;
>>> +		}
>>> +		h_p_size = fcw->hcin_size0 + fcw->hcin_size1;
>>> +		if (fcw->hcin_decomp_mode == 1)
>>> +			h_p_size = (h_p_size * 3 + 3) / 4;
>>> +		else if (fcw->hcin_decomp_mode == 4)
>>> +			h_p_size = h_p_size / 2;
>>> +		if (op->ldpc_dec.harq_combined_input.data == 0) {
>>> +			rte_bbdev_log(ERR, "HARQ input is not defined");
>>> +			return -1;
>>> +		}
>>> +		acc_dma_fill_blk_type(
>>> +				desc,
>>> +				op->ldpc_dec.harq_combined_input.data,
>>> +				op->ldpc_dec.harq_combined_input.offset,
>>> +				h_p_size,
>>> +				next_triplet,
>>> +				ACC_DMA_BLKID_IN_HARQ);
>>> +		next_triplet++;
>>> +	}
>>> +
>>> +	desc->data_ptrs[next_triplet - 1].last = 1;
>>> +	desc->m2dlen = next_triplet;
>>> +	*mbuf_total_left -= input_length;
>>> +
>>> +	next_triplet = acc_dma_fill_blk_type(desc, h_output,
>>> +			*h_out_offset, output_length >> 3, next_triplet,
>>> +			ACC_DMA_BLKID_OUT_HARD);
>>> +
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>> RTE_BBDEV_LDPC_SOFT_OUT_ENABLE)) {
>>> +		if (op->ldpc_dec.soft_output.data == 0) {
>>> +			rte_bbdev_log(ERR, "Soft output is not defined");
>>> +			return -1;
>>> +		}
>>> +		dec->soft_output.length = fcw->rm_e;
>>> +		acc_dma_fill_blk_type(desc, dec->soft_output.data, dec->soft_output.offset,
>>> +				fcw->rm_e, next_triplet,
>> ACC_DMA_BLKID_OUT_SOFT);
>>> +		next_triplet++;
>>> +	}
>>> +
>>> +	if (check_bit(op->ldpc_dec.op_flags,
>>> +
>> 	RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) {
>>> +		if (op->ldpc_dec.harq_combined_output.data == 0) {
>>> +			rte_bbdev_log(ERR, "HARQ output is not defined");
>>> +			return -1;
>>> +		}
>>> +
>>> +		/* Pruned size of the HARQ */
>>> +		h_p_size = fcw->hcout_size0 + fcw->hcout_size1;
>>> +		/* Non-Pruned size of the HARQ */
>>> +		h_np_size = fcw->hcout_offset > 0 ?
>>> +				fcw->hcout_offset + fcw->hcout_size1 :
>>> +				h_p_size;
>>> +		if (fcw->hcin_decomp_mode == 1) {
>>> +			h_np_size = (h_np_size * 3 + 3) / 4;
>>> +			h_p_size = (h_p_size * 3 + 3) / 4;
>>> +		} else if (fcw->hcin_decomp_mode == 4) {
>>> +			h_np_size = h_np_size / 2;
>>> +			h_p_size = h_p_size / 2;
>>> +		}
>>> +		dec->harq_combined_output.length = h_np_size;
>>> +		acc_dma_fill_blk_type(
>>> +				desc,
>>> +				dec->harq_combined_output.data,
>>> +				dec->harq_combined_output.offset,
>>> +				h_p_size,
>>> +				next_triplet,
>>> +				ACC_DMA_BLKID_OUT_HARQ);
>>> +
>>> +		next_triplet++;
>>> +	}
>>> +
>>> +	*h_out_length = output_length >> 3;
>>> +	dec->hard_output.length += *h_out_length;
>>> +	*h_out_offset += *h_out_length;
>>> +	desc->data_ptrs[next_triplet - 1].last = 1;
>>> +	desc->d2mlen = next_triplet - desc->m2dlen;
>>> +
>>> +	desc->op_addr = op;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>    /* Enqueue one encode operations for device in CB mode. */
>>>    static inline int
>>>    enqueue_enc_one_op_cb(struct acc_queue *q, struct rte_bbdev_enc_op
>> *op,
>>> @@ -1877,6 +2233,7 @@ enqueue_ldpc_enc_n_op_cb(struct acc_queue *q,
>> struct rte_bbdev_enc_op **ops,
>>>    	/** This could be done at polling. */
>>>    	acc_header_init(&desc->req);
>>>    	desc->req.numCBs = num;
>>> +	desc->req.dltb = 0;
>>>
>>>    	in_length_in_bytes = ops[0]->ldpc_enc.input.data->data_len;
>>>    	out_length = (enc->cb_params.e + 7) >> 3;
>>> @@ -2102,6 +2459,105 @@ vrb1_enqueue_ldpc_enc_one_op_tb(struct
>> acc_queue *q, struct rte_bbdev_enc_op *op
>>>    	return return_descs;
>>>    }
>>>
>>> +/* Fill in a frame control word for LDPC encoding. */
>>> +static inline void
>>> +vrb2_fcw_letb_fill(const struct rte_bbdev_enc_op *op, struct acc_fcw_le
>> *fcw)
>>> +{
>>> +	fcw->qm = op->ldpc_enc.q_m;
>>> +	fcw->nfiller = op->ldpc_enc.n_filler;
>>> +	fcw->BG = (op->ldpc_enc.basegraph - 1);
>>> +	fcw->Zc = op->ldpc_enc.z_c;
>>> +	fcw->ncb = op->ldpc_enc.n_cb;
>>> +	fcw->k0 = get_k0(fcw->ncb, fcw->Zc, op->ldpc_enc.basegraph,
>>> +			op->ldpc_enc.rv_index);
>>> +	fcw->rm_e = op->ldpc_enc.tb_params.ea;
>>> +	fcw->rm_e_b = op->ldpc_enc.tb_params.eb;
>>> +	fcw->crc_select = check_bit(op->ldpc_enc.op_flags,
>>> +			RTE_BBDEV_LDPC_CRC_24B_ATTACH);
>>> +	fcw->bypass_intlv = 0;
>>> +	if (op->ldpc_enc.tb_params.c > 1) {
>>> +		fcw->mcb_count = 0;
>>> +		fcw->C = op->ldpc_enc.tb_params.c;
>>> +		fcw->Cab = op->ldpc_enc.tb_params.cab;
>>> +	} else {
>>> +		fcw->mcb_count = 1;
>>> +		fcw->C = 0;
>>> +	}
>>> +}
>>> +
>>> +/* Enqueue one encode operations for device in TB mode.
>>> + * returns the number of descs used.
>>> + */
>>> +static inline int
>>> +vrb2_enqueue_ldpc_enc_one_op_tb(struct acc_queue *q, struct
>> rte_bbdev_enc_op *op,
>>> +		uint16_t enq_descs)
>>> +{
>>> +	union acc_dma_desc *desc = NULL;
>>> +	uint32_t in_offset, out_offset, out_length, seg_total_left;
>>> +	struct rte_mbuf *input, *output_head, *output;
>>> +
>>> +	uint16_t desc_idx = ((q->sw_ring_head + enq_descs) & q->sw_ring_wrap_mask);
>>> +	desc = q->ring_addr + desc_idx;
>>
>> Use acc_desc()?
> 
> thanks
> 
>>
>>> +	vrb2_fcw_letb_fill(op, &desc->req.fcw_le);
>>> +	struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc;
>>> +	int next_triplet = 1; /* FCW already done */
>>> +	uint32_t in_length_in_bytes;
>>> +	uint16_t K, in_length_in_bits;
>>> +
>>> +	input = enc->input.data;
>>> +	output_head = output = enc->output.data;
>>> +	in_offset = enc->input.offset;
>>> +	out_offset = enc->output.offset;
>>> +	seg_total_left = rte_pktmbuf_data_len(enc->input.data) - in_offset;
>>> +
>>> +	acc_header_init(&desc->req);
>>> +	K = (enc->basegraph == 1 ? 22 : 10) * enc->z_c;
>>> +	in_length_in_bits = K - enc->n_filler;
>>> +	if ((enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) ||
>>> +			(enc->op_flags &
>> RTE_BBDEV_LDPC_CRC_24B_ATTACH))
>>> +		in_length_in_bits -= 24;
>>> +	in_length_in_bytes = (in_length_in_bits >> 3) * enc->tb_params.c;
>>> +
>>> +	next_triplet = acc_dma_fill_blk_type_in(&desc->req, &input,
>> &in_offset,
>>> +			in_length_in_bytes, &seg_total_left, next_triplet,
>>> +			check_bit(enc->op_flags,
>> RTE_BBDEV_LDPC_ENC_SCATTER_GATHER));
>>> +	if (unlikely(next_triplet < 0)) {
>>> +		rte_bbdev_log(ERR,
>>> +				"Mismatch between data to process and mbuf
>> data length in bbdev_op: %p",
>>> +				op);
>>> +		return -1;
>>> +	}
>>> +	desc->req.data_ptrs[next_triplet - 1].last = 1;
>>> +	desc->req.m2dlen = next_triplet;
>>> +
>>> +	/* Set output length */
>>> +	/* Integer round up division by 8 */
>>> +	out_length = (enc->tb_params.ea * enc->tb_params.cab +
>>> +			enc->tb_params.eb * (enc->tb_params.c - enc->tb_params.cab)  + 7) >> 3;
>>> +
>>> +	next_triplet = acc_dma_fill_blk_type(&desc->req, output, out_offset,
>>> +			out_length, next_triplet, ACC_DMA_BLKID_OUT_ENC);
>>> +	enc->output.length = out_length;
>>> +	out_offset += out_length;
>>> +	desc->req.data_ptrs[next_triplet - 1].last = 1;
>>> +	desc->req.data_ptrs[next_triplet - 1].dma_ext = 0;
>>> +	desc->req.d2mlen = next_triplet - desc->req.m2dlen;
>>> +	desc->req.numCBs = enc->tb_params.c;
>>> +	if (desc->req.numCBs > 1)
>>> +		desc->req.dltb = 1;
>>> +	desc->req.op_addr = op;
>>> +
>>> +	if (out_length < ACC_MAX_E_MBUF)
>>> +		mbuf_append(output_head, output, out_length);
>>> +
>>> +#ifdef RTE_LIBRTE_BBDEV_DEBUG
>>> +	rte_memdump(stderr, "FCW", &desc->req.fcw_le, sizeof(desc->req.fcw_le));
>>> +	rte_memdump(stderr, "Req Desc.", desc, sizeof(*desc));
>>> +#endif
>>> +	/* One CB (one op) was successfully prepared to enqueue */
>>> +	return 1;
>>
>> This function is quite different from the VRB1 variant.
>> Is the underlying hardware completely different, or just a different
>> implementation?
> 
> The underlying HW is different in this mode of operation, notably as it
> supports RTE_BBDEV_LDPC_ENC_CONCATENATION, hence it is more of a true TB
> implementation.
> Kept separate on purpose.

Ack, makes sense here.
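
For context, the TB-mode sizing done in vrb2_enqueue_ldpc_enc_one_op_tb()
above can be sketched standalone. This is only an illustrative sketch: the
struct tb_sizes and the two helper names are hypothetical and not part of the
driver, and wider integer types are used than in the patch. The arithmetic
itself mirrors the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few rte_bbdev_op_ldpc_enc fields used here. */
struct tb_sizes {
	uint8_t basegraph;  /* 1 or 2 */
	uint16_t z_c;       /* lifting size */
	uint16_t n_filler;  /* filler bits per code block */
	bool crc_attach;    /* 24A or 24B CRC attached by the device */
	uint8_t c;          /* code blocks in the TB */
	uint8_t cab;        /* code blocks using ea (the rest use eb) */
	uint32_t ea;        /* rate-matched bits for the first cab blocks */
	uint32_t eb;        /* rate-matched bits for the remaining blocks */
};

/* Input bytes for the whole TB: per-CB payload times the number of CBs. */
static inline uint32_t
vrb2_tb_in_bytes(const struct tb_sizes *p)
{
	uint32_t k = (p->basegraph == 1 ? 22u : 10u) * p->z_c;
	uint32_t bits = k - p->n_filler;

	if (p->crc_attach)
		bits -= 24;
	return (bits >> 3) * p->c;
}

/* Output bytes: cab blocks of ea bits plus (c - cab) blocks of eb bits,
 * rounded up to a whole byte.
 */
static inline uint32_t
vrb2_tb_out_bytes(const struct tb_sizes *p)
{
	assert(p->cab <= p->c);
	return (p->ea * p->cab + p->eb * (uint32_t)(p->c - p->cab) + 7) >> 3;
}
```

The whole TB is described by a single descriptor here, which is why the VRB2
enqueue path only needs one free ring slot per op.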

>>
>>> +}
>>> +
>>>    /** Enqueue one decode operations for device in CB mode. */
>>>    static inline int
>>>    enqueue_dec_one_op_cb(struct acc_queue *q, struct rte_bbdev_dec_op
>> *op,
>>> @@ -2215,10 +2671,16 @@ vrb_enqueue_ldpc_dec_one_op_cb(struct
>> acc_queue *q, struct rte_bbdev_dec_op *op,
>>>    		else
>>>    			seg_total_left = fcw->rm_e;
>>>
>>> -		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input, h_output,
>>> -				&in_offset, &h_out_offset,
>>> -				&h_out_length, &mbuf_total_left,
>>> -				&seg_total_left, fcw);
>>> +		if (q->d->device_variant == VRB1_VARIANT)
>>> +			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
>> h_output,
>>> +					&in_offset, &h_out_offset,
>>> +					&h_out_length, &mbuf_total_left,
>>> +					&seg_total_left, fcw);
>>> +		else
>>> +			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input,
>> h_output,
>>> +					&in_offset, &h_out_offset,
>>> +					&h_out_length, &mbuf_total_left,
>>> +					&seg_total_left, fcw);
>>>    		if (unlikely(ret < 0))
>>>    			return ret;
>>>    	}
>>> @@ -2308,11 +2770,18 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct
>> acc_queue *q, struct rte_bbdev_dec_op *op,
>>>    		rte_memcpy(&desc->req.fcw_ld, &desc_first->req.fcw_ld,
>> ACC_FCW_LD_BLEN);
>>>    		desc->req.fcw_ld.tb_trailer_size = (c - r - 1) * trail_len;
>>>
>>> -		ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
>>> -				h_output, &in_offset, &h_out_offset,
>>> -				&h_out_length,
>>> -				&mbuf_total_left, &seg_total_left,
>>> -				&desc->req.fcw_ld);
>>> +		if (q->d->device_variant == VRB1_VARIANT)
>>> +			ret = vrb1_dma_desc_ld_fill(op, &desc->req, &input,
>>> +					h_output, &in_offset, &h_out_offset,
>>> +					&h_out_length,
>>> +					&mbuf_total_left, &seg_total_left,
>>> +					&desc->req.fcw_ld);
>>> +		else
>>> +			ret = vrb2_dma_desc_ld_fill(op, &desc->req, &input,
>>> +					h_output, &in_offset, &h_out_offset,
>>> +					&h_out_length,
>>> +					&mbuf_total_left, &seg_total_left,
>>> +					&desc->req.fcw_ld);
>>>
>>>    		if (unlikely(ret < 0))
>>>    			return ret;
>>> @@ -2576,14 +3045,22 @@ vrb_enqueue_ldpc_enc_tb(struct
>> rte_bbdev_queue_data *q_data,
>>>    	int descs_used;
>>>
>>>    	for (i = 0; i < num; ++i) {
>>> -		cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
>>> -		/* Check if there are available space for further processing. */
>>> -		if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
>>> -			acc_enqueue_ring_full(q_data);
>>> -			break;
>>> +		if (q->d->device_variant == VRB1_VARIANT) {
>>> +			cbs_in_tb = get_num_cbs_in_tb_ldpc_enc(&ops[i]->ldpc_enc);
>>> +			/* Check if there are available space for further
>> processing. */
>>> +			if (unlikely((avail - cbs_in_tb < 0) || (cbs_in_tb == 0))) {
>>> +				acc_enqueue_ring_full(q_data);
>>> +				break;
>>> +			}
>>> +			descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q,
>> ops[i],
>>> +					enqueued_descs, cbs_in_tb);
>>> +		} else {
>>> +			if (unlikely(avail < 1)) {
>>> +				acc_enqueue_ring_full(q_data);
>>> +				break;
>>> +			}
>>> +			descs_used = vrb2_enqueue_ldpc_enc_one_op_tb(q,
>> ops[i], enqueued_descs);
>>>    		}
>>> -
>>> -		descs_used = vrb1_enqueue_ldpc_enc_one_op_tb(q, ops[i],
>> enqueued_descs, cbs_in_tb);
>>>    		if (descs_used < 0) {
>>>    			acc_enqueue_invalid(q_data);
>>>    			break;
>>> @@ -2865,6 +3342,52 @@ vrb_dequeue_enc_one_op_cb(struct acc_queue
>> *q, struct rte_bbdev_enc_op **ref_op,
>>>    	return desc->req.numCBs;
>>>    }
>>>
>>> +/* Dequeue one LDPC encode operations from VRB2 device in TB mode. */
>>> +static inline int
>>> +vrb2_dequeue_ldpc_enc_one_op_tb(struct acc_queue *q, struct
>> rte_bbdev_enc_op **ref_op,
>>> +		uint16_t *dequeued_ops, uint32_t *aq_dequeued,
>>> +		uint16_t *dequeued_descs)
>>> +{
>>> +	union acc_dma_desc *desc, atom_desc;
>>> +	union acc_dma_rsp_desc rsp;
>>> +	struct rte_bbdev_enc_op *op;
>>> +	int desc_idx = ((q->sw_ring_tail + *dequeued_descs) & q->sw_ring_wrap_mask);
>>> +
>>> +	desc = q->ring_addr + desc_idx;
>>> +	atom_desc.atom_hdr = __atomic_load_n((uint64_t *)desc,
>> __ATOMIC_RELAXED);
>>> +
>>> +	/* Check fdone bit. */
>>> +	if (!(atom_desc.rsp.val & ACC_FDONE))
>>> +		return -1;
>>> +
>>> +	rsp.val = atom_desc.rsp.val;
>>> +	rte_bbdev_log_debug("Resp. desc %p: %x", desc, rsp.val);
>>> +
>>> +	/* Dequeue. */
>>> +	op = desc->req.op_addr;
>>> +
>>> +	/* Clearing status, it will be set based on response. */
>>> +	op->status = 0;
>>> +	op->status |= rsp.input_err << RTE_BBDEV_DATA_ERROR;
>>> +	op->status |= rsp.dma_err << RTE_BBDEV_DRV_ERROR;
>>> +	op->status |= rsp.fcw_err << RTE_BBDEV_DRV_ERROR;
>>> +	op->status |= rsp.engine_hung << RTE_BBDEV_ENGINE_ERROR;
>>> +
>>> +	if (desc->req.last_desc_in_batch) {
>>> +		(*aq_dequeued)++;
>>> +		desc->req.last_desc_in_batch = 0;
>>> +	}
>>> +	desc->rsp.val = ACC_DMA_DESC_TYPE;
>>> +	desc->rsp.add_info_0 = 0; /* Reserved bits. */
>>> +	desc->rsp.add_info_1 = 0; /* Reserved bits. */
>>> +
>>> +	/* One op was successfully dequeued */
>>> +	ref_op[0] = op;
>>> +	(*dequeued_descs)++;
>>> +	(*dequeued_ops)++;
>>> +	return 1;
>>> +}
>>> +
>>>    /* Dequeue one LDPC encode operations from device in TB mode.
>>>     * That operation may cover multiple descriptors.
>>>     */
>>> @@ -3189,9 +3712,14 @@ vrb_dequeue_ldpc_enc(struct
>> rte_bbdev_queue_data *q_data,
>>>
>>>    	for (i = 0; i < avail; i++) {
>>>    		if (cbm == RTE_BBDEV_TRANSPORT_BLOCK)
>>> -			ret = vrb_dequeue_enc_one_op_tb(q,
>> &ops[dequeued_ops],
>>> -					&dequeued_ops, &aq_dequeued,
>>> -					&dequeued_descs, num);
>>> +			if (q->d->device_variant == VRB1_VARIANT)
>>> +				ret = vrb_dequeue_enc_one_op_tb(q,
>> &ops[dequeued_ops],
>>> +						&dequeued_ops,
>> &aq_dequeued,
>>> +						&dequeued_descs, num);
>>> +			else
>>> +				ret = vrb2_dequeue_ldpc_enc_one_op_tb(q,
>> &ops[dequeued_ops],
>>> +						&dequeued_ops,
>> &aq_dequeued,
>>> +						&dequeued_descs);
>>>    		else
>>>    			ret = vrb_dequeue_enc_one_op_cb(q,
>> &ops[dequeued_ops],
>>>    					&dequeued_ops, &aq_dequeued,
>>> @@ -3536,6 +4064,7 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct
>> rte_pci_driver *drv)
>>>    	} else {
>>>    		d->device_variant = VRB2_VARIANT;
>>>    		d->queue_offset = vrb2_queue_offset;
>>> +		d->fcw_ld_fill = vrb2_fcw_ld_fill;
>>>    		d->num_qgroups = VRB2_NUM_QGRPS;
>>>    		d->num_aqs = VRB2_NUM_AQS;
>>>    		if (d->pf_device)
>>
>>
>> It looks like most (60%+) of the code in this patch could be removed if
>> duplication was avoided.
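
As an illustration of one possible way to trim the repeated if/else at the
call sites, the variant check can be made once at init time and stored as a
callback, in the same spirit as the d->fcw_ld_fill assignment in the hunk
above. This is only a sketch under assumptions: demo_dev, demo_fill_fn and the
two fill functions are hypothetical stand-ins, not the driver's types, and the
fill bodies are placeholders.

```c
#include <stdint.h>

#define VRB1_VARIANT 2
#define VRB2_VARIANT 3

/* Hypothetical per-variant descriptor fill callback. */
typedef int (*demo_fill_fn)(uint32_t input_len);

/* Hypothetical, heavily reduced device structure. */
struct demo_dev {
	uint16_t device_variant;
	demo_fill_fn desc_ld_fill; /* chosen once at init */
};

/* Placeholder bodies: each returns a tag identifying the variant. */
static int
demo_vrb1_fill(uint32_t input_len)
{
	(void)input_len;
	return 1;
}

static int
demo_vrb2_fill(uint32_t input_len)
{
	(void)input_len;
	return 2;
}

/* Select the variant-specific callback a single time. */
static void
demo_dev_init(struct demo_dev *d, uint16_t variant)
{
	d->device_variant = variant;
	d->desc_ld_fill = (variant == VRB1_VARIANT) ?
			demo_vrb1_fill : demo_vrb2_fill;
}

/* Call site no longer tests the variant. */
static int
demo_enqueue(const struct demo_dev *d, uint32_t input_len)
{
	return d->desc_ld_fill(input_len);
}
```

Whether an indirect call is acceptable in the data path is a separate
trade-off that this sketch does not address.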
>>
>> Thanks,
>> Maxime
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension
  2023-10-05 14:31           ` Maxime Coquelin
@ 2023-10-05 15:00             ` Chautru, Nicolas
  0 siblings, 0 replies; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-05 15:00 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Thursday, October 5, 2023 7:31 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified driver
> extension
> 
> 
> 
> On 10/4/23 23:28, Chautru, Nicolas wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Wednesday, October 4, 2023 12:36 AM
> >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> Hernan
> >> <hernan.vargas@intel.com>
> >> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow unified
> >> driver extension
> >>
> >>
> >>
> >> On 10/3/23 20:54, Chautru, Nicolas wrote:
> >>> Hi Maxime,
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: Tuesday, October 3, 2023 6:15 AM
> >>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> >> Hernan
> >>>> <hernan.vargas@intel.com>
> >>>> Subject: Re: [PATCH v3 06/12] baseband/acc: refactor to allow
> >>>> unified driver extension
> >>>>
> >>>> Thanks for doing the split, that will ease review.
> >>>>
> >>>> On 9/29/23 18:35, Nicolas Chautru wrote:
> >>>>> Adding a few functions and common code prior to extending the VRB
> >>>>> driver.
> >>>>>
> >>>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> >>>>> ---
> >>>>>     drivers/baseband/acc/acc_common.h     | 164
> >> +++++++++++++++++++++++-
> >>>> --
> >>>>>     drivers/baseband/acc/rte_acc100_pmd.c |   4 +-
> >>>>>     drivers/baseband/acc/rte_vrb_pmd.c    |  62 +++++-----
> >>>>>     3 files changed, 184 insertions(+), 46 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/baseband/acc/acc_common.h
> >>>>> b/drivers/baseband/acc/acc_common.h
> >>>>> index 788abf1a3c..89893eae43 100644
> >>>>> --- a/drivers/baseband/acc/acc_common.h
> >>>>> +++ b/drivers/baseband/acc/acc_common.h
> >>>>> @@ -18,6 +18,7 @@
> >>>>>     #define ACC_DMA_BLKID_OUT_HARQ      3
> >>>>>     #define ACC_DMA_BLKID_IN_HARQ       3
> >>>>>     #define ACC_DMA_BLKID_IN_MLD_R      3
> >>>>> +#define ACC_DMA_BLKID_DEWIN_IN      3
> >>>>>
> >>>>>     /* Values used in filling in decode FCWs */
> >>>>>     #define ACC_FCW_TD_VER              1
> >>>>> @@ -103,6 +104,9 @@
> >>>>>     #define ACC_MAX_NUM_QGRPS              32
> >>>>>     #define ACC_RING_SIZE_GRANULARITY      64
> >>>>>     #define ACC_MAX_FCW_SIZE              128
> >>>>> +#define ACC_IQ_SIZE                    4
> >>>>> +
> >>>>> +#define ACC_FCW_FFT_BLEN_3             28
> >>>>>
> >>>>>     /* Constants from K0 computation from 3GPP 38.212 Table 5.4.2.1-2
> */
> >>>>>     #define ACC_N_ZC_1 66 /* N = 66 Zc for BG 1 */ @@ -132,6 +136,17
> @@
> >>>>>     #define ACC_LIM_21 14 /* 0.21 */
> >>>>>     #define ACC_LIM_31 20 /* 0.31 */
> >>>>>     #define ACC_MAX_E (128 * 1024 - 2)
> >>>>> +#define ACC_MAX_CS 12
> >>>>> +
> >>>>> +#define ACC100_VARIANT          0
> >>>>> +#define VRB1_VARIANT		2
> >>>>> +#define VRB2_VARIANT		3
> >>>>> +
> >>>>> +/* Queue Index Hierarchy */
> >>>>> +#define VRB1_GRP_ID_SHIFT    10
> >>>>> +#define VRB1_VF_ID_SHIFT     4
> >>>>> +#define VRB2_GRP_ID_SHIFT    12
> >>>>> +#define VRB2_VF_ID_SHIFT     6
> >>>>>
> >>>>>     /* Helper macro for logging */
> >>>>>     #define rte_acc_log(level, fmt, ...) \ @@ -332,6 +347,37 @@
> >>>>> struct __rte_packed acc_fcw_fft {
> >>>>>     		res:19;
> >>>>>     };
> >>>>>
> >>>>> +/* FFT Frame Control Word. */
> >>>>> +struct __rte_packed acc_fcw_fft_3 {
> >>>>> +	uint32_t in_frame_size:16,
> >>>>> +		leading_pad_size:16;
> >>>>> +	uint32_t out_frame_size:16,
> >>>>> +		leading_depad_size:16;
> >>>>> +	uint32_t cs_window_sel;
> >>>>> +	uint32_t cs_window_sel2:16,
> >>>>> +		cs_enable_bmap:16;
> >>>>> +	uint32_t num_antennas:8,
> >>>>> +		idft_size:8,
> >>>>> +		dft_size:8,
> >>>>> +		cs_offset:8;
> >>>>> +	uint32_t idft_shift:8,
> >>>>> +		dft_shift:8,
> >>>>> +		cs_multiplier:16;
> >>>>> +	uint32_t bypass:2,
> >>>>> +		fp16_in:1,
> >>>>> +		fp16_out:1,
> >>>>> +		exp_adj:4,
> >>>>> +		power_shift:4,
> >>>>> +		power_en:1,
> >>>>> +		enable_dewin:1,
> >>>>> +		freq_resample_mode:2,
> >>>>> +		depad_output_size:16;
> >>>>> +	uint16_t cs_theta_0[ACC_MAX_CS];
> >>>>> +	uint32_t cs_theta_d[ACC_MAX_CS];
> >>>>> +	int8_t cs_time_offset[ACC_MAX_CS]; };
> >>>>> +
> >>>>> +
> >>>>>     /* MLD-TS Frame Control Word */
> >>>>>     struct __rte_packed acc_fcw_mldts {
> >>>>>     	uint32_t fcw_version:4,
> >>>>> @@ -473,14 +519,14 @@ union acc_info_ring_data {
> >>>>>     		uint16_t valid: 1;
> >>>>>     	};
> >>>>>     	struct {
> >>>>> -		uint32_t aq_id_3: 6;
> >>>>> -		uint32_t qg_id_3: 5;
> >>>>> -		uint32_t vf_id_3: 6;
> >>>>> -		uint32_t int_nb_3: 6;
> >>>>> -		uint32_t msi_0_3: 1;
> >>>>> -		uint32_t vf2pf_3: 6;
> >>>>> -		uint32_t loop_3: 1;
> >>>>> -		uint32_t valid_3: 1;
> >>>>> +		uint32_t aq_id_vrb2: 6;
> >>>>> +		uint32_t qg_id_vrb2: 5;
> >>>>> +		uint32_t vf_id_vrb2: 6;
> >>>>> +		uint32_t int_nb_vrb2: 6;
> >>>>> +		uint32_t msi_0_vrb2: 1;
> >>>>> +		uint32_t vf2pf_vrb2: 6;
> >>>>> +		uint32_t loop_vrb2: 1;
> >>>>> +		uint32_t valid_vrb2: 1;
> >>>>>     	};
> >>>>>     } __rte_packed;
> >>>>>
> >>>>> @@ -761,22 +807,105 @@ alloc_sw_rings_min_mem(struct rte_bbdev
> >> *dev,
> >>>> struct acc_device *d,
> >>>>>     	free_base_addresses(base_addrs, i);
> >>>>>     }
> >>>>>
> >>>>> +/* Wrapper to provide VF index from ring data. */ static inline
> >>>>> +uint16_t vf_from_ring(const union acc_info_ring_data ring_data,
> >>>>> +uint16_t device_variant) {
> >>>>
> >>>> curly braces on a new line.
> >>>
> >>> Thanks.
> >>>
> >>>>
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return ring_data.vf_id_vrb2;
> >>>>> +	else
> >>>>> +		return ring_data.vf_id;
> >>>>> +}
> >>>>> +
> >>>>> +/* Wrapper to provide QG index from ring data. */ static inline
> >>>>> +uint16_t qg_from_ring(const union acc_info_ring_data ring_data,
> >>>>> +uint16_t device_variant) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return ring_data.qg_id_vrb2;
> >>>>> +	else
> >>>>> +		return ring_data.qg_id;
> >>>>> +}
> >>>>> +
> >>>>> +/* Wrapper to provide AQ index from ring data. */ static inline
> >>>>> +uint16_t aq_from_ring(const union acc_info_ring_data ring_data,
> >>>>> +uint16_t device_variant) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return ring_data.aq_id_vrb2;
> >>>>> +	else
> >>>>> +		return ring_data.aq_id;
> >>>>> +}
> >>>>> +
> >>>>> +/* Wrapper to provide int index from ring data. */ static inline
> >>>>> +uint16_t int_from_ring(const union acc_info_ring_data ring_data,
> >>>>> +uint16_t device_variant) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return ring_data.int_nb_vrb2;
> >>>>> +	else
> >>>>> +		return ring_data.int_nb;
> >>>>> +}
> >>>>> +
> >>>>> +/* Wrapper to provide queue index from group and aq index. */
> >>>>> +static inline int queue_index(uint16_t group_idx, uint16_t
> >>>>> +aq_idx, uint16_t
> >>>>> +device_variant) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
> >>>>> +	else
> >>>>> +		return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx; }
> >>>>> +
> >>>>> +/* Wrapper to provide queue group from queue index. */ static
> >>>>> +inline int qg_from_q(uint32_t q_idx, uint16_t device_variant) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
> >>>>> +	else
> >>>>> +		return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF; }
> >>>>> +
> >>>>> +/* Wrapper to provide vf from queue index. */ static inline
> >>>>> +int32_t vf_from_q(uint32_t q_idx, uint16_t device_variant) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return (q_idx >> VRB2_VF_ID_SHIFT)  & 0x3F;
> >>>>> +	else
> >>>>> +		return (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F; }
> >>>>> +
> >>>>> +/* Wrapper to provide aq index from queue index. */ static inline
> >>>>> +int32_t aq_from_q(uint32_t q_idx, uint16_t device_variant) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return q_idx & 0x3F;
> >>>>> +	else
> >>>>> +		return q_idx & 0xF;
> >>>>> +}
> >>>>> +
> >>>>> +/* Wrapper to set VF index in ring data. */ static inline int32_t
> >>>>> +set_vf_in_ring(volatile union acc_info_ring_data *ring_data,
> >>>>> +		uint16_t device_variant, uint16_t value) {
> >>>>> +	if (device_variant == VRB2_VARIANT)
> >>>>> +		return ring_data->vf_id_vrb2 = value;
> >>>>> +	else
> >>>>> +		return ring_data->vf_id = value; }
> >>>>> +
> >>>>>     /*
> >>>>>      * Find queue_id of a device queue based on details from the Info Ring.
> >>>>>      * If a queue isn't found UINT16_MAX is returned.
> >>>>>      */
> >>>>>     static inline uint16_t
> >>>>>     get_queue_id_from_ring_info(struct rte_bbdev_data *data,
> >>>>> -		const union acc_info_ring_data ring_data)
> >>>>> +		const union acc_info_ring_data ring_data, uint16_t
> >>>> device_variant)
> >>>>
> >>>> As I suggested on v2:
> >>>>
> >>>> get_queue_id_from_ring_info(struct rte_bbdev_data *data,
> >>>> 	const union acc_info_ring_data ring_data) {
> >>>> 	struct acc_device *d = data->dev_private;
> >>>>
> >>>> 	...
> >>>>
> >>>> 	if (acc_q != NULL && acc_q->aq_id == aq_from_ring(d, ring_data) &&
> >>>> ...
> >>>>
> >>>> }
> >>>>
> >>>> with
> >>>>
> >>>> /* Wrapper to provide AQ index from ring data. */ static inline
> >>>> uint16_t aq_from_ring(struct acc_device *d, const union
> >>>> acc_info_ring_data ring_data) {
> >>>> 	if (d->device_variant == VRB2_VARIANT)
> >>>> 		return ring_data.aq_id_vrb2;
> >>>> 	else
> >>>> 		return ring_data.aq_id;
> >>>> }
> >>>>
> >>>
> >>> I will change get_queue_id_from_ring_info() to have a smaller
> >>> prototype, but I don’t plan on changing the other new underlying
> >>> functions to take dev instead of the variant in their prototypes. I
> >>> don’t see a reason to, as these only need this one member.
> >>
> >> IMHO, the reason is that it costs nothing and is more future-proof.
> >
> > Thanks, in that very case I believe the prototype is cleaner with the device
> > variant. I don’t see a future-proofing concern.
> >
> >>
> >> Also, my initial idea was to have an intermediate representation, like:
> >>
> >> struct acc_queue_info { // Not sure about the name
> >> 	uint16_t vf_id;
> >> 	uint8_t qgrp_id;
> >> 	uint16_t aq_id;
> >> };
> >>
> >> Then we have a single callback for each variant
> >>
> >> static void
> >> vrb1_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
> >> 		struct acc_queue_info *queue_info)
> >> {
> >> 	queue_info->vf_id = ring_data.vf_id;
> >> 	queue_info->qgrp_id = ...
> >> }
> >>
> >> static void
> >> vrb2_ring_data_to_queue_info(const union acc_info_ring_data ring_data,
> >> 		struct acc_queue_info *queue_info)
> >> {
> >>
> >> }
> >>
> >> The acc_queue_info struct can also be used in struct acc_queue, so we
> >> use same format everywhere.
> >>
> >> I think it will be less verbose, and quicker to add new variants
> >> without the risk of forgetting to add "else if (d->device_variant == VRBx_VARIANT)"
> >> anywhere.
> >>
> >> What do you think?
> >
> > I think both would work. The intermediate structure may be a bit artificial, and
> > it would have different members when getting info from the queue or the ring (i.e. the
> > int index). Also there is no reciprocal function, i.e. we only set the VF in the ring.
> > And there is a location where we only need one piece of information, not all of the
> > other members.
> > Again both are okay to me without a super strong preference, so for now I
> > would suggest keeping it as is.
> 
> Ok, but please pass dev and not variant directly in the helpers.

I really think it is more appropriate to keep the actual variant in these function prototypes rather than passing the full dev, since the variant is what these functions are about. Kindly look at v4.
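
To make the point concrete, here is a minimal standalone sketch of the
variant-keyed queue index helpers under discussion. The variant values, shifts
and masks are copied from the patch quoted below; the block is self-contained
and purely illustrative, not driver code.

```c
#include <stdint.h>

/* Variant identifiers and shifts as defined in the quoted acc_common.h hunk. */
#define VRB1_VARIANT       2
#define VRB2_VARIANT       3
#define VRB1_GRP_ID_SHIFT  10
#define VRB1_VF_ID_SHIFT   4
#define VRB2_GRP_ID_SHIFT  12
#define VRB2_VF_ID_SHIFT   6

/* Build a queue index from group and AQ index. */
static inline int
queue_index(uint16_t group_idx, uint16_t aq_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return (group_idx << VRB2_GRP_ID_SHIFT) + aq_idx;
	return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
}

/* Extract the queue group from a queue index. */
static inline int
qg_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return (q_idx >> VRB2_GRP_ID_SHIFT) & 0x1F;
	return (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
}

/* Extract the VF from a queue index. */
static inline int32_t
vf_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return (q_idx >> VRB2_VF_ID_SHIFT) & 0x3F;
	return (q_idx >> VRB1_VF_ID_SHIFT) & 0x3F;
}

/* Extract the AQ index from a queue index. */
static inline int32_t
aq_from_q(uint32_t q_idx, uint16_t device_variant)
{
	if (device_variant == VRB2_VARIANT)
		return q_idx & 0x3F;
	return q_idx & 0xF;
}
```

Each helper reads nothing but the variant, so a group/AQ pair round-trips
through queue_index() and the extractors as long as it fits the masks (for
VRB1, group and AQ index up to 15).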

> 
> >>
> >>>
> >>>>>     {
> >>>>>     	uint16_t queue_id;
> >>>>> +	struct acc_queue *acc_q;
> >>>>>
> >>>>>     	for (queue_id = 0; queue_id < data->num_queues; ++queue_id)
> {
> >>>>> -		struct acc_queue *acc_q =
> >>>>> -				data->queues[queue_id].queue_private;
> >>>>> -		if (acc_q != NULL && acc_q->aq_id == ring_data.aq_id &&
> >>>>> -				acc_q->qgrp_id == ring_data.qg_id &&
> >>>>> -				acc_q->vf_id == ring_data.vf_id)
> >>>>> +		acc_q = data->queues[queue_id].queue_private;
> >>>>> +
> >>>>> +		if (acc_q != NULL && acc_q->aq_id ==
> >>>> aq_from_ring(ring_data, device_variant) &&
> >>>>> +				acc_q->qgrp_id == qg_from_ring(ring_data,
> >>>> device_variant) &&
> >>>>> +				acc_q->vf_id == vf_from_ring(ring_data,
> >>>> device_variant))
> >>>>>     			return queue_id;
> >>>>>     	}
> >>>>>
> >>>>> @@ -1438,4 +1567,11 @@ get_num_cbs_in_tb_ldpc_enc(struct
> >>>> rte_bbdev_op_ldpc_enc *ldpc_enc)
> >>>>>     	return cbs_in_tb;
> >>>>>     }
> >>>>>
> >>>>> +static inline void
> >>>>> +acc_reg_fast_write(struct acc_device *d, uint32_t offset,
> >>>>> +uint32_t
> >>>>> +value) {
> >>>>> +	void *reg_addr = RTE_PTR_ADD(d->mmio_base, offset);
> >>>>> +	mmio_write(reg_addr, value);
> >>>>> +}
> >>>>> +
> >>>>>     #endif /* _ACC_COMMON_H_ */
> >>>>> diff --git a/drivers/baseband/acc/rte_acc100_pmd.c
> >>>>> b/drivers/baseband/acc/rte_acc100_pmd.c
> >>>>> index 5362d39c30..7f8d05b5a9 100644
> >>>>> --- a/drivers/baseband/acc/rte_acc100_pmd.c
> >>>>> +++ b/drivers/baseband/acc/rte_acc100_pmd.c
> >>>>> @@ -294,7 +294,7 @@ acc100_pf_interrupt_handler(struct rte_bbdev
> >> *dev)
> >>>>>     		case ACC100_PF_INT_DMA_UL5G_DESC_IRQ:
> >>>>>     		case ACC100_PF_INT_DMA_DL5G_DESC_IRQ:
> >>>>>     			deq_intr_det.queue_id =
> >>>> get_queue_id_from_ring_info(
> >>>>> -					dev->data, *ring_data);
> >>>>> +					dev->data, *ring_data, acc100_dev-
> >>>>> device_variant);
> >>>>>     			if (deq_intr_det.queue_id == UINT16_MAX) {
> >>>>>     				rte_bbdev_log(ERR,
> >>>>>     						"Couldn't find queue:
> aq_id:
> >>>> %u, qg_id: %u, vf_id: %u", @@
> >>>>> -348,7 +348,7 @@ acc100_vf_interrupt_handler(struct rte_bbdev *dev)
> >>>>>     			 */
> >>>>>     			ring_data->vf_id = 0;
> >>>>>     			deq_intr_det.queue_id =
> >>>> get_queue_id_from_ring_info(
> >>>>> -					dev->data, *ring_data);
> >>>>> +					dev->data, *ring_data, acc100_dev-
> >>>>> device_variant);
> >>>>>     			if (deq_intr_det.queue_id == UINT16_MAX) {
> >>>>>     				rte_bbdev_log(ERR,
> >>>>>     						"Couldn't find queue:
> aq_id:
> >>>> %u, qg_id: %u", diff --git
> >>>>> a/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> b/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> index a1de012b40..c89c26c59a 100644
> >>>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> @@ -341,17 +341,18 @@ static inline void
> >>>>>     vrb_check_ir(struct acc_device *acc_dev)
> >>>>>     {
> >>>>>     	volatile union acc_info_ring_data *ring_data;
> >>>>> -	uint16_t info_ring_head = acc_dev->info_ring_head;
> >>>>> +	uint16_t info_ring_head = acc_dev->info_ring_head, int_nb;
> >>>>>     	if (unlikely(acc_dev->info_ring == NULL))
> >>>>>     		return;
> >>>>>
> >>>>>     	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> >>>>> ACC_INFO_RING_MASK);
> >>>>>
> >>>>>     	while (ring_data->valid) {
> >>>>> -		if ((ring_data->int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> >>>>> -				ring_data->int_nb >
> >>>> ACC_PF_INT_DMA_MLD_DESC_IRQ)) {
> >>>>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
> >>>>> +		if ((int_nb < ACC_PF_INT_DMA_DL_DESC_IRQ) || (
> >>>>> +				int_nb > ACC_PF_INT_DMA_MLD_DESC_IRQ))
> >>>> {
> >>>>>     			rte_bbdev_log(WARNING, "InfoRing: ITR:%d
> >>>> Info:0x%x",
> >>>>> -					ring_data->int_nb, ring_data-
> >>>>> detailed_info);
> >>>>> +					int_nb, ring_data->detailed_info);
> >>>>>     			/* Initialize Info Ring entry and move forward.
> */
> >>>>>     			ring_data->val = 0;
> >>>>>     		}
> >>>>> @@ -368,16 +369,21 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>>>     	struct acc_device *acc_dev = dev->data->dev_private;
> >>>>>     	volatile union acc_info_ring_data *ring_data;
> >>>>>     	struct acc_deq_intr_details deq_intr_det;
> >>>>> +	uint16_t vf_id, aq_id, qg_id, int_nb;
> >>>>>
> >>>>>     	ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> >>>>> ACC_INFO_RING_MASK);
> >>>>>
> >>>>>     	while (ring_data->valid) {
> >>>>> +		vf_id = vf_from_ring(*ring_data, acc_dev->device_variant);
> >>>>> +		aq_id = aq_from_ring(*ring_data, acc_dev->device_variant);
> >>>>> +		qg_id = qg_from_ring(*ring_data, acc_dev->device_variant);
> >>>>> +		int_nb = int_from_ring(*ring_data, acc_dev->device_variant);
> >>>>>     		if (acc_dev->pf_device) {
> >>>>>     			rte_bbdev_log_debug(
> >>>>> -					"VRB1 PF Interrupt received, Info Ring
> >>>> data: 0x%x -> %d",
> >>>>> -					ring_data->val, ring_data->int_nb);
> >>>>> +					"PF Interrupt received, Info Ring data:
> >>>> 0x%x -> %d",
> >>>>> +					ring_data->val, int_nb);
> >>>>>
> >>>>> -			switch (ring_data->int_nb) {
> >>>>> +			switch (int_nb) {
> >>>>>     			case ACC_PF_INT_DMA_DL_DESC_IRQ:
> >>>>>     			case ACC_PF_INT_DMA_UL_DESC_IRQ:
> >>>>>     			case ACC_PF_INT_DMA_FFT_DESC_IRQ:
> >>>>> @@ -385,13 +391,11 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>>>     			case ACC_PF_INT_DMA_DL5G_DESC_IRQ:
> >>>>>     			case ACC_PF_INT_DMA_MLD_DESC_IRQ:
> >>>>>     				deq_intr_det.queue_id =
> >>>> get_queue_id_from_ring_info(
> >>>>> -						dev->data, *ring_data);
> >>>>> +						dev->data, *ring_data,
> >>>> acc_dev->device_variant);
> >>>>>     				if (deq_intr_det.queue_id ==
> UINT16_MAX) {
> >>>>>     					rte_bbdev_log(ERR,
> >>>>>     							"Couldn't find
> queue:
> >>>> aq_id: %u, qg_id: %u, vf_id: %u",
> >>>>> -							ring_data->aq_id,
> >>>>> -							ring_data->qg_id,
> >>>>> -							ring_data->vf_id);
> >>>>> +							aq_id, qg_id, vf_id);
> >>>>>     					return;
> >>>>>     				}
> >>>>>     				rte_bbdev_pmd_callback_process(dev,
> >>>>> @@ -403,9 +407,9 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>>>     			}
> >>>>>     		} else {
> >>>>>     			rte_bbdev_log_debug(
> >>>>> -					"VRB1 VF Interrupt received, Info Ring
> >>>> data: 0x%x\n",
> >>>>> +					"VRB VF Interrupt received, Info Ring
> >>>> data: 0x%x\n",
> >>>>>     					ring_data->val);
> >>>>> -			switch (ring_data->int_nb) {
> >>>>> +			switch (int_nb) {
> >>>>>     			case ACC_VF_INT_DMA_DL_DESC_IRQ:
> >>>>>     			case ACC_VF_INT_DMA_UL_DESC_IRQ:
> >>>>>     			case ACC_VF_INT_DMA_FFT_DESC_IRQ:
> >>>>> @@ -413,14 +417,13 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>>>     			case ACC_VF_INT_DMA_DL5G_DESC_IRQ:
> >>>>>     			case ACC_VF_INT_DMA_MLD_DESC_IRQ:
> >>>>>     				/* VFs are not aware of their vf_id - it's
> set to
> >>>> 0.  */
> >>>>> -				ring_data->vf_id = 0;
> >>>>> +				set_vf_in_ring(ring_data, acc_dev-
> >>>>> device_variant, 0);
> >>>>>     				deq_intr_det.queue_id =
> >>>> get_queue_id_from_ring_info(
> >>>>> -						dev->data, *ring_data);
> >>>>> +						dev->data, *ring_data,
> >>>> acc_dev->device_variant);
> >>>>>     				if (deq_intr_det.queue_id ==
> UINT16_MAX) {
> >>>>>     					rte_bbdev_log(ERR,
> >>>>>     							"Couldn't find
> queue:
> >>>> aq_id: %u, qg_id: %u",
> >>>>> -							ring_data->aq_id,
> >>>>> -							ring_data->qg_id);
> >>>>> +							aq_id, qg_id);
> >>>>>     					return;
> >>>>>     				}
> >>>>>     				rte_bbdev_pmd_callback_process(dev,
> >>>>> @@ -435,8 +438,7 @@ vrb_dev_interrupt_handler(void *cb_arg)
> >>>>>     		/* Initialize Info Ring entry and move forward. */
> >>>>>     		ring_data->val = 0;
> >>>>>     		++acc_dev->info_ring_head;
> >>>>> -		ring_data = acc_dev->info_ring +
> >>>>> -				(acc_dev->info_ring_head &
> >>>> ACC_INFO_RING_MASK);
> >>>>> +		ring_data = acc_dev->info_ring + (acc_dev->info_ring_head &
> >>>>> +ACC_INFO_RING_MASK);
> >>>>>     	}
> >>>>>     }
> >>>>>
> >>>>> @@ -556,8 +558,7 @@ vrb_setup_queues(struct rte_bbdev *dev,
> >>>>> uint16_t num_queues, int socket_id)
> >>>>>
> >>>>>     	/* Configure tail pointer for use when SDONE enabled. */
> >>>>>     	if (d->tail_ptrs == NULL)
> >>>>> -		d->tail_ptrs = rte_zmalloc_socket(
> >>>>> -				dev->device->driver->name,
> >>>>> +		d->tail_ptrs = rte_zmalloc_socket(dev->device->driver->name,
> >>>>>     				VRB_MAX_QGRPS * VRB_MAX_AQS *
> >>>> sizeof(uint32_t),
> >>>>>     				RTE_CACHE_LINE_SIZE, socket_id);
> >>>>>     	if (d->tail_ptrs == NULL) {
> >>>>> @@ -783,7 +784,7 @@ vrb_find_free_queue_idx(struct rte_bbdev *dev,
> >>>>>     			/* Mark the Queue as assigned. */
> >>>>>     			d->q_assigned_bit_map[group_idx] |= (1ULL <<
> >>>> aq_idx);
> >>>>>     			/* Report the AQ Index. */
> >>>>> -			return (group_idx << VRB1_GRP_ID_SHIFT) + aq_idx;
> >>>>> +			return queue_index(group_idx, aq_idx, d-
> >>>>> device_variant);
> >>>>>     		}
> >>>>>     	}
> >>>>>     	rte_bbdev_log(INFO, "Failed to find free queue on %s, priority %u",
> >>>>> @@ -922,9 +923,10 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
> >>>>>     		}
> >>>>>     	}
> >>>>>
> >>>>> -	q->qgrp_id = (q_idx >> VRB1_GRP_ID_SHIFT) & 0xF;
> >>>>> -	q->vf_id = (q_idx >> VRB1_VF_ID_SHIFT)  & 0x3F;
> >>>>> -	q->aq_id = q_idx & 0xF;
> >>>>> +	q->qgrp_id = qg_from_q(q_idx, d->device_variant);
> >>>>> +	q->vf_id = vf_from_q(q_idx, d->device_variant);
> >>>>> +	q->aq_id = aq_from_q(q_idx, d->device_variant);
> >>>>> +
> >>>>>     	q->aq_depth = 0;
> >>>>>     	if (conf->op_type ==  RTE_BBDEV_OP_TURBO_DEC)
> >>>>>     		q->aq_depth = (1 << d-
> >acc_conf.q_ul_4g.aq_depth_log2);
> >>>>> @@ -1311,7 +1313,7 @@ vrb_fcw_td_fill(const struct
> >>>>> rte_bbdev_dec_op
> >>>> *op, struct acc_fcw_td *fcw)
> >>>>>     		fcw->bypass_teq = 0;
> >>>>>     	}
> >>>>>
> >>>>> -	fcw->code_block_mode = 1; /* FIXME */
> >>>>> +	fcw->code_block_mode = 1;
> >>>>
> >>>> Could you remind me what was the issue?
> >>>
> >>> Historically there was the intention to use a difference format
> >>> option in the
> >> fcw to help with the TB mode but that is not considered anymore.
> >>
> >> Ok.
> >>
> >>>
> >>>>
> >>>>>     	fcw->turbo_crc_type = check_bit(op->turbo_dec.op_flags,
> >>>>>     			RTE_BBDEV_TURBO_CRC_TYPE_24B);
> >>>>>
> >>>>> @@ -1471,8 +1473,8 @@ vrb_dma_desc_td_fill(struct
> rte_bbdev_dec_op
> >>>> *op,
> >>>>>     	if (op->turbo_dec.code_block_mode ==
> >>>> RTE_BBDEV_TRANSPORT_BLOCK) {
> >>>>>     		k = op->turbo_dec.tb_params.k_pos;
> >>>>>     		e = (r < op->turbo_dec.tb_params.cab)
> >>>>> -			? op->turbo_dec.tb_params.ea
> >>>>> -			: op->turbo_dec.tb_params.eb;
> >>>>> +				? op->turbo_dec.tb_params.ea
> >>>>> +				: op->turbo_dec.tb_params.eb;
> >>>>>     	} else {
> >>>>>     		k = op->turbo_dec.cb_params.k;
> >>>>>     		e = op->turbo_dec.cb_params.e;
> >>>>> @@ -1726,7 +1728,7 @@ vrb_dma_desc_ld_update(struct rte_bbdev_dec_op *op,
> >>>>>     	desc->op_addr = op;
> >>>>>     }
> >>>>>
> >>>>> -/* Enqueue one encode operations for device in CB mode */
> >>>>> +/* Enqueue one encode operations for device in CB mode. */
> >>>>>     static inline int
> >>>>>     enqueue_enc_one_op_cb(struct acc_queue *q, struct
> >>>>> rte_bbdev_enc_op
> >>>> *op,
> >>>>>     		uint16_t total_enqueued_cbs)
> >>>>> @@ -2263,7 +2265,7 @@ vrb_enqueue_ldpc_dec_one_op_tb(struct acc_queue *q, struct rte_bbdev_dec_op *op,
> >>>>>     	return current_enqueued_cbs;
> >>>>>     }
> >>>>>
> >>>>> -/* Enqueue one decode operations for device in TB mode */
> >>>>> +/* Enqueue one decode operations for device in TB mode. */
> >>>>>     static inline int
> >>>>>     enqueue_dec_one_op_tb(struct acc_queue *q, struct
> >>>>> rte_bbdev_dec_op
> >>>> *op,
> >>>>>     		uint16_t total_enqueued_cbs, uint8_t cbs_in_tb)
> >>>
> >



* RE: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-10-05 14:34           ` Maxime Coquelin
@ 2023-10-05 17:59             ` Chautru, Nicolas
  2023-10-06 12:05               ` Maxime Coquelin
  0 siblings, 1 reply; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-05 17:59 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Thursday, October 5, 2023 7:35 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
> 
> 
> 
> On 10/4/23 23:18, Chautru, Nicolas wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Wednesday, October 4, 2023 12:11 AM
> >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> Hernan
> >> <hernan.vargas@intel.com>
> >> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
> >> variant
> >>
> >>
> >>
> >> On 10/3/23 20:20, Chautru, Nicolas wrote:
> >>> Hi Maxime,
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: Tuesday, October 3, 2023 7:37 AM
> >>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> >> Hernan
> >>>> <hernan.vargas@intel.com>
> >>>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
> >>>> variant
> >>>>
> >>>>
> >>>>
> >>>> On 9/29/23 18:35, Nicolas Chautru wrote:
> >>>>> Support for the FFT the processing specific to the
> >>>>> VRB2 variant.
> >>>>>
> >>>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> >>>>> ---
> >>>>>     drivers/baseband/acc/rte_vrb_pmd.c | 132
> >>>> ++++++++++++++++++++++++++++-
> >>>>>     1 file changed, 128 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> b/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> index 93add82947..ce4b90d8e7 100644
> >>>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev,
> >>>>> uint16_t
> >>>> queue_id,
> >>>>>     			ACC_FCW_LD_BLEN : (conf->op_type ==
> >>>> RTE_BBDEV_OP_FFT ?
> >>>>>     			ACC_FCW_FFT_BLEN :
> ACC_FCW_MLDTS_BLEN))));
> >>>>>
> >>>>> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type ==
> >>>> RTE_BBDEV_OP_FFT))
> >>>>> +		fcw_len = ACC_FCW_FFT_BLEN_3;
> >>>>> +
> >>>>>     	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth;
> desc_idx++) {
> >>>>>     		desc = q->ring_addr + desc_idx;
> >>>>>     		desc->req.word0 = ACC_DMA_DESC_TYPE;
> >>>>> @@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
> >>>>>     			.num_buffers_soft_out = 0,
> >>>>>     			}
> >>>>>     		},
> >>>>> +		{
> >>>>> +			.type	= RTE_BBDEV_OP_FFT,
> >>>>> +			.cap.fft = {
> >>>>> +				.capability_flags =
> >>>>> +						RTE_BBDEV_FFT_WINDOWING |
> >>>>> +						RTE_BBDEV_FFT_CS_ADJUSTMENT |
> >>>>> +						RTE_BBDEV_FFT_DFT_BYPASS |
> >>>>> +						RTE_BBDEV_FFT_IDFT_BYPASS |
> >>>>> +						RTE_BBDEV_FFT_FP16_INPUT |
> >>>>> +						RTE_BBDEV_FFT_FP16_OUTPUT |
> >>>>> +						RTE_BBDEV_FFT_POWER_MEAS |
> >>>>> +						RTE_BBDEV_FFT_WINDOWING_BYPASS,
> >>>>> +				.num_buffers_src =
> >>>>> +						1,
> >>>>> +				.num_buffers_dst =
> >>>>> +						1,
> >>>>> +			}
> >>>>> +		},
> >>>>>     		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
> >>>>>     	};
> >>>>>
> >>>>> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op
> >>>>> *op,
> >>>> struct acc_fcw_fft *fcw)
> >>>>>     		fcw->bypass = 0;
> >>>>>     }
> >>>>>
> >>>>> +/* Fill in a frame control word for FFT processing. */
> >>>>> +static inline void
> >>>>> +vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
> >>>>> +{
> >>>>> +	fcw->in_frame_size = op->fft.input_sequence_size;
> >>>>> +	fcw->leading_pad_size = op->fft.input_leading_padding;
> >>>>> +	fcw->out_frame_size = op->fft.output_sequence_size;
> >>>>> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
> >>>>> +	fcw->cs_window_sel = op->fft.window_index[0] +
> >>>>> +			(op->fft.window_index[1] << 8) +
> >>>>> +			(op->fft.window_index[2] << 16) +
> >>>>> +			(op->fft.window_index[3] << 24);
> >>>>> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
> >>>>> +			(op->fft.window_index[5] << 8);
> >>>>> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
> >>>>> +	fcw->num_antennas = op->fft.num_antennas_log2;
> >>>>> +	fcw->idft_size = op->fft.idft_log2;
> >>>>> +	fcw->dft_size = op->fft.dft_log2;
> >>>>> +	fcw->cs_offset = op->fft.cs_time_adjustment;
> >>>>> +	fcw->idft_shift = op->fft.idft_shift;
> >>>>> +	fcw->dft_shift = op->fft.dft_shift;
> >>>>> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
> >>>>> +	fcw->power_shift = op->fft.power_shift;
> >>>>> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
> >>>>> +	fcw->fp16_in = check_bit(op->fft.op_flags,
> >>>> RTE_BBDEV_FFT_FP16_INPUT);
> >>>>> +	fcw->fp16_out = check_bit(op->fft.op_flags,
> >>>> RTE_BBDEV_FFT_FP16_OUTPUT);
> >>>>> +	fcw->power_en = check_bit(op->fft.op_flags,
> >>>> RTE_BBDEV_FFT_POWER_MEAS);
> >>>>> +	if (check_bit(op->fft.op_flags,
> >>>>> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
> >>>>> +		if (check_bit(op->fft.op_flags,
> >>>>> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
> >>>>> +			fcw->bypass = 2;
> >>>>> +		else
> >>>>> +			fcw->bypass = 1;
> >>>>> +	} else if (check_bit(op->fft.op_flags,
> >>>>> +			RTE_BBDEV_FFT_DFT_BYPASS))
> >>>>> +		fcw->bypass = 3;
> >>>>> +	else
> >>>>> +		fcw->bypass = 0;
> >>>>
> >>>> The only difference I see with VRB1 are backed by corresponding
> >>>> op_flags (POWER & FP16), is that correct? If so, it does not make
> >>>> sense to me to have a specific function for VRB2.
> >>>
> >>> There are more changes but these are only formally enabled in the
> >>> next stepping hence some of the related code is not included yet.
> >>> More generally
> >> the FCW and IP is different from VRB1 implementation.
> >>
> >> Currently, the code is almost identical so vrb1 implementation should
> >> be reused. If some later changes makes the two implementations
> >> diverge, then we can consider having a dedicated function for VRB2 at that
> time.
> >
> > If I may, I believe this is best as-is notably for patches and support.
> > The functions are fairly small (not much code overlap quantitatively)
> > and the underlying IP is different (with more differences we can
> > enable over time). I don’t think it would help anyone really to try to make
> them coexist for a small period of time.
> > Does that sound fair?
> 
> I disagree, as I explained the code currently is almost identical, so just share the
> code.
> 
> You will diverge, if *really* necessary, when it will make more sense to have
> two separate functions. For now it is not the case in my opinion.

OK, I had another look. I can share a common descriptor generation function. For the FCW generation, however, the structures, sizes and prototypes all differ, so I really don't think it would make sense to artificially generate them together.
I will update this in a new v5 this week.
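
For illustration only, a shared descriptor fill could look roughly like the
sketch below. It is a minimal sketch under stated assumptions, not the driver
code: every sketch_* type, SKETCH_IQ_SIZE and SKETCH_MAX_CS is a hypothetical
stand-in for the real acc/bbdev definitions, and buffer addresses are passed
as plain integers instead of mbufs. The VRB1 path would simply pass no power
buffer. Treating "power requested but no buffer" as disabled is a choice made
for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IQ_SIZE 4   /* assumed bytes per IQ sample */
#define SKETCH_MAX_CS 12   /* assumed maximum number of cyclic shifts */

enum sketch_blkid { SKETCH_BLK_IN, SKETCH_BLK_OUT_HARD, SKETCH_BLK_OUT_SOFT };

/* Hypothetical, simplified DMA data pointer. */
struct sketch_data_ptr {
	uint64_t address;
	uint32_t blen;
	enum sketch_blkid blkid;
	bool last;
};

/* Hypothetical, simplified request descriptor; entry 0 is left for the FCW. */
struct sketch_desc {
	struct sketch_data_ptr data_ptrs[4];
	uint8_t m2dlen;
	uint8_t d2mlen;
	uint8_t num_cs;
};

/* Hypothetical subset of the FFT operation parameters. */
struct sketch_fft_desc_op {
	uint16_t input_sequence_size;
	uint16_t output_sequence_size;
	uint16_t cs_bitmap;
	uint8_t num_antennas_log2;
	bool power_meas;
};

/* One fill routine for both variants; pwr_addr == 0 means "no power buffer". */
static inline int
sketch_dma_desc_fft_fill(const struct sketch_fft_desc_op *op, struct sketch_desc *desc,
		uint64_t in_addr, uint64_t out_addr, uint64_t pwr_addr)
{
	bool pwr_en;
	int num_cs = 0, i;

	assert(op != NULL && desc != NULL);
	pwr_en = op->power_meas && pwr_addr != 0;

	/* Count the enabled cyclic shifts. */
	for (i = 0; i < SKETCH_MAX_CS; i++)
		if (op->cs_bitmap & (1u << i))
			num_cs++;
	desc->num_cs = num_cs;

	/* Input buffer. */
	desc->data_ptrs[1].address = in_addr;
	desc->data_ptrs[1].blen = op->input_sequence_size * SKETCH_IQ_SIZE;
	desc->data_ptrs[1].blkid = SKETCH_BLK_IN;
	desc->data_ptrs[1].last = true;

	/* Main output buffer; it is the last output only without power data. */
	desc->data_ptrs[2].address = out_addr;
	desc->data_ptrs[2].blen = op->output_sequence_size * SKETCH_IQ_SIZE;
	desc->data_ptrs[2].blkid = SKETCH_BLK_OUT_HARD;
	desc->data_ptrs[2].last = !pwr_en;

	/* Optional power measurement output: 4 bytes per shift and antenna. */
	if (pwr_en) {
		desc->data_ptrs[3].address = pwr_addr;
		desc->data_ptrs[3].blen = num_cs * (1u << op->num_antennas_log2) * 4;
		desc->data_ptrs[3].blkid = SKETCH_BLK_OUT_SOFT;
		desc->data_ptrs[3].last = true;
	}

	desc->m2dlen = 2;              /* FCW + input */
	desc->d2mlen = pwr_en ? 2 : 1; /* output, plus power when enabled */
	return 0;
}
```

With this shape the variant check stays in the caller, which only decides
whether to hand over a power buffer.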


> 
> Thanks,
> Maxime
> 
> >
> >
> >>
> >>>>
> >>>>> +}
> >>>>> +
> >>>>>     static inline int
> >>>>>     vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>>>>     		struct acc_dma_req_desc *desc,
> >>>>> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>>>>     	return 0;
> >>>>>     }
> >>>>>
> >>>>> +static inline int
> >>>>> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>>>> +		struct acc_dma_req_desc *desc,
> >>>>> +		struct rte_mbuf *input, struct rte_mbuf *output, struct
> >>>> rte_mbuf *win_input,
> >>>>> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t
> >>>> *out_offset,
> >>>>> +		uint32_t *win_offset, uint32_t *pwr_offset) {
> >>>>> +	bool pwr_en = check_bit(op->fft.op_flags,
> >>>> RTE_BBDEV_FFT_POWER_MEAS);
> >>>>> +	bool win_en = check_bit(op->fft.op_flags,
> >>>> RTE_BBDEV_FFT_DEWINDOWING);
> >>>>> +	int num_cs = 0, i, bd_idx = 1;
> >>>>> +
> >>>>> +	/* FCW already done */
> >>>>> +	acc_header_init(desc);
> >>>>> +
> >>>>> +	RTE_SET_USED(win_input);
> >>>>> +	RTE_SET_USED(win_offset);
> >>>>> +
> >>>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input,
> >>>> *in_offset);
> >>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size *
> >>>> ACC_IQ_SIZE;
> >>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
> >>>>> +	desc->data_ptrs[bd_idx].last = 1;
> >>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> >>>>> +	bd_idx++;
> >>>>> +
> >>>>> +	desc->data_ptrs[bd_idx].address =
> >>>>> +rte_pktmbuf_iova_offset(output,
> >>>> *out_offset);
> >>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size *
> >>>> ACC_IQ_SIZE;
> >>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
> >>>>> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
> >>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> >>>>> +	desc->m2dlen = win_en ? 3 : 2;
> >>>>> +	desc->d2mlen = pwr_en ? 2 : 1;
> >>>>> +	desc->ib_ant_offset = op->fft.input_sequence_size;
> >>>>> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
> >>>>> +
> >>>>> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
> >>>>> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
> >>>>> +			num_cs++;
> >>>>> +	desc->num_cs = num_cs;
> >>>>> +
> >>>>> +	if (pwr_en && pwr) {
> >>>>> +		bd_idx++;
> >>>>> +		desc->data_ptrs[bd_idx].address =
> >>>> rte_pktmbuf_iova_offset(pwr, *pwr_offset);
> >>>>> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op-
> >>>>> fft.num_antennas_log2) * 4;
> >>>>> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
> >>>>> +		desc->data_ptrs[bd_idx].last = 1;
> >>>>> +		desc->data_ptrs[bd_idx].dma_ext = 0;
> >>>>> +	}
> >>>>> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
> >>>>> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
> >>>>> +	desc->op_addr = op;
> >>>>> +	return 0;
> >>>>> +}
> >>>>>
> >>>>>     /** Enqueue one FFT operation for device. */
> >>>>>     static inline int
> >>>>> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue
> >> *q,
> >>>> struct rte_bbdev_fft_op *op,
> >>>>>     		uint16_t total_enqueued_cbs)
> >>>>>     {
> >>>>>     	union acc_dma_desc *desc;
> >>>>> -	struct rte_mbuf *input, *output;
> >>>>> -	uint32_t in_offset, out_offset;
> >>>>> +	struct rte_mbuf *input, *output, *pwr, *win;
> >>>>> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
> >>>>>     	struct acc_fcw_fft *fcw;
> >>>>>
> >>>>>     	desc = acc_desc(q, total_enqueued_cbs);
> >>>>>     	input = op->fft.base_input.data;
> >>>>>     	output = op->fft.base_output.data;
> >>>>> +	pwr = op->fft.power_meas_output.data;
> >>>>> +	win = op->fft.dewindowing_input.data;
> >>>>>     	in_offset = op->fft.base_input.offset;
> >>>>>     	out_offset = op->fft.base_output.offset;
> >>>>> +	pwr_offset = op->fft.power_meas_output.offset;
> >>>>> +	win_offset = op->fft.dewindowing_input.offset;
> >>>>>
> >>>>>     	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
> >>>>>     			((q->sw_ring_head + total_enqueued_cbs) & q-
> >>>>> sw_ring_wrap_mask)
> >>>>>     			* ACC_MAX_FCW_SIZE);
> >>>>>
> >>>>> -	vrb1_fcw_fft_fill(op, fcw);
> >>>>> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset,
> >>>> &out_offset);
> >>>>> +	if (q->d->device_variant == VRB1_VARIANT) {
> >>>>> +		vrb1_fcw_fft_fill(op, fcw);
> >>>>> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output,
> >>>> &in_offset, &out_offset);
> >>>>> +	} else {
> >>>>> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
> >>>>> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win,
> >>>> pwr,
> >>>>> +				&in_offset, &out_offset, &win_offset,
> >>>> &pwr_offset);
> >>>>> +	}
> >>>>>     #ifdef RTE_LIBRTE_BBDEV_DEBUG
> >>>>>     	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
> >>>>>     			sizeof(desc->req.fcw_fft));
> >>>
> >



* Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-10-05 17:59             ` Chautru, Nicolas
@ 2023-10-06 12:05               ` Maxime Coquelin
  2023-10-06 20:25                 ` Chautru, Nicolas
  0 siblings, 1 reply; 42+ messages in thread
From: Maxime Coquelin @ 2023-10-06 12:05 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan



On 10/5/23 19:59, Chautru, Nicolas wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Thursday, October 5, 2023 7:35 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
>>
>>
>>
>> On 10/4/23 23:18, Chautru, Nicolas wrote:
>>> Hi Maxime,
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Wednesday, October 4, 2023 12:11 AM
>>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
>> Hernan
>>>> <hernan.vargas@intel.com>
>>>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
>>>> variant
>>>>
>>>>
>>>>
>>>> On 10/3/23 20:20, Chautru, Nicolas wrote:
>>>>> Hi Maxime,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> Sent: Tuesday, October 3, 2023 7:37 AM
>>>>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>>>>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
>>>> Hernan
>>>>>> <hernan.vargas@intel.com>
>>>>>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
>>>>>> variant
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>>>>>> Support for the FFT the processing specific to the
>>>>>>> VRB2 variant.
>>>>>>>
>>>>>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>>>>>> ---
>>>>>>>      drivers/baseband/acc/rte_vrb_pmd.c | 132
>>>>>> ++++++++++++++++++++++++++++-
>>>>>>>      1 file changed, 128 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
>>>>>>> b/drivers/baseband/acc/rte_vrb_pmd.c
>>>>>>> index 93add82947..ce4b90d8e7 100644
>>>>>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>>>>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>>>>>> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev,
>>>>>>> uint16_t
>>>>>> queue_id,
>>>>>>>      			ACC_FCW_LD_BLEN : (conf->op_type ==
>>>>>> RTE_BBDEV_OP_FFT ?
>>>>>>>      			ACC_FCW_FFT_BLEN :
>> ACC_FCW_MLDTS_BLEN))));
>>>>>>>
>>>>>>> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type ==
>>>>>> RTE_BBDEV_OP_FFT))
>>>>>>> +		fcw_len = ACC_FCW_FFT_BLEN_3;
>>>>>>> +
>>>>>>>      	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth;
>> desc_idx++) {
>>>>>>>      		desc = q->ring_addr + desc_idx;
>>>>>>>      		desc->req.word0 = ACC_DMA_DESC_TYPE;
>>>>>>> @@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>>>>>>>      			.num_buffers_soft_out = 0,
>>>>>>>      			}
>>>>>>>      		},
>>>>>>> +		{
>>>>>>> +			.type	= RTE_BBDEV_OP_FFT,
>>>>>>> +			.cap.fft = {
>>>>>>> +				.capability_flags =
>>>>>>> +						RTE_BBDEV_FFT_WINDOWING |
>>>>>>> +						RTE_BBDEV_FFT_CS_ADJUSTMENT |
>>>>>>> +						RTE_BBDEV_FFT_DFT_BYPASS |
>>>>>>> +						RTE_BBDEV_FFT_IDFT_BYPASS |
>>>>>>> +						RTE_BBDEV_FFT_FP16_INPUT |
>>>>>>> +						RTE_BBDEV_FFT_FP16_OUTPUT |
>>>>>>> +						RTE_BBDEV_FFT_POWER_MEAS |
>>>>>>> +						RTE_BBDEV_FFT_WINDOWING_BYPASS,
>>>>>>> +				.num_buffers_src =
>>>>>>> +						1,
>>>>>>> +				.num_buffers_dst =
>>>>>>> +						1,
>>>>>>> +			}
>>>>>>> +		},
>>>>>>>      		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>>>>>>>      	};
>>>>>>>
>>>>>>> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op
>>>>>>> *op,
>>>>>> struct acc_fcw_fft *fcw)
>>>>>>>      		fcw->bypass = 0;
>>>>>>>      }
>>>>>>>
>>>>>>> +/* Fill in a frame control word for FFT processing. */
>>>>>>> +static inline void
>>>>>>> +vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
>>>>>>> +{
>>>>>>> +	fcw->in_frame_size = op->fft.input_sequence_size;
>>>>>>> +	fcw->leading_pad_size = op->fft.input_leading_padding;
>>>>>>> +	fcw->out_frame_size = op->fft.output_sequence_size;
>>>>>>> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
>>>>>>> +	fcw->cs_window_sel = op->fft.window_index[0] +
>>>>>>> +			(op->fft.window_index[1] << 8) +
>>>>>>> +			(op->fft.window_index[2] << 16) +
>>>>>>> +			(op->fft.window_index[3] << 24);
>>>>>>> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
>>>>>>> +			(op->fft.window_index[5] << 8);
>>>>>>> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
>>>>>>> +	fcw->num_antennas = op->fft.num_antennas_log2;
>>>>>>> +	fcw->idft_size = op->fft.idft_log2;
>>>>>>> +	fcw->dft_size = op->fft.dft_log2;
>>>>>>> +	fcw->cs_offset = op->fft.cs_time_adjustment;
>>>>>>> +	fcw->idft_shift = op->fft.idft_shift;
>>>>>>> +	fcw->dft_shift = op->fft.dft_shift;
>>>>>>> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
>>>>>>> +	fcw->power_shift = op->fft.power_shift;
>>>>>>> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
>>>>>>> +	fcw->fp16_in = check_bit(op->fft.op_flags,
>>>>>> RTE_BBDEV_FFT_FP16_INPUT);
>>>>>>> +	fcw->fp16_out = check_bit(op->fft.op_flags,
>>>>>> RTE_BBDEV_FFT_FP16_OUTPUT);
>>>>>>> +	fcw->power_en = check_bit(op->fft.op_flags,
>>>>>> RTE_BBDEV_FFT_POWER_MEAS);
>>>>>>> +	if (check_bit(op->fft.op_flags,
>>>>>>> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
>>>>>>> +		if (check_bit(op->fft.op_flags,
>>>>>>> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
>>>>>>> +			fcw->bypass = 2;
>>>>>>> +		else
>>>>>>> +			fcw->bypass = 1;
>>>>>>> +	} else if (check_bit(op->fft.op_flags,
>>>>>>> +			RTE_BBDEV_FFT_DFT_BYPASS))
>>>>>>> +		fcw->bypass = 3;
>>>>>>> +	else
>>>>>>> +		fcw->bypass = 0;
>>>>>>
>>>>>> The only difference I see with VRB1 are backed by corresponding
>>>>>> op_flags (POWER & FP16), is that correct? If so, it does not make
>>>>>> sense to me to have a specific function for VRB2.
>>>>>
>>>>> There are more changes but these are only formally enabled in the
>>>>> next stepping hence some of the related code is not included yet.
>>>>> More generally
>>>> the FCW and IP is different from VRB1 implementation.
>>>>
>>>> Currently, the code is almost identical so vrb1 implementation should
>>>> be reused. If some later changes makes the two implementations
>>>> diverge, then we can consider having a dedicated function for VRB2 at that
>> time.
>>>
>>> If I may, I believe this is best as-is notably for patches and support.
>>> The functions are fairly small (not much code overlap quantitatively)
>>> and the underlying IP is different (with more differences we can
>>> enable over time). I don’t think it would help anyone really to try to make
>> them coexist for a small period of time.
>>> Does that sound fair?
>>
>> I disagree, as I explained the code currently is almost identical, so just share the
>> code.
>>
>> You will diverge, if *really* necessary, when it will make more sense to have
>> two separate functions. For now it is not the case in my opinion.
> 
> OK I had another look. I can share a common descriptor generation function. For the FCW generation these are just different structures and sizes, different prototype, I really don't think it would make sense to try to artificially generate them together.
> Updating in new v5 this week.

struct __rte_packed acc_fcw_fft {
	uint32_t in_frame_size:16,
		leading_pad_size:16;
	uint32_t out_frame_size:16,
		leading_depad_size:16;
	uint32_t cs_window_sel;
	uint32_t cs_window_sel2:16,
		cs_enable_bmap:16;
	uint32_t num_antennas:8,
		idft_size:8,
		dft_size:8,
		cs_offset:8;
	uint32_t idft_shift:8,
		dft_shift:8,
		cs_multiplier:16;
	uint32_t bypass:2,
		fp16_in:1, /* Not supported in VRB1 */
		fp16_out:1,
		exp_adj:4,
		power_shift:4,
		power_en:1,
		res:19;
};

+struct __rte_packed acc_fcw_fft_3 {
	uint32_t in_frame_size:16,
		leading_pad_size:16;
	uint32_t out_frame_size:16,
		leading_depad_size:16;
	uint32_t cs_window_sel;
	uint32_t cs_window_sel2:16,
		cs_enable_bmap:16;
	uint32_t num_antennas:8,
		idft_size:8,
		dft_size:8,
		cs_offset:8;
	uint32_t idft_shift:8,
		dft_shift:8,
		cs_multiplier:16;
	uint32_t bypass:2,
		fp16_in:1,
		fp16_out:1,
		exp_adj:4,
		power_shift:4,
		power_en:1,

===> New fields:
		enable_dewin:1,
		freq_resample_mode:2,
		depad_output_size:16;
	uint16_t cs_theta_0[ACC_MAX_CS];
	uint32_t cs_theta_d[ACC_MAX_CS];
	int8_t cs_time_offset[ACC_MAX_CS];
};

The HW designers did it right, with SW in mind. There are just new fields on
VRB2, so IMHO the FCW fill can be shared as well.
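
To make the idea concrete, here is a minimal sketch, not the driver code. All
sketch_* names are hypothetical and the field lists are trimmed down from the
two layouts quoted above; the union is just one assumed way to view the same
FCW buffer as either layout. Because both layouts declare the same leading
bit-fields, one helper can fill the common part and the VRB2 path only sets
its extra fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down operation parameters (not rte_bbdev_op_fft). */
struct sketch_fft_op {
	uint16_t input_sequence_size;
	uint16_t input_leading_padding;
	uint8_t bypass;
	uint8_t power_shift;
	bool power_en;
	bool dewin_en;
};

/* Trimmed VRB1-style layout: the tail of the last word is reserved. */
struct sketch_fcw_fft {
	uint32_t in_frame_size:16,
		leading_pad_size:16;
	uint32_t bypass:2,
		power_shift:4,
		power_en:1,
		res:25;
};

/* Trimmed VRB2-style layout: same leading fields, then the new ones. */
struct sketch_fcw_fft_3 {
	uint32_t in_frame_size:16,
		leading_pad_size:16;
	uint32_t bypass:2,
		power_shift:4,
		power_en:1,
		enable_dewin:1,
		res:24;
	uint16_t cs_theta_0[4];
};

/* Both views share one buffer, as an FCW ring slot would. */
union sketch_fcw_any {
	struct sketch_fcw_fft v1;
	struct sketch_fcw_fft_3 v3;
};

/* Fill the fields common to both variants through the VRB1 view. */
static inline void
sketch_fcw_fft_fill_common(union sketch_fcw_any *fcw, const struct sketch_fft_op *op)
{
	assert(fcw != NULL && op != NULL);
	fcw->v1.in_frame_size = op->input_sequence_size;
	fcw->v1.leading_pad_size = op->input_leading_padding;
	fcw->v1.bypass = op->bypass;
	fcw->v1.power_shift = op->power_shift;
	fcw->v1.power_en = op->power_en;
}

/* VRB2 path: reuse the common fill, then set only the VRB2-specific field. */
static inline void
sketch_fcw_fft_fill_vrb2(union sketch_fcw_any *fcw, const struct sketch_fft_op *op)
{
	sketch_fcw_fft_fill_common(fcw, op);
	fcw->v3.enable_dewin = op->dewin_en;
}
```

The sketch relies on the compiler laying out identically declared bit-fields
identically in both structs and expects the buffer to be zeroed beforehand,
so the reserved bits start at 0.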

> 
> 
>>
>> Thanks,
>> Maxime
>>
>>>
>>>
>>>>
>>>>>>
>>>>>>> +}
>>>>>>> +
>>>>>>>      static inline int
>>>>>>>      vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>>>>>      		struct acc_dma_req_desc *desc,
>>>>>>> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>>>>>      	return 0;
>>>>>>>      }
>>>>>>>
>>>>>>> +static inline int
>>>>>>> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>>>>> +		struct acc_dma_req_desc *desc,
>>>>>>> +		struct rte_mbuf *input, struct rte_mbuf *output, struct
>>>>>> rte_mbuf *win_input,
>>>>>>> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t
>>>>>> *out_offset,
>>>>>>> +		uint32_t *win_offset, uint32_t *pwr_offset) {
>>>>>>> +	bool pwr_en = check_bit(op->fft.op_flags,
>>>>>> RTE_BBDEV_FFT_POWER_MEAS);
>>>>>>> +	bool win_en = check_bit(op->fft.op_flags,
>>>>>> RTE_BBDEV_FFT_DEWINDOWING);
>>>>>>> +	int num_cs = 0, i, bd_idx = 1;
>>>>>>> +
>>>>>>> +	/* FCW already done */
>>>>>>> +	acc_header_init(desc);
>>>>>>> +
>>>>>>> +	RTE_SET_USED(win_input);
>>>>>>> +	RTE_SET_USED(win_offset);
>>>>>>> +
>>>>>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input,
>>>>>> *in_offset);
>>>>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size *
>>>>>> ACC_IQ_SIZE;
>>>>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
>>>>>>> +	desc->data_ptrs[bd_idx].last = 1;
>>>>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>>>>>> +	bd_idx++;
>>>>>>> +
>>>>>>> +	desc->data_ptrs[bd_idx].address =
>>>>>>> +rte_pktmbuf_iova_offset(output,
>>>>>> *out_offset);
>>>>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size *
>>>>>> ACC_IQ_SIZE;
>>>>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
>>>>>>> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
>>>>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>>>>>> +	desc->m2dlen = win_en ? 3 : 2;
>>>>>>> +	desc->d2mlen = pwr_en ? 2 : 1;
>>>>>>> +	desc->ib_ant_offset = op->fft.input_sequence_size;
>>>>>>> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
>>>>>>> +
>>>>>>> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
>>>>>>> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
>>>>>>> +			num_cs++;
>>>>>>> +	desc->num_cs = num_cs;
>>>>>>> +
>>>>>>> +	if (pwr_en && pwr) {
>>>>>>> +		bd_idx++;
>>>>>>> +		desc->data_ptrs[bd_idx].address =
>>>>>> rte_pktmbuf_iova_offset(pwr, *pwr_offset);
>>>>>>> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op-
>>>>>>> fft.num_antennas_log2) * 4;
>>>>>>> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
>>>>>>> +		desc->data_ptrs[bd_idx].last = 1;
>>>>>>> +		desc->data_ptrs[bd_idx].dma_ext = 0;
>>>>>>> +	}
>>>>>>> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
>>>>>>> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
>>>>>>> +	desc->op_addr = op;
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>>
>>>>>>>      /** Enqueue one FFT operation for device. */
>>>>>>>      static inline int
>>>>>>> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue
>>>> *q,
>>>>>> struct rte_bbdev_fft_op *op,
>>>>>>>      		uint16_t total_enqueued_cbs)
>>>>>>>      {
>>>>>>>      	union acc_dma_desc *desc;
>>>>>>> -	struct rte_mbuf *input, *output;
>>>>>>> -	uint32_t in_offset, out_offset;
>>>>>>> +	struct rte_mbuf *input, *output, *pwr, *win;
>>>>>>> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
>>>>>>>      	struct acc_fcw_fft *fcw;
>>>>>>>
>>>>>>>      	desc = acc_desc(q, total_enqueued_cbs);
>>>>>>>      	input = op->fft.base_input.data;
>>>>>>>      	output = op->fft.base_output.data;
>>>>>>> +	pwr = op->fft.power_meas_output.data;
>>>>>>> +	win = op->fft.dewindowing_input.data;
>>>>>>>      	in_offset = op->fft.base_input.offset;
>>>>>>>      	out_offset = op->fft.base_output.offset;
>>>>>>> +	pwr_offset = op->fft.power_meas_output.offset;
>>>>>>> +	win_offset = op->fft.dewindowing_input.offset;
>>>>>>>
>>>>>>>      	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
>>>>>>>      			((q->sw_ring_head + total_enqueued_cbs) & q-
>>>>>>> sw_ring_wrap_mask)
>>>>>>>      			* ACC_MAX_FCW_SIZE);
>>>>>>>
>>>>>>> -	vrb1_fcw_fft_fill(op, fcw);
>>>>>>> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset,
>>>>>> &out_offset);
>>>>>>> +	if (q->d->device_variant == VRB1_VARIANT) {
>>>>>>> +		vrb1_fcw_fft_fill(op, fcw);
>>>>>>> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output,
>>>>>> &in_offset, &out_offset);
>>>>>>> +	} else {
>>>>>>> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
>>>>>>> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win,
>>>>>> pwr,
>>>>>>> +				&in_offset, &out_offset, &win_offset,
>>>>>> &pwr_offset);
>>>>>>> +	}
>>>>>>>      #ifdef RTE_LIBRTE_BBDEV_DEBUG
>>>>>>>      	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
>>>>>>>      			sizeof(desc->req.fcw_fft));
>>>>>
>>>
> 



* RE: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
  2023-10-06 12:05               ` Maxime Coquelin
@ 2023-10-06 20:25                 ` Chautru, Nicolas
  0 siblings, 0 replies; 42+ messages in thread
From: Chautru, Nicolas @ 2023-10-06 20:25 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, david.marchand, Vargas, Hernan

Hi Maxime, 

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, October 6, 2023 5:06 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
> 
> 
> 
> On 10/5/23 19:59, Chautru, Nicolas wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Thursday, October 5, 2023 7:35 AM
> >> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> Hernan
> >> <hernan.vargas@intel.com>
> >> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
> >> variant
> >>
> >>
> >>
> >> On 10/4/23 23:18, Chautru, Nicolas wrote:
> >>> Hi Maxime,
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: Wednesday, October 4, 2023 12:11 AM
> >>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas,
> >> Hernan
> >>>> <hernan.vargas@intel.com>
> >>>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2
> >>>> variant
> >>>>
> >>>>
> >>>>
> >>>> On 10/3/23 20:20, Chautru, Nicolas wrote:
> >>>>> Hi Maxime,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>> Sent: Tuesday, October 3, 2023 7:37 AM
> >>>>>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> >>>>>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
> >>>>>> <hernan.vargas@intel.com>
> >>>>>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to
> >>>>>> VRB2 variant
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 9/29/23 18:35, Nicolas Chautru wrote:
> >>>>>>> Support for the FFT the processing specific to the
> >>>>>>> VRB2 variant.
> >>>>>>>
> >>>>>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> >>>>>>> ---
> >>>>>>>      drivers/baseband/acc/rte_vrb_pmd.c | 132 ++++++++++++++++++++++++++++-
> >>>>>>>      1 file changed, 128 insertions(+), 4 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>>>> b/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>>>> index 93add82947..ce4b90d8e7 100644
> >>>>>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
> >>>>>>> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
> >>>>>>>      			ACC_FCW_LD_BLEN : (conf->op_type == RTE_BBDEV_OP_FFT ?
> >>>>>>>      			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
> >>>>>>>
> >>>>>>> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type == RTE_BBDEV_OP_FFT))
> >>>>>>> +		fcw_len = ACC_FCW_FFT_BLEN_3;
> >>>>>>> +
> >>>>>>>      	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
> >>>>>>>      		desc = q->ring_addr + desc_idx;
> >>>>>>>      		desc->req.word0 = ACC_DMA_DESC_TYPE;
> >>>>>>> @@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
> >>>>>>>      			.num_buffers_soft_out = 0,
> >>>>>>>      			}
> >>>>>>>      		},
> >>>>>>> +		{
> >>>>>>> +			.type	= RTE_BBDEV_OP_FFT,
> >>>>>>> +			.cap.fft = {
> >>>>>>> +				.capability_flags =
> >>>>>>> +						RTE_BBDEV_FFT_WINDOWING |
> >>>>>>> +						RTE_BBDEV_FFT_CS_ADJUSTMENT |
> >>>>>>> +						RTE_BBDEV_FFT_DFT_BYPASS |
> >>>>>>> +						RTE_BBDEV_FFT_IDFT_BYPASS |
> >>>>>>> +						RTE_BBDEV_FFT_FP16_INPUT |
> >>>>>>> +						RTE_BBDEV_FFT_FP16_OUTPUT |
> >>>>>>> +						RTE_BBDEV_FFT_POWER_MEAS |
> >>>>>>> +						RTE_BBDEV_FFT_WINDOWING_BYPASS,
> >>>>>>> +				.num_buffers_src =
> >>>>>>> +						1,
> >>>>>>> +				.num_buffers_dst =
> >>>>>>> +						1,
> >>>>>>> +			}
> >>>>>>> +		},
> >>>>>>>      		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
> >>>>>>>      	};
> >>>>>>>
> >>>>>>> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft *fcw)
> >>>>>>>      		fcw->bypass = 0;
> >>>>>>>      }
> >>>>>>>
> >>>>>>> +/* Fill in a frame control word for FFT processing. */
> >>>>>>> +static inline void
> >>>>>>> +vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
> >>>>>>> +{
> >>>>>>> +	fcw->in_frame_size = op->fft.input_sequence_size;
> >>>>>>> +	fcw->leading_pad_size = op->fft.input_leading_padding;
> >>>>>>> +	fcw->out_frame_size = op->fft.output_sequence_size;
> >>>>>>> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
> >>>>>>> +	fcw->cs_window_sel = op->fft.window_index[0] +
> >>>>>>> +			(op->fft.window_index[1] << 8) +
> >>>>>>> +			(op->fft.window_index[2] << 16) +
> >>>>>>> +			(op->fft.window_index[3] << 24);
> >>>>>>> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
> >>>>>>> +			(op->fft.window_index[5] << 8);
> >>>>>>> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
> >>>>>>> +	fcw->num_antennas = op->fft.num_antennas_log2;
> >>>>>>> +	fcw->idft_size = op->fft.idft_log2;
> >>>>>>> +	fcw->dft_size = op->fft.dft_log2;
> >>>>>>> +	fcw->cs_offset = op->fft.cs_time_adjustment;
> >>>>>>> +	fcw->idft_shift = op->fft.idft_shift;
> >>>>>>> +	fcw->dft_shift = op->fft.dft_shift;
> >>>>>>> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
> >>>>>>> +	fcw->power_shift = op->fft.power_shift;
> >>>>>>> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
> >>>>>>> +	fcw->fp16_in = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_INPUT);
> >>>>>>> +	fcw->fp16_out = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_OUTPUT);
> >>>>>>> +	fcw->power_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
> >>>>>>> +	if (check_bit(op->fft.op_flags,
> >>>>>>> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
> >>>>>>> +		if (check_bit(op->fft.op_flags,
> >>>>>>> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
> >>>>>>> +			fcw->bypass = 2;
> >>>>>>> +		else
> >>>>>>> +			fcw->bypass = 1;
> >>>>>>> +	} else if (check_bit(op->fft.op_flags,
> >>>>>>> +			RTE_BBDEV_FFT_DFT_BYPASS))
> >>>>>>> +		fcw->bypass = 3;
> >>>>>>> +	else
> >>>>>>> +		fcw->bypass = 0;
> >>>>>>
> >>>>>> The only differences I see with VRB1 are backed by corresponding
> >>>>>> op_flags (POWER & FP16), is that correct? If so, it does not make
> >>>>>> sense to me to have a specific function for VRB2.
> >>>>>
> >>>>> There are more changes, but these are only formally enabled in the
> >>>>> next stepping, hence some of the related code is not included yet.
> >>>>> More generally, the FCW and IP are different from the VRB1
> >>>>> implementation.
> >>>>
> >>>> Currently, the code is almost identical, so the vrb1 implementation
> >>>> should be reused. If some later changes make the two
> >>>> implementations diverge, then we can consider having a dedicated
> >>>> function for VRB2 at that time.
> >>>
> >>> If I may, I believe this is best as-is, notably for patches and support.
> >>> The functions are fairly small (not much code overlap
> >>> quantitatively) and the underlying IP is different (with more
> >>> differences we can enable over time). I don’t think it would really
> >>> help anyone to try to make them coexist for a small period of time.
> >>> Does that sound fair?
> >>
> >> I disagree, as I explained the code currently is almost identical, so
> >> just share the code.
> >>
> >> You will diverge, if *really* necessary, when it will make more sense
> >> to have two separate functions. For now it is not the case in my opinion.
> >
> > OK I had another look. I can share a common descriptor generation function.
> > For the FCW generation these are just different structures and sizes, with
> > different prototypes; I really don't think it would make sense to try to
> > artificially generate them together.
> > Updating in new v5 this week.
> 
> struct __rte_packed acc_fcw_fft {
> 	uint32_t in_frame_size:16,
> 		leading_pad_size:16;
> 	uint32_t out_frame_size:16,
> 		leading_depad_size:16;
> 	uint32_t cs_window_sel;
> 	uint32_t cs_window_sel2:16,
> 		cs_enable_bmap:16;
> 	uint32_t num_antennas:8,
> 		idft_size:8,
> 		dft_size:8,
> 		cs_offset:8;
> 	uint32_t idft_shift:8,
> 		dft_shift:8,
> 		cs_multiplier:16;
> 	uint32_t bypass:2,
> 		fp16_in:1, /* Not supported in VRB1 */
> 		fp16_out:1,
> 		exp_adj:4,
> 		power_shift:4,
> 		power_en:1,
> 		res:19;
> };
> 
> +struct __rte_packed acc_fcw_fft_3 {
> 	uint32_t in_frame_size:16,
> 		leading_pad_size:16;
> 	uint32_t out_frame_size:16,
> 		leading_depad_size:16;
> 	uint32_t cs_window_sel;
> 	uint32_t cs_window_sel2:16,
> 		cs_enable_bmap:16;
> 	uint32_t num_antennas:8,
> 		idft_size:8,
> 		dft_size:8,
> 		cs_offset:8;
> 	uint32_t idft_shift:8,
> 		dft_shift:8,
> 		cs_multiplier:16;
> 	uint32_t bypass:2,
> 		fp16_in:1,
> 		fp16_out:1,
> 		exp_adj:4,
> 		power_shift:4,
> 		power_en:1,
> 
> ===> New fields:
> 		enable_dewin:1,
> 		freq_resample_mode:2,
> 		depad_output_size:16;
> 	uint16_t cs_theta_0[ACC_MAX_CS];
> 	uint32_t cs_theta_d[ACC_MAX_CS];
> 	int8_t cs_time_offset[ACC_MAX_CS];
> };
> 
> HW designers did it right with SW in mind. There are just new fields on VRB2,
> so IMHO it can be shared also.

I really don't believe we should: the two variants have different FCWs, with
structures of different sizes.
I don't want to obfuscate this in the code and make it look artificially
as if they share the same FCW by casting between structures of different sizes.
There are also extra stepping variations that we have to manage here.

I believe the previous suggestion about the descriptor was very valuable,
but for this one (the FCW fill) sharing would be too artificial to me, for the
reasons above.
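
For illustration only, the trade-off discussed above can be sketched as
follows. This is not code from the patch or from DPDK: the demo_* structures,
the DEMO_* flag values and the DEMO_FCW_FILL_COMMON macro are all hypothetical,
trimmed-down stand-ins. The sketch assumes one possible middle ground, where
the fields common to both layouts are filled by a macro that works on either
type by member name, so the two FCW types stay distinct and no cast between
structures of different sizes is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for this sketch only (not the rte_bbdev values). */
#define DEMO_FFT_DFT_BYPASS        (1u << 0)
#define DEMO_FFT_IDFT_BYPASS       (1u << 1)
#define DEMO_FFT_WINDOWING_BYPASS  (1u << 2)
#define DEMO_FFT_DEWINDOWING       (1u << 3)

/* Trimmed-down stand-in for the FFT op parameters. */
struct demo_fft_op {
	uint16_t input_sequence_size;
	uint16_t input_leading_padding;
	uint32_t op_flags;
};

/* Stand-in for a VRB1-style FCW: two 32-bit words. */
struct demo_fcw_v1 {
	uint32_t in_frame_size:16,
		leading_pad_size:16;
	uint32_t bypass:2,
		res:30;
};

/* Stand-in for a VRB2-style FCW: same leading fields, extra fields, larger. */
struct demo_fcw_v2 {
	uint32_t in_frame_size:16,
		leading_pad_size:16;
	uint32_t bypass:2,
		enable_dewin:1,
		res:29;
	uint16_t cs_theta_0[12];
};

/* Same bypass encoding as in the quoted vrb2_fcw_fft_fill(). */
static inline uint32_t
demo_bypass_mode(uint32_t flags)
{
	if (flags & DEMO_FFT_IDFT_BYPASS)
		return (flags & DEMO_FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (flags & DEMO_FFT_DFT_BYPASS)
		return 3;
	return 0;
}

/* Fill the members both layouts share; resolved by name, so no cast. */
#define DEMO_FCW_FILL_COMMON(fcw, op) do { \
	(fcw)->in_frame_size = (op)->input_sequence_size; \
	(fcw)->leading_pad_size = (op)->input_leading_padding; \
	(fcw)->bypass = demo_bypass_mode((op)->op_flags); \
} while (0)

static inline void
demo_v1_fcw_fill(const struct demo_fft_op *op, struct demo_fcw_v1 *fcw)
{
	memset(fcw, 0, sizeof(*fcw));
	DEMO_FCW_FILL_COMMON(fcw, op);
}

static inline void
demo_v2_fcw_fill(const struct demo_fft_op *op, struct demo_fcw_v2 *fcw)
{
	memset(fcw, 0, sizeof(*fcw));
	DEMO_FCW_FILL_COMMON(fcw, op);
	/* Variant-specific member, present only in the larger layout. */
	fcw->enable_dewin = (op->op_flags & DEMO_FFT_DEWINDOWING) != 0;
}
```

Each variant keeps its own prototype and its own structure size, which is the
concern raised above, while the overlapping assignments are written once. Whether
that is worth the extra macro for functions this small is a judgment call.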

Thanks

> 
> >
> >
> >>
> >> Thanks,
> >> Maxime
> >>
> >>>
> >>>
> >>>>
> >>>>>>
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>      static inline int
> >>>>>>>      vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>>>>>>      		struct acc_dma_req_desc *desc,
> >>>>>>> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>>>>>>      	return 0;
> >>>>>>>      }
> >>>>>>>
> >>>>>>> +static inline int
> >>>>>>> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
> >>>>>>> +		struct acc_dma_req_desc *desc,
> >>>>>>> +		struct rte_mbuf *input, struct rte_mbuf *output, struct rte_mbuf *win_input,
> >>>>>>> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t *out_offset,
> >>>>>>> +		uint32_t *win_offset, uint32_t *pwr_offset)
> >>>>>>> +{
> >>>>>>> +	bool pwr_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
> >>>>>>> +	bool win_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_DEWINDOWING);
> >>>>>>> +	int num_cs = 0, i, bd_idx = 1;
> >>>>>>> +
> >>>>>>> +	/* FCW already done */
> >>>>>>> +	acc_header_init(desc);
> >>>>>>> +
> >>>>>>> +	RTE_SET_USED(win_input);
> >>>>>>> +	RTE_SET_USED(win_offset);
> >>>>>>> +
> >>>>>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input, *in_offset);
> >>>>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size * ACC_IQ_SIZE;
> >>>>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
> >>>>>>> +	desc->data_ptrs[bd_idx].last = 1;
> >>>>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> >>>>>>> +	bd_idx++;
> >>>>>>> +
> >>>>>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output, *out_offset);
> >>>>>>> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size * ACC_IQ_SIZE;
> >>>>>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
> >>>>>>> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
> >>>>>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
> >>>>>>> +	desc->m2dlen = win_en ? 3 : 2;
> >>>>>>> +	desc->d2mlen = pwr_en ? 2 : 1;
> >>>>>>> +	desc->ib_ant_offset = op->fft.input_sequence_size;
> >>>>>>> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
> >>>>>>> +
> >>>>>>> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
> >>>>>>> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
> >>>>>>> +			num_cs++;
> >>>>>>> +	desc->num_cs = num_cs;
> >>>>>>> +
> >>>>>>> +	if (pwr_en && pwr) {
> >>>>>>> +		bd_idx++;
> >>>>>>> +		desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(pwr, *pwr_offset);
> >>>>>>> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op->fft.num_antennas_log2) * 4;
> >>>>>>> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
> >>>>>>> +		desc->data_ptrs[bd_idx].last = 1;
> >>>>>>> +		desc->data_ptrs[bd_idx].dma_ext = 0;
> >>>>>>> +	}
> >>>>>>> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
> >>>>>>> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
> >>>>>>> +	desc->op_addr = op;
> >>>>>>> +	return 0;
> >>>>>>> +}
> >>>>>>>
> >>>>>>>      /** Enqueue one FFT operation for device. */
> >>>>>>>      static inline int
> >>>>>>> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
> >>>>>>>      		uint16_t total_enqueued_cbs)
> >>>>>>>      {
> >>>>>>>      	union acc_dma_desc *desc;
> >>>>>>> -	struct rte_mbuf *input, *output;
> >>>>>>> -	uint32_t in_offset, out_offset;
> >>>>>>> +	struct rte_mbuf *input, *output, *pwr, *win;
> >>>>>>> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
> >>>>>>>      	struct acc_fcw_fft *fcw;
> >>>>>>>
> >>>>>>>      	desc = acc_desc(q, total_enqueued_cbs);
> >>>>>>>      	input = op->fft.base_input.data;
> >>>>>>>      	output = op->fft.base_output.data;
> >>>>>>> +	pwr = op->fft.power_meas_output.data;
> >>>>>>> +	win = op->fft.dewindowing_input.data;
> >>>>>>>      	in_offset = op->fft.base_input.offset;
> >>>>>>>      	out_offset = op->fft.base_output.offset;
> >>>>>>> +	pwr_offset = op->fft.power_meas_output.offset;
> >>>>>>> +	win_offset = op->fft.dewindowing_input.offset;
> >>>>>>>
> >>>>>>>      	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
> >>>>>>>      			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
> >>>>>>>      			* ACC_MAX_FCW_SIZE);
> >>>>>>>
> >>>>>>> -	vrb1_fcw_fft_fill(op, fcw);
> >>>>>>> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
> >>>>>>> +	if (q->d->device_variant == VRB1_VARIANT) {
> >>>>>>> +		vrb1_fcw_fft_fill(op, fcw);
> >>>>>>> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
> >>>>>>> +	} else {
> >>>>>>> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
> >>>>>>> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win, pwr,
> >>>>>>> +				&in_offset, &out_offset, &win_offset, &pwr_offset);
> >>>>>>> +	}
> >>>>>>>      #ifdef RTE_LIBRTE_BBDEV_DEBUG
> >>>>>>>      	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
> >>>>>>>      			sizeof(desc->req.fcw_fft));
> >>>>>
> >>>
> >



end of thread, other threads:[~2023-10-06 20:25 UTC | newest]

Thread overview: 42+ messages
2023-09-29 16:35 [PATCH v3 00/12] VRB2 bbdev PMD introduction Nicolas Chautru
2023-09-29 16:35 ` [PATCH v3 01/12] bbdev: add FFT window width member in driver info Nicolas Chautru
2023-09-29 16:35 ` [PATCH v3 02/12] baseband/acc: add FFT window width in the VRB PMD Nicolas Chautru
2023-10-03 11:52   ` Maxime Coquelin
2023-10-03 19:06     ` Chautru, Nicolas
2023-10-04  7:55       ` Maxime Coquelin
2023-09-29 16:35 ` [PATCH v3 03/12] baseband/acc: remove the 4G SO capability for VRB1 Nicolas Chautru
2023-10-03 12:04   ` Maxime Coquelin
2023-09-29 16:35 ` [PATCH v3 04/12] baseband/acc: allocate FCW memory separately Nicolas Chautru
2023-10-03 12:51   ` Maxime Coquelin
2023-09-29 16:35 ` [PATCH v3 05/12] baseband/acc: add support for MLD operation Nicolas Chautru
2023-09-29 16:35 ` [PATCH v3 06/12] baseband/acc: refactor to allow unified driver extension Nicolas Chautru
2023-10-03 13:14   ` Maxime Coquelin
2023-10-03 18:54     ` Chautru, Nicolas
2023-10-04  7:35       ` Maxime Coquelin
2023-10-04 21:28         ` Chautru, Nicolas
2023-10-05 14:31           ` Maxime Coquelin
2023-10-05 15:00             ` Chautru, Nicolas
2023-09-29 16:35 ` [PATCH v3 07/12] baseband/acc: adding VRB2 device variant Nicolas Chautru
2023-10-03 13:41   ` Maxime Coquelin
2023-09-29 16:35 ` [PATCH v3 08/12] baseband/acc: add FEC capabilities for the VRB2 variant Nicolas Chautru
2023-10-03 14:28   ` Maxime Coquelin
2023-10-04 21:11     ` Chautru, Nicolas
2023-10-05 14:36       ` Maxime Coquelin
2023-09-29 16:35 ` [PATCH v3 09/12] baseband/acc: add FFT support to " Nicolas Chautru
2023-10-03 14:36   ` Maxime Coquelin
2023-10-03 18:20     ` Chautru, Nicolas
2023-10-04  7:11       ` Maxime Coquelin
2023-10-04 21:18         ` Chautru, Nicolas
2023-10-05 14:34           ` Maxime Coquelin
2023-10-05 17:59             ` Chautru, Nicolas
2023-10-06 12:05               ` Maxime Coquelin
2023-10-06 20:25                 ` Chautru, Nicolas
2023-09-29 16:35 ` [PATCH v3 10/12] baseband/acc: add MLD support in " Nicolas Chautru
2023-10-03 15:12   ` Maxime Coquelin
2023-10-03 18:12     ` Chautru, Nicolas
2023-09-29 16:35 ` [PATCH v3 11/12] baseband/acc: add support for VRB2 engine error detection Nicolas Chautru
2023-10-03 15:16   ` Maxime Coquelin
2023-10-03 17:22     ` Chautru, Nicolas
2023-10-03 17:26       ` Maxime Coquelin
2023-09-29 16:35 ` [PATCH v3 12/12] baseband/acc: add configure helper for VRB2 Nicolas Chautru
2023-10-03 15:30   ` Maxime Coquelin
