Message-ID: <32b6b788-eae3-8540-07da-ff528b06de2b@redhat.com>
Date: Wed, 4 Oct 2023 09:11:08 +0200
MIME-Version: 1.0
User-Agent:
 Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1
Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
To: "Chautru, Nicolas", "dev@dpdk.org"
Cc: "hemant.agrawal@nxp.com", "david.marchand@redhat.com", "Vargas, Hernan"
References: <20230929163516.3636499-1-nicolas.chautru@intel.com>
 <20230929163516.3636499-10-nicolas.chautru@intel.com>
From: Maxime Coquelin
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org

On 10/3/23 20:20, Chautru, Nicolas wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin
>> Sent: Tuesday, October 3, 2023 7:37 AM
>> To: Chautru, Nicolas; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; david.marchand@redhat.com; Vargas, Hernan
>> Subject: Re: [PATCH v3 09/12] baseband/acc: add FFT support to VRB2 variant
>>
>>
>>
>> On 9/29/23 18:35, Nicolas Chautru wrote:
>>> Support for the FFT the processing specific to the
>>> VRB2 variant.
>>>
>>> Signed-off-by: Nicolas Chautru
>>> ---
>>>  drivers/baseband/acc/rte_vrb_pmd.c | 132 ++++++++++++++++++++++++++++-
>>>  1 file changed, 128 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
>>> index 93add82947..ce4b90d8e7 100644
>>> --- a/drivers/baseband/acc/rte_vrb_pmd.c
>>> +++ b/drivers/baseband/acc/rte_vrb_pmd.c
>>> @@ -903,6 +903,9 @@ vrb_queue_setup(struct rte_bbdev *dev, uint16_t queue_id,
>>>  			ACC_FCW_LD_BLEN : (conf->op_type == RTE_BBDEV_OP_FFT ?
>>>  			ACC_FCW_FFT_BLEN : ACC_FCW_MLDTS_BLEN))));
>>>
>>> +	if ((q->d->device_variant == VRB2_VARIANT) && (conf->op_type == RTE_BBDEV_OP_FFT))
>>> +		fcw_len = ACC_FCW_FFT_BLEN_3;
>>> +
>>>  	for (desc_idx = 0; desc_idx < d->sw_ring_max_depth; desc_idx++) {
>>>  		desc = q->ring_addr + desc_idx;
>>>  		desc->req.word0 = ACC_DMA_DESC_TYPE;
>>> @@ -1323,6 +1326,24 @@ vrb_dev_info_get(struct rte_bbdev *dev, struct rte_bbdev_driver_info *dev_info)
>>>  				.num_buffers_soft_out = 0,
>>>  			}
>>>  		},
>>> +		{
>>> +			.type = RTE_BBDEV_OP_FFT,
>>> +			.cap.fft = {
>>> +				.capability_flags =
>>> +						RTE_BBDEV_FFT_WINDOWING |
>>> +						RTE_BBDEV_FFT_CS_ADJUSTMENT |
>>> +						RTE_BBDEV_FFT_DFT_BYPASS |
>>> +						RTE_BBDEV_FFT_IDFT_BYPASS |
>>> +						RTE_BBDEV_FFT_FP16_INPUT |
>>> +						RTE_BBDEV_FFT_FP16_OUTPUT |
>>> +						RTE_BBDEV_FFT_POWER_MEAS |
>>> +						RTE_BBDEV_FFT_WINDOWING_BYPASS,
>>> +				.num_buffers_src =
>>> +						1,
>>> +				.num_buffers_dst =
>>> +						1,
>>> +			}
>>> +		},
>>>  		RTE_BBDEV_END_OF_CAPABILITIES_LIST()
>>>  	};
>>>
>>> @@ -3849,6 +3870,47 @@ vrb1_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft *fcw)
>>>  		fcw->bypass = 0;
>>>  }
>>>
>>> +/* Fill in a frame control word for FFT processing. */
>>> +static inline void
>>> +vrb2_fcw_fft_fill(struct rte_bbdev_fft_op *op, struct acc_fcw_fft_3 *fcw)
>>> +{
>>> +	fcw->in_frame_size = op->fft.input_sequence_size;
>>> +	fcw->leading_pad_size = op->fft.input_leading_padding;
>>> +	fcw->out_frame_size = op->fft.output_sequence_size;
>>> +	fcw->leading_depad_size = op->fft.output_leading_depadding;
>>> +	fcw->cs_window_sel = op->fft.window_index[0] +
>>> +			(op->fft.window_index[1] << 8) +
>>> +			(op->fft.window_index[2] << 16) +
>>> +			(op->fft.window_index[3] << 24);
>>> +	fcw->cs_window_sel2 = op->fft.window_index[4] +
>>> +			(op->fft.window_index[5] << 8);
>>> +	fcw->cs_enable_bmap = op->fft.cs_bitmap;
>>> +	fcw->num_antennas = op->fft.num_antennas_log2;
>>> +	fcw->idft_size = op->fft.idft_log2;
>>> +	fcw->dft_size = op->fft.dft_log2;
>>> +	fcw->cs_offset = op->fft.cs_time_adjustment;
>>> +	fcw->idft_shift = op->fft.idft_shift;
>>> +	fcw->dft_shift = op->fft.dft_shift;
>>> +	fcw->cs_multiplier = op->fft.ncs_reciprocal;
>>> +	fcw->power_shift = op->fft.power_shift;
>>> +	fcw->exp_adj = op->fft.fp16_exp_adjust;
>>> +	fcw->fp16_in = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_INPUT);
>>> +	fcw->fp16_out = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_FP16_OUTPUT);
>>> +	fcw->power_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
>>> +	if (check_bit(op->fft.op_flags,
>>> +			RTE_BBDEV_FFT_IDFT_BYPASS)) {
>>> +		if (check_bit(op->fft.op_flags,
>>> +				RTE_BBDEV_FFT_WINDOWING_BYPASS))
>>> +			fcw->bypass = 2;
>>> +		else
>>> +			fcw->bypass = 1;
>>> +	} else if (check_bit(op->fft.op_flags,
>>> +			RTE_BBDEV_FFT_DFT_BYPASS))
>>> +		fcw->bypass = 3;
>>> +	else
>>> +		fcw->bypass = 0;
>>
>> The only difference I see with VRB1 are backed by corresponding op_flags
>> (POWER & FP16), is that correct? If so, it does not make sense to me to have a
>> specific function for VRB2.
>
> There are more changes but these are only formally enabled in the next stepping hence some of the
> related code is not included yet.
> More generally the FCW and IP is different from VRB1 implementation.

Currently, the code is almost identical, so the VRB1 implementation should be
reused. If later changes make the two implementations diverge, we can
consider adding a dedicated function for VRB2 at that time.

>>
>>> +}
>>> +
>>>  static inline int
>>>  vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>  		struct acc_dma_req_desc *desc,
>>> @@ -3882,6 +3944,58 @@ vrb1_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>>  	return 0;
>>>  }
>>>
>>> +static inline int
>>> +vrb2_dma_desc_fft_fill(struct rte_bbdev_fft_op *op,
>>> +		struct acc_dma_req_desc *desc,
>>> +		struct rte_mbuf *input, struct rte_mbuf *output, struct rte_mbuf *win_input,
>>> +		struct rte_mbuf *pwr, uint32_t *in_offset, uint32_t *out_offset,
>>> +		uint32_t *win_offset, uint32_t *pwr_offset)
>>> +{
>>> +	bool pwr_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_POWER_MEAS);
>>> +	bool win_en = check_bit(op->fft.op_flags, RTE_BBDEV_FFT_DEWINDOWING);
>>> +	int num_cs = 0, i, bd_idx = 1;
>>> +
>>> +	/* FCW already done */
>>> +	acc_header_init(desc);
>>> +
>>> +	RTE_SET_USED(win_input);
>>> +	RTE_SET_USED(win_offset);
>>> +
>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(input, *in_offset);
>>> +	desc->data_ptrs[bd_idx].blen = op->fft.input_sequence_size * ACC_IQ_SIZE;
>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_IN;
>>> +	desc->data_ptrs[bd_idx].last = 1;
>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>> +	bd_idx++;
>>> +
>>> +	desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(output, *out_offset);
>>> +	desc->data_ptrs[bd_idx].blen = op->fft.output_sequence_size * ACC_IQ_SIZE;
>>> +	desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_HARD;
>>> +	desc->data_ptrs[bd_idx].last = pwr_en ? 0 : 1;
>>> +	desc->data_ptrs[bd_idx].dma_ext = 0;
>>> +	desc->m2dlen = win_en ? 3 : 2;
>>> +	desc->d2mlen = pwr_en ? 2 : 1;
>>> +	desc->ib_ant_offset = op->fft.input_sequence_size;
>>> +	desc->num_ant = op->fft.num_antennas_log2 - 3;
>>> +
>>> +	for (i = 0; i < RTE_BBDEV_MAX_CS; i++)
>>> +		if (check_bit(op->fft.cs_bitmap, 1 << i))
>>> +			num_cs++;
>>> +	desc->num_cs = num_cs;
>>> +
>>> +	if (pwr_en && pwr) {
>>> +		bd_idx++;
>>> +		desc->data_ptrs[bd_idx].address = rte_pktmbuf_iova_offset(pwr, *pwr_offset);
>>> +		desc->data_ptrs[bd_idx].blen = num_cs * (1 << op->fft.num_antennas_log2) * 4;
>>> +		desc->data_ptrs[bd_idx].blkid = ACC_DMA_BLKID_OUT_SOFT;
>>> +		desc->data_ptrs[bd_idx].last = 1;
>>> +		desc->data_ptrs[bd_idx].dma_ext = 0;
>>> +	}
>>> +	desc->ob_cyc_offset = op->fft.output_sequence_size;
>>> +	desc->ob_ant_offset = op->fft.output_sequence_size * num_cs;
>>> +	desc->op_addr = op;
>>> +	return 0;
>>> +}
>>>
>>>  /** Enqueue one FFT operation for device. */
>>>  static inline int
>>> @@ -3889,22 +4003,32 @@ vrb_enqueue_fft_one_op(struct acc_queue *q, struct rte_bbdev_fft_op *op,
>>>  		uint16_t total_enqueued_cbs)
>>>  {
>>>  	union acc_dma_desc *desc;
>>> -	struct rte_mbuf *input, *output;
>>> -	uint32_t in_offset, out_offset;
>>> +	struct rte_mbuf *input, *output, *pwr, *win;
>>> +	uint32_t in_offset, out_offset, pwr_offset, win_offset;
>>>  	struct acc_fcw_fft *fcw;
>>>
>>>  	desc = acc_desc(q, total_enqueued_cbs);
>>>  	input = op->fft.base_input.data;
>>>  	output = op->fft.base_output.data;
>>> +	pwr = op->fft.power_meas_output.data;
>>> +	win = op->fft.dewindowing_input.data;
>>>  	in_offset = op->fft.base_input.offset;
>>>  	out_offset = op->fft.base_output.offset;
>>> +	pwr_offset = op->fft.power_meas_output.offset;
>>> +	win_offset = op->fft.dewindowing_input.offset;
>>>
>>>  	fcw = (struct acc_fcw_fft *) (q->fcw_ring +
>>>  			((q->sw_ring_head + total_enqueued_cbs) & q->sw_ring_wrap_mask)
>>>  			* ACC_MAX_FCW_SIZE);
>>>
>>> -	vrb1_fcw_fft_fill(op, fcw);
>>> -	vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
>>> +	if (q->d->device_variant == VRB1_VARIANT) {
>>> +		vrb1_fcw_fft_fill(op, fcw);
>>> +		vrb1_dma_desc_fft_fill(op, &desc->req, input, output, &in_offset, &out_offset);
>>> +	} else {
>>> +		vrb2_fcw_fft_fill(op, (struct acc_fcw_fft_3 *) fcw);
>>> +		vrb2_dma_desc_fft_fill(op, &desc->req, input, output, win, pwr,
>>> +				&in_offset, &out_offset, &win_offset, &pwr_offset);
>>> +	}
>>>  #ifdef RTE_LIBRTE_BBDEV_DEBUG
>>>  	rte_memdump(stderr, "FCW", &desc->req.fcw_fft,
>>>  			sizeof(desc->req.fcw_fft));
>
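
To make the reuse argument concrete, here is a minimal standalone sketch, not driver code. Everything prefixed with sketch_ or SKETCH_ is a hypothetical stand-in: the flag values, the simplified FCW structure and the helper names are assumptions for illustration, and the real RTE_BBDEV_FFT_* flags and acc_fcw_fft layouts differ. It only shows that when the extra fields are derived from op_flags, a single fill routine can serve both variants, because a flag that is never set leaves its field at 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only; the real
 * RTE_BBDEV_FFT_* definitions live in the bbdev headers. */
#define SKETCH_FFT_IDFT_BYPASS      (1u << 0)
#define SKETCH_FFT_DFT_BYPASS       (1u << 1)
#define SKETCH_FFT_WINDOWING_BYPASS (1u << 2)
#define SKETCH_FFT_POWER_MEAS       (1u << 3)

/* Hypothetical, heavily simplified frame control word. */
struct sketch_fcw {
	uint8_t bypass;
	uint8_t power_en;
};

static inline bool
sketch_check_bit(uint32_t bitmap, uint32_t mask)
{
	return (bitmap & mask) != 0;
}

/* Bypass encoding, mirroring the if/else chain in the quoted patch. */
static inline uint8_t
sketch_fft_bypass(uint32_t op_flags)
{
	if (sketch_check_bit(op_flags, SKETCH_FFT_IDFT_BYPASS))
		return sketch_check_bit(op_flags, SKETCH_FFT_WINDOWING_BYPASS) ? 2 : 1;
	if (sketch_check_bit(op_flags, SKETCH_FFT_DFT_BYPASS))
		return 3;
	return 0;
}

/* Single fill routine with no variant check: power_en simply stays 0
 * when the flag is not set in op_flags. */
static inline void
sketch_fcw_fft_fill(uint32_t op_flags, struct sketch_fcw *fcw)
{
	fcw->bypass = sketch_fft_bypass(op_flags);
	fcw->power_en = sketch_check_bit(op_flags, SKETCH_FFT_POWER_MEAS);
}
```

This leaves aside the different FCW structure types (acc_fcw_fft versus acc_fcw_fft_3) cast in the patch, which a shared function in the driver would still have to reconcile.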