From mboxrd@z Thu Jan 1 00:00:00 1970
To: , Ruifeng Wang , Vamsi Attunuru
CC: , Pavan Nikhilesh
Subject: [PATCH v3 2/2] net/octeon_ep: add Rx NEON routine
Date: Sun, 21 Jan 2024 22:13:34 +0530
Message-ID: <20240121164334.9269-2-pbhagavatula@marvell.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240121164334.9269-1-pbhagavatula@marvell.com>
References: <20240121164334.9269-1-pbhagavatula@marvell.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain

From: Pavan Nikhilesh

Add Rx ARM NEON SIMD routine.

Signed-off-by: Pavan Nikhilesh
---
 drivers/net/octeon_ep/cnxk_ep_rx_neon.c | 140 ++++++++++++++++++++++++
 drivers/net/octeon_ep/meson.build       |   6 +-
 drivers/net/octeon_ep/otx_ep_ethdev.c   |   5 +-
 drivers/net/octeon_ep/otx_ep_rxtx.h     |   6 +
 4 files changed, 155 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx_neon.c

diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_neon.c b/drivers/net/octeon_ep/cnxk_ep_rx_neon.c
new file mode 100644
index 0000000000..b13a5897f9
--- /dev/null
+++ b/drivers/net/octeon_ep/cnxk_ep_rx_neon.c
@@ -0,0 +1,140 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include "cnxk_ep_rx.h"
+
+static __rte_always_inline void
+cnxk_ep_process_pkts_vec_neon(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq,
+			      uint16_t new_pkts)
+{
+	struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
+	uint32_t pidx0, pidx1, pidx2, pidx3;
+	struct rte_mbuf *m0, *m1, *m2, *m3;
+	uint32_t read_idx = droq->read_idx;
+	uint16_t nb_desc = droq->nb_desc;
+	uint32_t idx0, idx1, idx2, idx3;
+	uint32x4_t bytes;
+	uint16_t pkts = 0;
+
+	idx0 = read_idx;
+	bytes = vdupq_n_u32(0);
+	while (pkts < new_pkts) {
+		const uint8x16_t mask0 = {0, 1, 0xff, 0xff, 0, 1, 0xff, 0xff,
+					  4, 5, 0xff, 0xff, 4, 5, 0xff, 0xff};
+		const uint8x16_t mask1 = {8, 9, 0xff, 0xff, 8, 9, 0xff, 0xff,
+					  12, 13, 0xff, 0xff, 12, 13, 0xff, 0xff};
+		uint64x2_t s01, s23;
+
+		idx1 = otx_ep_incr_index(idx0, 1, nb_desc);
+		idx2 = otx_ep_incr_index(idx1, 1, nb_desc);
+		idx3 = otx_ep_incr_index(idx2, 1, nb_desc);
+
+		if (new_pkts - pkts > 4) {
+			pidx0 = otx_ep_incr_index(idx3, 1, nb_desc);
+			pidx1 = otx_ep_incr_index(pidx0, 1, nb_desc);
+			pidx2 = otx_ep_incr_index(pidx1, 1, nb_desc);
+			pidx3 = otx_ep_incr_index(pidx2, 1, nb_desc);
+
+			rte_prefetch_non_temporal(cnxk_pktmbuf_mtod(recv_buf_list[pidx0], void *));
+			rte_prefetch_non_temporal(cnxk_pktmbuf_mtod(recv_buf_list[pidx1], void *));
+			rte_prefetch_non_temporal(cnxk_pktmbuf_mtod(recv_buf_list[pidx2], void *));
+			rte_prefetch_non_temporal(cnxk_pktmbuf_mtod(recv_buf_list[pidx3], void *));
+		}
+
+		m0 = recv_buf_list[idx0];
+		m1 = recv_buf_list[idx1];
+		m2 = recv_buf_list[idx2];
+		m3 = recv_buf_list[idx3];
+
+		/* Load packet size big-endian. */
+		s01 = vsetq_lane_u32(cnxk_pktmbuf_mtod(m0, struct otx_ep_droq_info *)->length >> 48,
+				     s01, 0);
+		s01 = vsetq_lane_u32(cnxk_pktmbuf_mtod(m1, struct otx_ep_droq_info *)->length >> 48,
+				     s01, 1);
+		s01 = vsetq_lane_u32(cnxk_pktmbuf_mtod(m2, struct otx_ep_droq_info *)->length >> 48,
+				     s01, 2);
+		s01 = vsetq_lane_u32(cnxk_pktmbuf_mtod(m3, struct otx_ep_droq_info *)->length >> 48,
+				     s01, 3);
+		/* Convert to little-endian. */
+		s01 = vrev16q_u8(s01);
+
+		/* Vertical add, consolidate outside the loop. */
+		bytes = vaddq_u32(bytes, s01);
+		/* Segregate to packet length and data length. */
+		s23 = vqtbl1q_u8(s01, mask1);
+		s01 = vqtbl1q_u8(s01, mask0);
+
+		/* Store packet length and data length to mbuf. */
+		*(uint64_t *)&m0->pkt_len = vgetq_lane_u64(s01, 0);
+		*(uint64_t *)&m1->pkt_len = vgetq_lane_u64(s01, 1);
+		*(uint64_t *)&m2->pkt_len = vgetq_lane_u64(s23, 0);
+		*(uint64_t *)&m3->pkt_len = vgetq_lane_u64(s23, 1);
+
+		/* Reset rearm data. */
+		*(uint64_t *)&m0->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m1->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m2->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m3->rearm_data = droq->rearm_data;
+
+		rx_pkts[pkts++] = m0;
+		rx_pkts[pkts++] = m1;
+		rx_pkts[pkts++] = m2;
+		rx_pkts[pkts++] = m3;
+		idx0 = otx_ep_incr_index(idx3, 1, nb_desc);
+	}
+	droq->read_idx = idx0;
+
+	droq->refill_count += new_pkts;
+	droq->pkts_pending -= new_pkts;
+	/* Stats */
+	droq->stats.pkts_received += new_pkts;
+	droq->stats.bytes_received += vaddvq_u32(bytes);
+}
+
+uint16_t __rte_noinline __rte_hot
+cnxk_ep_recv_pkts_neon(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD)
+		cnxk_ep_rx_refill(droq);
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_SSE);
+	cnxk_ep_process_pkts_vec_neon(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	return new_pkts;
+}
+
+uint16_t __rte_noinline __rte_hot
+cn9k_ep_recv_pkts_neon(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD) {
+		cnxk_ep_rx_refill(droq);
+	} else {
+		/* SDP output goes into DROP state when output doorbell count
+		 * goes below drop count. When door bell count is written with
+		 * a value greater than drop count SDP output should come out
+		 * of DROP state. Due to a race condition this is not happening.
+		 * Writing doorbell register with 0 again may make SDP output
+		 * come out of this state.
+		 */
+
+		rte_write32(0, droq->pkts_credit_reg);
+	}
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_SSE);
+	cnxk_ep_process_pkts_vec_neon(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	return new_pkts;
+}
diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build
index e8ae56018d..d5d40b23a1 100644
--- a/drivers/net/octeon_ep/meson.build
+++ b/drivers/net/octeon_ep/meson.build
@@ -29,7 +29,11 @@ if arch_subdir == 'x86'
     endif
 endif
 
-extra_flags = ['-Wno-strict-aliasing']
+if arch_subdir == 'arm'
+    sources += files('cnxk_ep_rx_neon.c')
+endif
+
+extra_flags = ['-Wno-strict-aliasing', '-flax-vector-conversions']
 foreach flag: extra_flags
     if cc.has_argument(flag)
         cflags += flag
diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c
index 42a97ea110..8daa7d225c 100644
--- a/drivers/net/octeon_ep/otx_ep_ethdev.c
+++ b/drivers/net/octeon_ep/otx_ep_ethdev.c
@@ -59,6 +59,8 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev)
 		    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1)
 			eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_avx;
 #endif
+#elif defined(RTE_ARCH_ARM64)
+		eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_neon;
 #endif
 		if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
 			eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_mseg;
@@ -71,8 +73,9 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev)
 		    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1)
 			eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_avx;
 #endif
+#elif defined(RTE_ARCH_ARM64)
+		eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_neon;
 #endif
-
 		if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
 			eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_mseg;
 	} else {
diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h
index 8f306bd94e..f5bc807dc0 100644
--- a/drivers/net/octeon_ep/otx_ep_rxtx.h
+++ b/drivers/net/octeon_ep/otx_ep_rxtx.h
@@ -60,12 +60,18 @@ cnxk_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budge
 uint16_t
 cn9k_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
 
+uint16_t
+cnxk_ep_recv_pkts_neon(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
 uint16_t
 cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
 
 uint16_t
 cn9k_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
 
+uint16_t
+cn9k_ep_recv_pkts_neon(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
 uint16_t
 cn9k_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
 #endif /* _OTX_EP_RXTX_H_ */
-- 
2.25.1
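
The per-packet length handling in the vector loop can be modelled in scalar C. This is only a sketch under stated assumptions: a little-endian host, and a device that writes the 64-bit `length` word big-endian with the packet length in its last two bytes, which is what the `>> 48`, `vrev16q_u8` and `vqtbl1q_u8` steps in the patch suggest. `struct len_fields` is a hypothetical stand-in for the adjacent `pkt_len`, `data_len` and `vlan_tci` fields of `struct rte_mbuf`, and `model_host_len` and `model_store_len` are illustrative names that are not part of the driver.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pkt_len/data_len/vlan_tci run in rte_mbuf. */
struct len_fields {
	uint32_t pkt_len;
	uint16_t data_len;
	uint16_t vlan_tci;
};

/*
 * raw is the info-header length word as loaded by a little-endian CPU.
 * Assumption: the device wrote it big-endian, so the 16-bit packet length
 * sits byte-swapped in the top 16 bits of the loaded value.
 */
static inline uint16_t
model_host_len(uint64_t raw)
{
	uint16_t be = (uint16_t)(raw >> 48);      /* the ">> 48" step */

	/* Byte swap within 16 bits, as vrev16q_u8 does per halfword. */
	return (uint16_t)((be >> 8) | (be << 8));
}

/*
 * Scalar equivalent of the table lookup plus 64-bit store: the same length
 * lands in pkt_len and data_len, and the 0xff mask entries zero the rest.
 */
static inline void
model_store_len(struct len_fields *f, uint64_t raw)
{
	uint16_t len = model_host_len(raw);

	f->pkt_len = len;
	f->data_len = len;
	f->vlan_tci = 0;
}
```

For example, a 1500-byte packet (0x05DC) arrives as the bytes 0x05 0xDC, which a little-endian load sees as 0xDC05 in the top 16 bits; the model turns that back into 1500 and writes it to both length fields.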