From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E70A5A034D; Mon, 13 Dec 2021 22:15:45 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0FF9B41145; Mon, 13 Dec 2021 22:15:36 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by mails.dpdk.org (Postfix) with ESMTP id 55DFC406FF for ; Mon, 13 Dec 2021 22:15:34 +0100 (CET) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 1BDElBZE029954 for ; Mon, 13 Dec 2021 13:15:33 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=KaKwfhxut6Yb5B9aWOttwwKLW4/M5L6qWTIes774Mh4=; b=e3O7NXeDFs6XKRsbOtmyqVkcxd6WVCQvCBTl2w0O6gHHNgBgoI0KpFBLjCvdePdd9iMQ UXiCYFD9D6yltpsQ6PrLS+uCFvK2V5LPbqOQzOBdBHSHy9/0qtgaFa0kwJKL5cmfYP/+ Ow+IZ2ax8ERgd+OXkFphwLv7fMQSOtAfp6mXwm8Yl/De46PpXoNqwea3bhE1bOjntQRm uZETAGbgpArLUYYGYC60DqCWZinlkTrwsE6TsG1avqBzXLjHWyx+lKhCWBSVtfJKsP6R fYl0DOy0IW78c9wuG1QfSRKO0q0qERGioDtJtiMUyIaK3eltrA49RmzE9bxsbP/PWDLQ zg== Received: from dc5-exch01.marvell.com ([199.233.59.181]) by mx0a-0016f401.pphosted.com (PPS) with ESMTPS id 3cx88ahnnb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Mon, 13 Dec 2021 13:15:33 -0800 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 13 Dec 2021 13:15:32 -0800 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Mon, 13 Dec 2021 13:15:32 -0800 Received: from BG-LT7430.marvell.com (BG-LT7430.marvell.com [10.28.177.176]) by maili.marvell.com (Postfix) with ESMTP id 711613F7079; Mon, 13 Dec 2021 13:15:29 -0800 (PST) From: To: , Pavan Nikhilesh , "Shijith Thotton" , Nithin Dabilpuram , Kiran Kumar K , Sunil Kumar Kori , Satha Rao CC: Subject: [PATCH 4/4] net/cnxk: improve Rx performance Date: Tue, 14 Dec 2021 02:44:24 +0530 Message-ID: <20211213211425.6332-4-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211213211425.6332-1-pbhagavatula@marvell.com> References: <20211213211425.6332-1-pbhagavatula@marvell.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Proofpoint-GUID: Eebs81Wx_tLSNB9ypSHy9Vp_-mzhD2J2 X-Proofpoint-ORIG-GUID: Eebs81Wx_tLSNB9ypSHy9Vp_-mzhD2J2 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2021-12-13_10,2021-12-13_01,2021-12-02_01 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Improve vWQE and CQ Rx performance by tuning perfetches to 64B cacheline size. Also, prefetch the vWQE array offsets at cacheline boundaries. Signed-off-by: Pavan Nikhilesh --- drivers/event/cnxk/cn10k_worker.h | 25 +++++++++++++++---------- drivers/net/cnxk/cn10k_rx.h | 8 ++++---- drivers/net/cnxk/cn9k_rx.h | 20 ++++++++++---------- 3 files changed, 29 insertions(+), 24 deletions(-) diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h index 65602a632e..6e77d32827 100644 --- a/drivers/event/cnxk/cn10k_worker.h +++ b/drivers/event/cnxk/cn10k_worker.h @@ -118,11 +118,17 @@ cn10k_process_vwqe(uintptr_t vwqe, uint16_t port_id, const uint32_t flags, uint8_t loff = 0; uint64_t sa_base; uint64_t **wqe; + int i; mbuf_init |= ((uint64_t)port_id) << 48; vec = (struct rte_event_vector *)vwqe; wqe = vec->u64s; + rte_prefetch_non_temporal(&vec->ptrs[0]); +#define OBJS_PER_CLINE (RTE_CACHE_LINE_SIZE / sizeof(void *)) + for (i = OBJS_PER_CLINE; i < vec->nb_elem; i += OBJS_PER_CLINE) + rte_prefetch_non_temporal(&vec->ptrs[i]); + nb_mbufs = RTE_ALIGN_FLOOR(vec->nb_elem, NIX_DESCS_PER_LOOP); nb_mbufs = cn10k_nix_recv_pkts_vector(&mbuf_init, vec->mbufs, nb_mbufs, flags | NIX_RX_VWQE_F, lookup_mem, @@ -191,15 +197,13 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev, uint64_t u64[2]; } gw; uint64_t tstamp_ptr; - uint64_t mbuf; gw.get_work = ws->gw_wdata; #if defined(RTE_ARCH_ARM64) && !defined(__clang__) asm volatile( PLT_CPU_FEATURE_PREAMBLE - "caspl %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n" - "sub %[mbuf], %H[wdata], #0x80 \n" - : [wdata] "+r"(gw.get_work), [mbuf] "=&r"(mbuf) + "caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n" + : [wdata] "+r"(gw.get_work) : [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0) : "memory"); #else @@ -208,14 +212,12 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev, roc_load_pair(gw.u64[0], gw.u64[1], ws->base + SSOW_LF_GWS_WQE0); } while (gw.u64[0] & BIT_ULL(63)); - mbuf = (uint64_t)((char *)gw.u64[1] - sizeof(struct rte_mbuf)); #endif ws->gw_rdata = gw.u64[0]; - gw.u64[0] = (gw.u64[0] & (0x3ull << 32)) << 6 | - (gw.u64[0] & (0x3FFull << 36)) << 4 | - (gw.u64[0] & 0xffffffff); - - if (CNXK_TT_FROM_EVENT(gw.u64[0]) != SSO_TT_EMPTY) { + if (gw.u64[1]) { + gw.u64[0] = (gw.u64[0] & (0x3ull << 32)) << 6 | + (gw.u64[0] & (0x3FFull << 36)) << 4 | + (gw.u64[0] & 0xffffffff); if ((flags & CPT_RX_WQE_F) && (CNXK_EVENT_TYPE_FROM_TAG(gw.u64[0]) == RTE_EVENT_TYPE_CRYPTODEV)) { @@ -223,7 +225,10 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev, } else if (CNXK_EVENT_TYPE_FROM_TAG(gw.u64[0]) == RTE_EVENT_TYPE_ETHDEV) { uint8_t port = CNXK_SUB_EVENT_FROM_TAG(gw.u64[0]); + uint64_t mbuf; + mbuf = gw.u64[1] - sizeof(struct rte_mbuf); + rte_prefetch0((void *)mbuf); if (flags & NIX_RX_OFFLOAD_SECURITY_F) { struct rte_mbuf *m; uintptr_t sa_base; diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index a2442d3726..9694a3080f 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -610,10 +610,10 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, } /* Prefetch N desc ahead */ - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 8, 0, flags)); - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 9, 0, flags)); - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 10, 0, flags)); - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 11, 0, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 4, 64, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 5, 64, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 6, 64, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 7, 64, flags)); /* Get NIX_RX_SG_S for size and buffer pointer */ cq0_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 0, 64, flags)); diff --git a/drivers/net/cnxk/cn9k_rx.h b/drivers/net/cnxk/cn9k_rx.h index b038b1a6ef..fa4efbf80a 100644 --- a/drivers/net/cnxk/cn9k_rx.h +++ b/drivers/net/cnxk/cn9k_rx.h @@ -342,16 +342,16 @@ cn9k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, ol_flags = nix_update_match_id(rx->cn9k.match_id, ol_flags, mbuf); - mbuf->pkt_len = len; - mbuf->data_len = len; - *(uint64_t *)(&mbuf->rearm_data) = val; - mbuf->ol_flags = ol_flags; + *(uint64_t *)(&mbuf->rearm_data) = val; + mbuf->pkt_len = len; - if (flag & NIX_RX_MULTI_SEG_F) + if (flag & NIX_RX_MULTI_SEG_F) { nix_cqe_xtract_mseg(rx, mbuf, val, flag); - else + } else { + mbuf->data_len = len; mbuf->next = NULL; + } } static inline uint16_t @@ -723,10 +723,6 @@ cn9k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2); vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3); - /* Store the mbufs to rx_pkts */ - vst1q_u64((uint64_t *)&rx_pkts[packets], mbuf01); - vst1q_u64((uint64_t *)&rx_pkts[packets + 2], mbuf23); - if (flags & NIX_RX_MULTI_SEG_F) { /* Multi segment is enable build mseg list for * individual mbufs in scalar mode. @@ -751,6 +747,10 @@ cn9k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, mbuf3->next = NULL; } + /* Store the mbufs to rx_pkts */ + vst1q_u64((uint64_t *)&rx_pkts[packets], mbuf01); + vst1q_u64((uint64_t *)&rx_pkts[packets + 2], mbuf23); + /* Prefetch mbufs */ roc_prefetch_store_keep(mbuf0); roc_prefetch_store_keep(mbuf1); -- 2.17.1