To: Pavan Nikhilesh, Shijith Thotton, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao
Cc: dev@dpdk.org
Subject: [PATCH v2 2/2] net/cnxk: align prefetches to CN10K cache model
Date: Thu, 24 Feb 2022 21:40:12 +0530
Message-ID: <20220224161013.4566-2-pbhagavatula@marvell.com>
In-Reply-To: <20220224161013.4566-1-pbhagavatula@marvell.com>
References: <20220224135243.4233-1-pbhagavatula@marvell.com> <20220224161013.4566-1-pbhagavatula@marvell.com>
List-Id: DPDK patches and discussions

From: Pavan Nikhilesh

Align prefetches to the CN10K cache model for vWQE in Rx and Tx.
Move the mbuf->next NULL assignment to the Tx path and enable it only
when the multi-segment offload is enabled, to reduce L1 cache pressure.
Add macros to detect corrupted mbuf->next values when MEMPOOL_DEBUG is
set.
Signed-off-by: Pavan Nikhilesh
---
 drivers/event/cnxk/cn10k_worker.h |  13 ++--
 drivers/net/cnxk/cn10k_rx.h       | 115 ++++++++++++++++++++++++------
 drivers/net/cnxk/cn10k_tx.h       |   7 ++
 3 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index d288c66cac..a827a1e422 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -118,23 +118,23 @@ cn10k_process_vwqe(uintptr_t vwqe, uint16_t port_id, const uint32_t flags,
 	uint64_t aura_handle, laddr;
 	uint16_t nb_mbufs, non_vec;
 	uint16_t lmt_id, d_off;
+	struct rte_mbuf **wqe;
 	struct rte_mbuf *mbuf;
 	uint8_t loff = 0;
 	uint64_t sa_base;
-	uint64_t **wqe;
 	int i;
 
 	mbuf_init |= ((uint64_t)port_id) << 48;
 	vec = (struct rte_event_vector *)vwqe;
-	wqe = vec->u64s;
+	wqe = vec->mbufs;
 
-	rte_prefetch_non_temporal(&vec->ptrs[0]);
+	rte_prefetch0(&vec->ptrs[0]);
 #define OBJS_PER_CLINE (RTE_CACHE_LINE_SIZE / sizeof(void *))
 	for (i = OBJS_PER_CLINE; i < vec->nb_elem; i += OBJS_PER_CLINE)
-		rte_prefetch_non_temporal(&vec->ptrs[i]);
+		rte_prefetch0(&vec->ptrs[i]);
 
 	nb_mbufs = RTE_ALIGN_FLOOR(vec->nb_elem, NIX_DESCS_PER_LOOP);
-	nb_mbufs = cn10k_nix_recv_pkts_vector(&mbuf_init, vec->mbufs, nb_mbufs,
+	nb_mbufs = cn10k_nix_recv_pkts_vector(&mbuf_init, wqe, nb_mbufs,
 					      flags | NIX_RX_VWQE_F, lookup_mem,
 					      tstamp, lbase);
 	wqe += nb_mbufs;
@@ -182,7 +182,7 @@ cn10k_process_vwqe(uintptr_t vwqe, uint16_t port_id, const uint32_t flags,
 			cn10k_nix_mbuf_to_tstamp((struct rte_mbuf *)mbuf, tstamp,
 						 flags & NIX_RX_OFFLOAD_TSTAMP_F,
 						 (uint64_t *)tstamp_ptr);
-			wqe[0] = (uint64_t *)mbuf;
+			wqe[0] = (struct rte_mbuf *)mbuf;
 			non_vec--;
 			wqe++;
 		}
@@ -612,6 +612,7 @@ cn10k_sso_hws_event_tx(struct cn10k_sso_hws *ws, struct rte_event *ev,
 					   ev->sched_type, txq_data, flags);
 	}
 	rte_mempool_put(rte_mempool_from_obj(ev->vec), ev->vec);
+	rte_prefetch0(ws);
 	return (meta & 0xFFFF);
 }
 
diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h
index 65a08e379b..66a35c69f9 100644
--- a/drivers/net/cnxk/cn10k_rx.h
+++ b/drivers/net/cnxk/cn10k_rx.h
@@ -36,6 +36,22 @@
 	(((f) & NIX_RX_VWQE_F) ?                                               \
 		 (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) + (o)) :       \
 		 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) + (o)))
+#define CQE_PTR_DIFF(b, i, o, f)                                               \
+	(((f) & NIX_RX_VWQE_F) ?                                               \
+		 (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) - (o)) :       \
+		 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) - (o)))
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+#define NIX_MBUF_VALIDATE_NEXT(m)                                              \
+	if (m->nb_segs == 1 && m->next) {                                      \
+		rte_panic("mbuf->next[%p] valid when mbuf->nb_segs is %d",     \
+			  m->next, m->nb_segs);                                \
+	}
+#else
+#define NIX_MBUF_VALIDATE_NEXT(m)                                              \
+	do {                                                                   \
+	} while (0)
+#endif
 
 union mbuf_initializer {
 	struct {
@@ -674,17 +690,73 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts,
 			cq0 = (uintptr_t)&mbufs[packets];
 		}
 
-		/* Prefetch N desc ahead */
-		rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 4, 64, flags));
-		rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 5, 64, flags));
-		rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 6, 64, flags));
-		rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 7, 64, flags));
-
-		/* Get NIX_RX_SG_S for size and buffer pointer */
-		cq0_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 0, 64, flags));
-		cq1_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 1, 64, flags));
-		cq2_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 2, 64, flags));
-		cq3_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 3, 64, flags));
+		if (flags & NIX_RX_VWQE_F) {
+			if (pkts - packets > 4) {
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 4, 0, flags));
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 5, 0, flags));
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 6, 0, flags));
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 7, 0, flags));
+
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch1(
+						CQE_PTR_OFF(cq0, 8, 0, flags));
+					rte_prefetch1(
+						CQE_PTR_OFF(cq0, 9, 0, flags));
+					rte_prefetch1(
+						CQE_PTR_OFF(cq0, 10, 0, flags));
+					rte_prefetch1(
+						CQE_PTR_OFF(cq0, 11, 0, flags));
+					if (pkts - packets > 12) {
+						rte_prefetch1(CQE_PTR_OFF(
+							cq0, 12, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(
+							cq0, 13, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(
+							cq0, 14, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(
+							cq0, 15, 0, flags));
+					}
+				}
+
+				rte_prefetch0(CQE_PTR_DIFF(
+					cq0, 4, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(
+					cq0, 5, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(
+					cq0, 6, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(
+					cq0, 7, RTE_PKTMBUF_HEADROOM, flags));
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch0(CQE_PTR_DIFF(
+						cq0, 8, RTE_PKTMBUF_HEADROOM,
+						flags));
+					rte_prefetch0(CQE_PTR_DIFF(
+						cq0, 9, RTE_PKTMBUF_HEADROOM,
+						flags));
+					rte_prefetch0(CQE_PTR_DIFF(
+						cq0, 10, RTE_PKTMBUF_HEADROOM,
+						flags));
+					rte_prefetch0(CQE_PTR_DIFF(
+						cq0, 11, RTE_PKTMBUF_HEADROOM,
+						flags));
+				}
+			}
+		} else {
+			if (pkts - packets > 4) {
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 4, 64, flags));
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 5, 64, flags));
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 6, 64, flags));
+				rte_prefetch_non_temporal(
+					CQE_PTR_OFF(cq0, 7, 64, flags));
+			}
+		}
 
 		if (!(flags & NIX_RX_VWQE_F)) {
 			/* Get NIX_RX_SG_S for size and buffer pointer */
@@ -997,19 +1069,18 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts,
 			nix_cqe_xtract_mseg((union nix_rx_parse_u *)
 					    (CQE_PTR_OFF(cq0, 3, 8, flags)),
 					    mbuf3, mbuf_initializer, flags);
-		} else {
-			/* Update that no more segments */
-			mbuf0->next = NULL;
-			mbuf1->next = NULL;
-			mbuf2->next = NULL;
-			mbuf3->next = NULL;
 		}
 
-		/* Prefetch mbufs */
-		roc_prefetch_store_keep(mbuf0);
-		roc_prefetch_store_keep(mbuf1);
-		roc_prefetch_store_keep(mbuf2);
-		roc_prefetch_store_keep(mbuf3);
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf0->pool, (void **)&mbuf0, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf1->pool, (void **)&mbuf1, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf2->pool, (void **)&mbuf2, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf3->pool, (void **)&mbuf3, 1, 1);
+
+		NIX_MBUF_VALIDATE_NEXT(mbuf0);
+		NIX_MBUF_VALIDATE_NEXT(mbuf1);
+		NIX_MBUF_VALIDATE_NEXT(mbuf2);
+		NIX_MBUF_VALIDATE_NEXT(mbuf3);
 
 		packets += NIX_DESCS_PER_LOOP;

diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
index ec6366168c..695e3ed354 100644
--- a/drivers/net/cnxk/cn10k_tx.h
+++ b/drivers/net/cnxk/cn10k_tx.h
@@ -2569,6 +2569,13 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws,
 			lnum += 1;
 		}
 
+		if (flags & NIX_TX_MULTI_SEG_F) {
+			tx_pkts[0]->next = NULL;
+			tx_pkts[1]->next = NULL;
+			tx_pkts[2]->next = NULL;
+			tx_pkts[3]->next = NULL;
+		}
+
 		tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP;
 	}
 
-- 
2.17.1