From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 6899FA034C; Thu, 24 Feb 2022 19:39:46 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 7D1944114E; Thu, 24 Feb 2022 19:39:45 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by mails.dpdk.org (Postfix) with ESMTP id F2F754114D for ; Thu, 24 Feb 2022 19:39:43 +0100 (CET) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 21OHMmYL020124 for ; Thu, 24 Feb 2022 10:39:43 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=1VYZfrRPiQZra12GAKccgB4Nl9wAizoFBiSLyWdQcFQ=; b=i9a4qJa5OMGquVoTJESJ+ZPEvbNLEW2k+v8Up5T4Yn+vlYJye+yNyQbjiciARh2eMPcw cz3TP6jhv6HjBrQSIBN+IR0DEGTB1P7ADVkV94Uavc5mzg7V3QC+ZbIQg3vYBacwCZ5j sviKWXgF2+AbpTdwxUUFruOuAX5MqoLIlz+ltoPY3fNeV0dNkfv+EWUTRj1+8shwk0ob uqdKV9mwP8DqIhVHl7ED28aTB8AZbmvoAqghee1ZwtkhufR6j4VAOoKmkp5ifGj6bAlH HjJy6ulwjMOnqsaUvgpVuktn4sznYnqQT7XZHVKclW0oC4HneuIP2luF8Z0BimRDAse0 Ew== Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0a-0016f401.pphosted.com (PPS) with ESMTPS id 3ee5tptyxa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Thu, 24 Feb 2022 10:39:43 -0800 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Thu, 24 Feb 2022 10:39:41 -0800 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Thu, 24 Feb 2022 10:39:41 -0800 Received: from jerin-lab.marvell.com (jerin-lab.marvell.com [10.28.34.14]) by maili.marvell.com (Postfix) with ESMTP id 8D08B5B6934; Thu, 24 Feb 2022 10:39:38 -0800 (PST) From: To: , Pavan Nikhilesh , "Shijith Thotton" , Nithin Dabilpuram , Kiran Kumar K , Sunil Kumar Kori , Satha Rao CC: Jerin Jacob Subject: [dpdk-dev] [PATCH v3 2/2] net/cnxk: align perfetchs to CN10K cache model Date: Fri, 25 Feb 2022 00:10:39 +0530 Message-ID: <20220224184039.786663-2-jerinj@marvell.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220224184039.786663-1-jerinj@marvell.com> References: <20220224161013.4566-1-pbhagavatula@marvell.com> <20220224184039.786663-1-jerinj@marvell.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Proofpoint-ORIG-GUID: L8QGMYNFuHkKgi9aHoNK4vbwUcoQleQd X-Proofpoint-GUID: L8QGMYNFuHkKgi9aHoNK4vbwUcoQleQd X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.64.514 definitions=2022-02-24_04,2022-02-24_01,2022-02-23_01 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Align perfetchs for CN10K cache model for vWQE in Rx and Tx. Move mbuf->next NULL assignment to Tx path and enabled it only when multi segments offload is enabled to reduce L1 pressure. Add macros to detect corrupted mbuf->next values when MEMPOOL_DEBUG is set. Signed-off-by: Pavan Nikhilesh Acked-by: Jerin Jacob --- drivers/event/cnxk/cn10k_worker.h | 13 ++-- drivers/net/cnxk/cn10k_rx.h | 111 ++++++++++++++++++++++++------ drivers/net/cnxk/cn10k_tx.h | 7 ++ 3 files changed, 104 insertions(+), 27 deletions(-) diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h index d288c66cac..a827a1e422 100644 --- a/drivers/event/cnxk/cn10k_worker.h +++ b/drivers/event/cnxk/cn10k_worker.h @@ -118,23 +118,23 @@ cn10k_process_vwqe(uintptr_t vwqe, uint16_t port_id, const uint32_t flags, uint64_t aura_handle, laddr; uint16_t nb_mbufs, non_vec; uint16_t lmt_id, d_off; + struct rte_mbuf **wqe; struct rte_mbuf *mbuf; uint8_t loff = 0; uint64_t sa_base; - uint64_t **wqe; int i; mbuf_init |= ((uint64_t)port_id) << 48; vec = (struct rte_event_vector *)vwqe; - wqe = vec->u64s; + wqe = vec->mbufs; - rte_prefetch_non_temporal(&vec->ptrs[0]); + rte_prefetch0(&vec->ptrs[0]); #define OBJS_PER_CLINE (RTE_CACHE_LINE_SIZE / sizeof(void *)) for (i = OBJS_PER_CLINE; i < vec->nb_elem; i += OBJS_PER_CLINE) - rte_prefetch_non_temporal(&vec->ptrs[i]); + rte_prefetch0(&vec->ptrs[i]); nb_mbufs = RTE_ALIGN_FLOOR(vec->nb_elem, NIX_DESCS_PER_LOOP); - nb_mbufs = cn10k_nix_recv_pkts_vector(&mbuf_init, vec->mbufs, nb_mbufs, + nb_mbufs = cn10k_nix_recv_pkts_vector(&mbuf_init, wqe, nb_mbufs, flags | NIX_RX_VWQE_F, lookup_mem, tstamp, lbase); wqe += nb_mbufs; @@ -182,7 +182,7 @@ cn10k_process_vwqe(uintptr_t vwqe, uint16_t port_id, const uint32_t flags, cn10k_nix_mbuf_to_tstamp((struct rte_mbuf *)mbuf, tstamp, flags & NIX_RX_OFFLOAD_TSTAMP_F, (uint64_t *)tstamp_ptr); - wqe[0] = (uint64_t *)mbuf; + wqe[0] = (struct rte_mbuf *)mbuf; non_vec--; wqe++; } @@ -612,6 +612,7 @@ cn10k_sso_hws_event_tx(struct cn10k_sso_hws *ws, struct rte_event *ev, ev->sched_type, txq_data, flags); } rte_mempool_put(rte_mempool_from_obj(ev->vec), ev->vec); + rte_prefetch0(ws); return (meta & 0xFFFF); } diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index 236a1dca6e..de5e41483b 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -36,6 +36,27 @@ (((f) & NIX_RX_VWQE_F) ? \ (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) + (o)) : \ (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) + (o))) +#define CQE_PTR_DIFF(b, i, o, f) \ + (((f) & NIX_RX_VWQE_F) ? \ + (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) - (o)) : \ + (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) - (o))) + +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG +static inline void +nix_mbuf_validate_next(struct rte_mbuf *m) +{ + if (m->nb_segs == 1 && m->next) { + rte_panic("mbuf->next[%p] valid when mbuf->nb_segs is %d", + m->next, m->nb_segs); + } +} +#else +static inline void +nix_mbuf_validate_next(struct rte_mbuf *m) +{ + RTE_SET_USED(m); +} +#endif union mbuf_initializer { struct { @@ -674,17 +695,66 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, cq0 = (uintptr_t)&mbufs[packets]; } - /* Prefetch N desc ahead */ - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 4, 64, flags)); - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 5, 64, flags)); - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 6, 64, flags)); - rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 7, 64, flags)); + if (flags & NIX_RX_VWQE_F) { + if (pkts - packets > 4) { + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, + 4, 0, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, + 5, 0, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, + 6, 0, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, + 7, 0, flags)); - /* Get NIX_RX_SG_S for size and buffer pointer */ - cq0_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 0, 64, flags)); - cq1_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 1, 64, flags)); - cq2_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 2, 64, flags)); - cq3_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 3, 64, flags)); + if (likely(pkts - packets > 8)) { + rte_prefetch1(CQE_PTR_OFF(cq0, + 8, 0, flags)); + rte_prefetch1(CQE_PTR_OFF(cq0, + 9, 0, flags)); + rte_prefetch1(CQE_PTR_OFF(cq0, + 10, 0, flags)); + rte_prefetch1(CQE_PTR_OFF(cq0, + 11, 0, flags)); + if (pkts - packets > 12) { + rte_prefetch1(CQE_PTR_OFF(cq0, + 12, 0, flags)); + rte_prefetch1(CQE_PTR_OFF(cq0, + 13, 0, flags)); + rte_prefetch1(CQE_PTR_OFF(cq0, + 14, 0, flags)); + rte_prefetch1(CQE_PTR_OFF(cq0, + 15, 0, flags)); + } + } + + rte_prefetch0(CQE_PTR_DIFF(cq0, + 4, RTE_PKTMBUF_HEADROOM, flags)); + rte_prefetch0(CQE_PTR_DIFF(cq0, + 5, RTE_PKTMBUF_HEADROOM, flags)); + rte_prefetch0(CQE_PTR_DIFF(cq0, + 6, RTE_PKTMBUF_HEADROOM, flags)); + rte_prefetch0(CQE_PTR_DIFF(cq0, + 7, RTE_PKTMBUF_HEADROOM, flags)); + + if (likely(pkts - packets > 8)) { + rte_prefetch0(CQE_PTR_DIFF(cq0, + 8, RTE_PKTMBUF_HEADROOM, flags)); + rte_prefetch0(CQE_PTR_DIFF(cq0, + 9, RTE_PKTMBUF_HEADROOM, flags)); + rte_prefetch0(CQE_PTR_DIFF(cq0, + 10, RTE_PKTMBUF_HEADROOM, flags)); + rte_prefetch0(CQE_PTR_DIFF(cq0, + 11, RTE_PKTMBUF_HEADROOM, flags)); + } + } + } else { + if (pkts - packets > 4) { + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 4, 64, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 5, 64, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 6, 64, flags)); + rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 7, 64, flags)); + } + } if (!(flags & NIX_RX_VWQE_F)) { /* Get NIX_RX_SG_S for size and buffer pointer */ @@ -995,19 +1065,18 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, nix_cqe_xtract_mseg((union nix_rx_parse_u *) (CQE_PTR_OFF(cq0, 3, 8, flags)), mbuf3, mbuf_initializer, flags); - } else { - /* Update that no more segments */ - mbuf0->next = NULL; - mbuf1->next = NULL; - mbuf2->next = NULL; - mbuf3->next = NULL; } - /* Prefetch mbufs */ - roc_prefetch_store_keep(mbuf0); - roc_prefetch_store_keep(mbuf1); - roc_prefetch_store_keep(mbuf2); - roc_prefetch_store_keep(mbuf3); + /* Mark mempool obj as "get" as it is alloc'ed by NIX */ + RTE_MEMPOOL_CHECK_COOKIES(mbuf0->pool, (void **)&mbuf0, 1, 1); + RTE_MEMPOOL_CHECK_COOKIES(mbuf1->pool, (void **)&mbuf1, 1, 1); + RTE_MEMPOOL_CHECK_COOKIES(mbuf2->pool, (void **)&mbuf2, 1, 1); + RTE_MEMPOOL_CHECK_COOKIES(mbuf3->pool, (void **)&mbuf3, 1, 1); + + nix_mbuf_validate_next(mbuf0); + nix_mbuf_validate_next(mbuf1); + nix_mbuf_validate_next(mbuf2); + nix_mbuf_validate_next(mbuf3); packets += NIX_DESCS_PER_LOOP; diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index ec6366168c..695e3ed354 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -2569,6 +2569,13 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, lnum += 1; } + if (flags & NIX_TX_MULTI_SEG_F) { + tx_pkts[0]->next = NULL; + tx_pkts[1]->next = NULL; + tx_pkts[2]->next = NULL; + tx_pkts[3]->next = NULL; + } + tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP; } -- 2.35.1