From: Xueming Li
To: Rahul Bhansali
CC: dpdk stable
Subject: patch 'net/cnxk: improve Tx performance for SW mbuf free' has been queued to stable release 23.11.1
Date: Sat, 13 Apr 2024 20:48:43 +0800
Message-ID: <20240413125005.725659-43-xuemingl@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240413125005.725659-1-xuemingl@nvidia.com>
References: <20240305094757.439387-1-xuemingl@nvidia.com> <20240413125005.725659-1-xuemingl@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: patches for DPDK stable branches

Hi,

FYI, your
patch has been queued to stable release 23.11.1 Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet. It will be pushed if I get no objections before 04/15/24. So please shout if anyone has objections. Also note that after the patch there's a diff of the upstream commit vs the patch applied to the branch. This will indicate if there was any rebasing needed to apply to the stable branch. If there were code changes for rebasing (ie: not only metadata diffs), please double check that the rebase was correctly done. Queued patches are on a temporary branch at: https://git.dpdk.org/dpdk-stable/log/?h=23.11-staging This queued commit can be viewed at: https://git.dpdk.org/dpdk-stable/commit/?h=23.11-staging&id=630dbc8a928ba12e93c534df9d7dfdd6ad4af371 Thanks. Xueming Li --- >From 630dbc8a928ba12e93c534df9d7dfdd6ad4af371 Mon Sep 17 00:00:00 2001 From: Rahul Bhansali Date: Fri, 1 Mar 2024 08:46:45 +0530 Subject: [PATCH] net/cnxk: improve Tx performance for SW mbuf free Cc: Xueming Li [ upstream commit f3d7cf8a4c7eedbf2bdfc19370d49bd2557717e6 ] Performance improvement is done for Tx fastpath flag MBUF_NOFF when tx_compl_ena is false and mbuf has an external buffer. In such case, Instead of individual external mbuf free before LMTST, a chain of external mbuf will be created and free all after LMTST. This not only improve the performance but also fixes SQ corruption. CN10k performance improvement is ~14%. CN9k performance improvement is ~20%. Fixes: 51a636528515 ("net/cnxk: fix crash during Tx completion") Signed-off-by: Rahul Bhansali --- drivers/event/cnxk/cn10k_tx_worker.h | 8 ++- drivers/event/cnxk/cn9k_worker.h | 9 ++- drivers/net/cnxk/cn10k_tx.h | 97 +++++++++++++++++++--------- drivers/net/cnxk/cn9k_tx.h | 88 ++++++++++++++++--------- 4 files changed, 135 insertions(+), 67 deletions(-) diff --git a/drivers/event/cnxk/cn10k_tx_worker.h b/drivers/event/cnxk/cn10k_tx_worker.h index 53e0dde20c..256237b895 100644 --- a/drivers/event/cnxk/cn10k_tx_worker.h +++ b/drivers/event/cnxk/cn10k_tx_worker.h @@ -70,6 +70,7 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf *m, uint64_t *cmd, const uint64_t *txq_data, const uint32_t flags) { uint8_t lnum = 0, loff = 0, shft = 0; + struct rte_mbuf *extm = NULL; struct cn10k_eth_txq *txq; uintptr_t laddr; uint16_t segdw; @@ -90,7 +91,7 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf *m, uint64_t *cmd, if (flags & NIX_TX_OFFLOAD_TSO_F) cn10k_nix_xmit_prepare_tso(m, flags); - cn10k_nix_xmit_prepare(txq, m, cmd, flags, txq->lso_tun_fmt, &sec, + cn10k_nix_xmit_prepare(txq, m, &extm, cmd, flags, txq->lso_tun_fmt, &sec, txq->mark_flag, txq->mark_fmt); laddr = lmt_addr; @@ -105,7 +106,7 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf *m, uint64_t *cmd, cn10k_nix_xmit_mv_lmt_base(laddr, cmd, flags); if (flags & NIX_TX_MULTI_SEG_F) - segdw = cn10k_nix_prepare_mseg(txq, m, (uint64_t *)laddr, flags); + segdw = cn10k_nix_prepare_mseg(txq, m, &extm, (uint64_t *)laddr, flags); else segdw = cn10k_nix_tx_ext_subs(flags) + 2; @@ -127,6 +128,9 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf *m, uint64_t *cmd, /* Memory barrier to make sure lmtst store completes */ rte_io_wmb(); + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) + cn10k_nix_free_extmbuf(extm); + return 1; } diff --git a/drivers/event/cnxk/cn9k_worker.h b/drivers/event/cnxk/cn9k_worker.h index 0451157812..107265d54b 100644 --- a/drivers/event/cnxk/cn9k_worker.h +++ b/drivers/event/cnxk/cn9k_worker.h @@ -746,7 +746,7 @@ static 
__rte_always_inline uint16_t cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, uint64_t *txq_data, const uint32_t flags) { - struct rte_mbuf *m = ev->mbuf; + struct rte_mbuf *m = ev->mbuf, *extm = NULL; struct cn9k_eth_txq *txq; /* Perform header writes before barrier for TSO */ @@ -767,7 +767,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, if (cn9k_sso_sq_depth(txq) <= 0) return 0; cn9k_nix_tx_skeleton(txq, cmd, flags, 0); - cn9k_nix_xmit_prepare(txq, m, cmd, flags, txq->lso_tun_fmt, txq->mark_flag, + cn9k_nix_xmit_prepare(txq, m, &extm, cmd, flags, txq->lso_tun_fmt, txq->mark_flag, txq->mark_fmt); if (flags & NIX_TX_OFFLOAD_SECURITY_F) { @@ -789,7 +789,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, } if (flags & NIX_TX_MULTI_SEG_F) { - const uint16_t segdw = cn9k_nix_prepare_mseg(txq, m, cmd, flags); + const uint16_t segdw = cn9k_nix_prepare_mseg(txq, m, &extm, cmd, flags); cn9k_nix_xmit_prepare_tstamp(txq, cmd, m->ol_flags, segdw, flags); if (!CNXK_TT_FROM_EVENT(ev->event)) { @@ -819,6 +819,9 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, } done: + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) + cn9k_nix_free_extmbuf(extm); + return 1; } diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index cc480d24e8..5dff578ba4 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -784,8 +784,19 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, } #endif +static inline void +cn10k_nix_free_extmbuf(struct rte_mbuf *m) +{ + struct rte_mbuf *m_next; + while (m != NULL) { + m_next = m->next; + rte_pktmbuf_free_seg(m); + m = m_next; + } +} + static __rte_always_inline uint64_t -cn10k_nix_prefree_seg(struct rte_mbuf *m, struct cn10k_eth_txq *txq, +cn10k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn10k_eth_txq *txq, struct nix_send_hdr_s *send_hdr, uint64_t *aura) { struct rte_mbuf *prev = NULL; @@ -793,7 +804,8 @@ cn10k_nix_prefree_seg(struct rte_mbuf *m, struct cn10k_eth_txq *txq, if (RTE_MBUF_HAS_EXTBUF(m)) { if (unlikely(txq->tx_compl.ena == 0)) { - rte_pktmbuf_free_seg(m); + m->next = *extm; + *extm = m; return 1; } if (send_hdr->w0.pnc) { @@ -817,7 +829,8 @@ cn10k_nix_prefree_seg(struct rte_mbuf *m, struct cn10k_eth_txq *txq, #if defined(RTE_ARCH_ARM64) /* Only called for first segments of single segmented mbufs */ static __rte_always_inline void -cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, +cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct rte_mbuf **extm, + struct cn10k_eth_txq *txq, uint64x2_t *senddesc01_w0, uint64x2_t *senddesc23_w0, uint64x2_t *senddesc01_w1, uint64x2_t *senddesc23_w1) { @@ -841,7 +854,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc01_w1, 0); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m0); + m0->next = *extm; + *extm = m0; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -871,7 +885,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc01_w1, 1); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m1); + m1->next = *extm; + *extm = m1; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -901,7 +916,8 @@ 
cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc23_w1, 0); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m2); + m2->next = *extm; + *extm = m2; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -931,7 +947,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc23_w1, 1); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m3); + m3->next = *extm; + *extm = m3; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -1013,9 +1030,9 @@ cn10k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags) static __rte_always_inline void cn10k_nix_xmit_prepare(struct cn10k_eth_txq *txq, - struct rte_mbuf *m, uint64_t *cmd, const uint16_t flags, - const uint64_t lso_tun_fmt, bool *sec, uint8_t mark_flag, - uint64_t mark_fmt) + struct rte_mbuf *m, struct rte_mbuf **extm, uint64_t *cmd, + const uint16_t flags, const uint64_t lso_tun_fmt, bool *sec, + uint8_t mark_flag, uint64_t mark_fmt) { uint8_t mark_off = 0, mark_vlan = 0, markptr = 0; struct nix_send_ext_s *send_hdr_ext; @@ -1215,7 +1232,7 @@ cn10k_nix_xmit_prepare(struct cn10k_eth_txq *txq, * DF bit = 0 otherwise */ aura = send_hdr->w0.aura; - send_hdr->w0.df = cn10k_nix_prefree_seg(m, txq, send_hdr, &aura); + send_hdr->w0.df = cn10k_nix_prefree_seg(m, extm, txq, send_hdr, &aura); send_hdr->w0.aura = aura; } #ifdef RTE_LIBRTE_MEMPOOL_DEBUG @@ -1291,8 +1308,8 @@ cn10k_nix_xmit_prepare_tstamp(struct cn10k_eth_txq *txq, uintptr_t lmt_addr, } static __rte_always_inline uint16_t -cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, - struct rte_mbuf *m, uint64_t *cmd, const uint16_t flags) +cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm, + uint64_t *cmd, const uint16_t flags) { uint64_t prefree = 0, aura0, aura, nb_segs, segdw; struct nix_send_hdr_s *send_hdr; @@ -1335,7 +1352,7 @@ cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, /* Set invert df if buffer is not to be freed by H/W */ if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { aura = send_hdr->w0.aura; - prefree = cn10k_nix_prefree_seg(m, txq, send_hdr, &aura); + prefree = cn10k_nix_prefree_seg(m, extm, txq, send_hdr, &aura); send_hdr->w0.aura = aura; l_sg.i1 = prefree; } @@ -1382,7 +1399,7 @@ cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, cookie = RTE_MBUF_DIRECT(m) ? 
m : rte_mbuf_from_indirect(m); if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { aura = roc_npa_aura_handle_to_aura(m->pool->pool_id); - prefree = cn10k_nix_prefree_seg(m, txq, send_hdr, &aura); + prefree = cn10k_nix_prefree_seg(m, extm, txq, send_hdr, &aura); is_sg2 = aura != aura0 && !prefree; } @@ -1476,6 +1493,7 @@ cn10k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint8_t lnum, c_lnum, c_shft, c_loff; uintptr_t pa, lbase = txq->lmt_base; uint16_t lmt_id, burst, left, i; + struct rte_mbuf *extm = NULL; uintptr_t c_lbase = lbase; uint64_t lso_tun_fmt = 0; uint64_t mark_fmt = 0; @@ -1530,7 +1548,7 @@ again: if (flags & NIX_TX_OFFLOAD_TSO_F) cn10k_nix_xmit_prepare_tso(tx_pkts[i], flags); - cn10k_nix_xmit_prepare(txq, tx_pkts[i], cmd, flags, lso_tun_fmt, + cn10k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec, mark_flag, mark_fmt); laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0); @@ -1605,6 +1623,11 @@ again: } rte_io_wmb(); + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) { + cn10k_nix_free_extmbuf(extm); + extm = NULL; + } + if (left) goto again; @@ -1620,6 +1643,7 @@ cn10k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, uintptr_t pa0, pa1, lbase = txq->lmt_base; const rte_iova_t io_addr = txq->io_addr; uint16_t segdw, lmt_id, burst, left, i; + struct rte_mbuf *extm = NULL; uint8_t lnum, c_lnum, c_loff; uintptr_t c_lbase = lbase; uint64_t lso_tun_fmt = 0; @@ -1681,7 +1705,7 @@ again: if (flags & NIX_TX_OFFLOAD_TSO_F) cn10k_nix_xmit_prepare_tso(tx_pkts[i], flags); - cn10k_nix_xmit_prepare(txq, tx_pkts[i], cmd, flags, lso_tun_fmt, + cn10k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec, mark_flag, mark_fmt); laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0); @@ -1695,7 +1719,7 @@ again: /* Move NIX desc to LMT/NIXTX area */ cn10k_nix_xmit_mv_lmt_base(laddr, cmd, flags); /* Store sg list directly on lmt line */ - segdw = cn10k_nix_prepare_mseg(txq, tx_pkts[i], (uint64_t *)laddr, + segdw = cn10k_nix_prepare_mseg(txq, tx_pkts[i], &extm, (uint64_t *)laddr, flags); cn10k_nix_xmit_prepare_tstamp(txq, laddr, tx_pkts[i]->ol_flags, segdw, flags); @@ -1768,6 +1792,11 @@ again: } rte_io_wmb(); + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) { + cn10k_nix_free_extmbuf(extm); + extm = NULL; + } + if (left) goto again; @@ -1818,7 +1847,7 @@ cn10k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, static __rte_always_inline uint16_t cn10k_nix_prepare_mseg_vec_noff(struct cn10k_eth_txq *txq, - struct rte_mbuf *m, uint64_t *cmd, + struct rte_mbuf *m, struct rte_mbuf **extm, uint64_t *cmd, uint64x2_t *cmd0, uint64x2_t *cmd1, uint64x2_t *cmd2, uint64x2_t *cmd3, const uint32_t flags) @@ -1833,7 +1862,7 @@ cn10k_nix_prepare_mseg_vec_noff(struct cn10k_eth_txq *txq, vst1q_u64(cmd + 2, *cmd1); /* sg */ } - segdw = cn10k_nix_prepare_mseg(txq, m, cmd, flags); + segdw = cn10k_nix_prepare_mseg(txq, m, extm, cmd, flags); if (flags & NIX_TX_OFFLOAD_TSTAMP_F) vst1q_u64(cmd + segdw * 2 - 2, *cmd3); @@ -1943,7 +1972,7 @@ cn10k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, static __rte_always_inline uint8_t cn10k_nix_prep_lmt_mseg_vector(struct cn10k_eth_txq *txq, - struct rte_mbuf **mbufs, uint64x2_t *cmd0, + struct rte_mbuf **mbufs, struct rte_mbuf **extm, uint64x2_t *cmd0, uint64x2_t *cmd1, uint64x2_t *cmd2, uint64x2_t *cmd3, uint8_t *segdw, uint64_t *lmt_addr, __uint128_t *data128, @@ -1961,7 +1990,7 @@ cn10k_nix_prep_lmt_mseg_vector(struct cn10k_eth_txq *txq, lmt_addr += 16; off = 0; } - off += 
cn10k_nix_prepare_mseg_vec_noff(txq, mbufs[j], + off += cn10k_nix_prepare_mseg_vec_noff(txq, mbufs[j], extm, lmt_addr + off * 2, &cmd0[j], &cmd1[j], &cmd2[j], &cmd3[j], flags); } @@ -2114,14 +2143,14 @@ cn10k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, static __rte_always_inline void cn10k_nix_xmit_store(struct cn10k_eth_txq *txq, - struct rte_mbuf *mbuf, uint8_t segdw, uintptr_t laddr, + struct rte_mbuf *mbuf, struct rte_mbuf **extm, uint8_t segdw, uintptr_t laddr, uint64x2_t cmd0, uint64x2_t cmd1, uint64x2_t cmd2, uint64x2_t cmd3, const uint16_t flags) { uint8_t off; if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { - cn10k_nix_prepare_mseg_vec_noff(txq, mbuf, LMT_OFF(laddr, 0, 0), + cn10k_nix_prepare_mseg_vec_noff(txq, mbuf, extm, LMT_OFF(laddr, 0, 0), &cmd0, &cmd1, &cmd2, &cmd3, flags); return; @@ -2205,6 +2234,7 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, __uint128_t data128; uint64_t data[2]; } wd; + struct rte_mbuf *extm = NULL; if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena) handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F); @@ -3050,8 +3080,8 @@ again: !(flags & NIX_TX_MULTI_SEG_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) { /* Set don't free bit if reference count > 1 */ - cn10k_nix_prefree_seg_vec(tx_pkts, txq, &senddesc01_w0, &senddesc23_w0, - &senddesc01_w1, &senddesc23_w1); + cn10k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0, + &senddesc23_w0, &senddesc01_w1, &senddesc23_w1); } else if (!(flags & NIX_TX_MULTI_SEG_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) { /* Move mbufs to iova */ @@ -3123,7 +3153,7 @@ again: &shift, &wd.data128, &next); /* Store mbuf0 to LMTLINE/CPT NIXTX area */ - cn10k_nix_xmit_store(txq, tx_pkts[0], segdw[0], next, + cn10k_nix_xmit_store(txq, tx_pkts[0], &extm, segdw[0], next, cmd0[0], cmd1[0], cmd2[0], cmd3[0], flags); @@ -3139,7 +3169,7 @@ again: &shift, &wd.data128, &next); /* Store mbuf1 to LMTLINE/CPT NIXTX area */ - cn10k_nix_xmit_store(txq, tx_pkts[1], segdw[1], next, + cn10k_nix_xmit_store(txq, tx_pkts[1], &extm, segdw[1], next, cmd0[1], cmd1[1], cmd2[1], cmd3[1], flags); @@ -3155,7 +3185,7 @@ again: &shift, &wd.data128, &next); /* Store mbuf2 to LMTLINE/CPT NIXTX area */ - cn10k_nix_xmit_store(txq, tx_pkts[2], segdw[2], next, + cn10k_nix_xmit_store(txq, tx_pkts[2], &extm, segdw[2], next, cmd0[2], cmd1[2], cmd2[2], cmd3[2], flags); @@ -3171,7 +3201,7 @@ again: &shift, &wd.data128, &next); /* Store mbuf3 to LMTLINE/CPT NIXTX area */ - cn10k_nix_xmit_store(txq, tx_pkts[3], segdw[3], next, + cn10k_nix_xmit_store(txq, tx_pkts[3], &extm, segdw[3], next, cmd0[3], cmd1[3], cmd2[3], cmd3[3], flags); @@ -3179,7 +3209,7 @@ again: uint8_t j; segdw[4] = 8; - j = cn10k_nix_prep_lmt_mseg_vector(txq, tx_pkts, cmd0, cmd1, + j = cn10k_nix_prep_lmt_mseg_vector(txq, tx_pkts, &extm, cmd0, cmd1, cmd2, cmd3, segdw, (uint64_t *) LMT_OFF(laddr, lnum, @@ -3329,6 +3359,11 @@ again: } rte_io_wmb(); + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) { + cn10k_nix_free_extmbuf(extm); + extm = NULL; + } + if (left) goto again; diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h index 94acbe64fa..018fae2eb7 100644 --- a/drivers/net/cnxk/cn9k_tx.h +++ b/drivers/net/cnxk/cn9k_tx.h @@ -82,16 +82,28 @@ cn9k_nix_tx_skeleton(struct cn9k_eth_txq *txq, uint64_t *cmd, } } +static __rte_always_inline void +cn9k_nix_free_extmbuf(struct rte_mbuf *m) +{ + struct rte_mbuf *m_next; + while (m != NULL) { + m_next = m->next; + rte_pktmbuf_free_seg(m); + m = m_next; + } +} + static __rte_always_inline 
uint64_t -cn9k_nix_prefree_seg(struct rte_mbuf *m, struct cn9k_eth_txq *txq, struct nix_send_hdr_s *send_hdr, - uint64_t *aura) +cn9k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn9k_eth_txq *txq, + struct nix_send_hdr_s *send_hdr, uint64_t *aura) { struct rte_mbuf *prev; uint32_t sqe_id; if (RTE_MBUF_HAS_EXTBUF(m)) { if (unlikely(txq->tx_compl.ena == 0)) { - rte_pktmbuf_free_seg(m); + m->next = *extm; + *extm = m; return 1; } if (send_hdr->w0.pnc) { @@ -115,7 +127,7 @@ cn9k_nix_prefree_seg(struct rte_mbuf *m, struct cn9k_eth_txq *txq, struct nix_se #if defined(RTE_ARCH_ARM64) /* Only called for first segments of single segmented mbufs */ static __rte_always_inline void -cn9k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn9k_eth_txq *txq, +cn9k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct rte_mbuf **extm, struct cn9k_eth_txq *txq, uint64x2_t *senddesc01_w0, uint64x2_t *senddesc23_w0, uint64x2_t *senddesc01_w1, uint64x2_t *senddesc23_w1) { @@ -139,7 +151,8 @@ cn9k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn9k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc01_w1, 0); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m0); + m0->next = *extm; + *extm = m0; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -169,7 +182,8 @@ cn9k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn9k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc01_w1, 1); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m1); + m1->next = *extm; + *extm = m1; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -199,7 +213,8 @@ cn9k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn9k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc23_w1, 0); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m2); + m2->next = *extm; + *extm = m2; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -229,7 +244,8 @@ cn9k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn9k_eth_txq *txq, w1 = vgetq_lane_u64(*senddesc23_w1, 1); w1 &= ~0xFFFF000000000000UL; if (unlikely(!tx_compl_ena)) { - rte_pktmbuf_free_seg(m3); + m3->next = *extm; + *extm = m3; } else { sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1, rte_memory_order_relaxed); @@ -310,10 +326,9 @@ cn9k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags) } static __rte_always_inline void -cn9k_nix_xmit_prepare(struct cn9k_eth_txq *txq, - struct rte_mbuf *m, uint64_t *cmd, const uint16_t flags, - const uint64_t lso_tun_fmt, uint8_t mark_flag, - uint64_t mark_fmt) +cn9k_nix_xmit_prepare(struct cn9k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm, + uint64_t *cmd, const uint16_t flags, const uint64_t lso_tun_fmt, + uint8_t mark_flag, uint64_t mark_fmt) { uint8_t mark_off = 0, mark_vlan = 0, markptr = 0; struct nix_send_ext_s *send_hdr_ext; @@ -509,7 +524,7 @@ cn9k_nix_xmit_prepare(struct cn9k_eth_txq *txq, * DF bit = 0 otherwise */ aura = send_hdr->w0.aura; - send_hdr->w0.df = cn9k_nix_prefree_seg(m, txq, send_hdr, &aura); + send_hdr->w0.df = cn9k_nix_prefree_seg(m, extm, txq, send_hdr, &aura); send_hdr->w0.aura = aura; /* Ensuring mbuf fields which got updated in * cnxk_nix_prefree_seg are written before LMTST. 
@@ -600,8 +615,8 @@ cn9k_nix_xmit_submit_lmt_release(const rte_iova_t io_addr) } static __rte_always_inline uint16_t -cn9k_nix_prepare_mseg(struct cn9k_eth_txq *txq, - struct rte_mbuf *m, uint64_t *cmd, const uint16_t flags) +cn9k_nix_prepare_mseg(struct cn9k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm, + uint64_t *cmd, const uint16_t flags) { struct nix_send_hdr_s *send_hdr; uint64_t prefree = 0, aura; @@ -634,7 +649,7 @@ cn9k_nix_prepare_mseg(struct cn9k_eth_txq *txq, /* Set invert df if buffer is not to be freed by H/W */ if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { aura = send_hdr->w0.aura; - prefree = (cn9k_nix_prefree_seg(m, txq, send_hdr, &aura) << 55); + prefree = (cn9k_nix_prefree_seg(m, extm, txq, send_hdr, &aura) << 55); send_hdr->w0.aura = aura; sg_u |= prefree; rte_io_wmb(); @@ -664,7 +679,7 @@ cn9k_nix_prepare_mseg(struct cn9k_eth_txq *txq, cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m); /* Set invert df if buffer is not to be freed by H/W */ if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { - sg_u |= (cn9k_nix_prefree_seg(m, txq, send_hdr, NULL) << (i + 55)); + sg_u |= (cn9k_nix_prefree_seg(m, extm, txq, send_hdr, NULL) << (i + 55)); /* Commit changes to mbuf */ rte_io_wmb(); } @@ -748,6 +763,7 @@ cn9k_nix_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts, const rte_iova_t io_addr = txq->io_addr; uint64_t lso_tun_fmt = 0, mark_fmt = 0; void *lmt_addr = txq->lmt_addr; + struct rte_mbuf *extm = NULL; uint8_t mark_flag = 0; uint16_t i; @@ -778,13 +794,16 @@ cn9k_nix_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts, rte_io_wmb(); for (i = 0; i < pkts; i++) { - cn9k_nix_xmit_prepare(txq, tx_pkts[i], cmd, flags, lso_tun_fmt, + cn9k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, mark_flag, mark_fmt); cn9k_nix_xmit_prepare_tstamp(txq, cmd, tx_pkts[i]->ol_flags, 4, flags); cn9k_nix_xmit_one(cmd, lmt_addr, io_addr, flags); } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) + cn9k_nix_free_extmbuf(extm); + /* Reduce the cached count */ txq->fc_cache_pkts -= pkts; @@ -799,6 +818,7 @@ cn9k_nix_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **tx_pkts, const rte_iova_t io_addr = txq->io_addr; uint64_t lso_tun_fmt = 0, mark_fmt = 0; void *lmt_addr = txq->lmt_addr; + struct rte_mbuf *extm = NULL; uint8_t mark_flag = 0; uint16_t segdw; uint64_t i; @@ -830,14 +850,17 @@ cn9k_nix_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **tx_pkts, rte_io_wmb(); for (i = 0; i < pkts; i++) { - cn9k_nix_xmit_prepare(txq, tx_pkts[i], cmd, flags, lso_tun_fmt, + cn9k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, mark_flag, mark_fmt); - segdw = cn9k_nix_prepare_mseg(txq, tx_pkts[i], cmd, flags); + segdw = cn9k_nix_prepare_mseg(txq, tx_pkts[i], &extm, cmd, flags); cn9k_nix_xmit_prepare_tstamp(txq, cmd, tx_pkts[i]->ol_flags, segdw, flags); cn9k_nix_xmit_mseg_one(cmd, lmt_addr, io_addr, segdw); } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) + cn9k_nix_free_extmbuf(extm); + /* Reduce the cached count */ txq->fc_cache_pkts -= pkts; @@ -885,7 +908,7 @@ cn9k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, static __rte_always_inline uint8_t cn9k_nix_prepare_mseg_vec_list(struct cn9k_eth_txq *txq, - struct rte_mbuf *m, uint64_t *cmd, + struct rte_mbuf *m, struct rte_mbuf **extm, uint64_t *cmd, struct nix_send_hdr_s *send_hdr, union nix_send_sg_s *sg, const uint32_t flags) { @@ -910,7 +933,7 @@ cn9k_nix_prepare_mseg_vec_list(struct cn9k_eth_txq *txq, cookie = RTE_MBUF_DIRECT(m) ? 
m : rte_mbuf_from_indirect(m); if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { aura = send_hdr->w0.aura; - sg_u |= (cn9k_nix_prefree_seg(m, txq, send_hdr, &aura) << 55); + sg_u |= (cn9k_nix_prefree_seg(m, extm, txq, send_hdr, &aura) << 55); send_hdr->w0.aura = aura; } /* Mark mempool object as "put" since it is freed by NIX */ @@ -935,7 +958,7 @@ cn9k_nix_prepare_mseg_vec_list(struct cn9k_eth_txq *txq, cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m); /* Set invert df if buffer is not to be freed by H/W */ if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) - sg_u |= (cn9k_nix_prefree_seg(m, txq, send_hdr, &aura) << (i + 55)); + sg_u |= (cn9k_nix_prefree_seg(m, extm, txq, send_hdr, &aura) << (i + 55)); /* Mark mempool object as "put" since it is freed by NIX */ #ifdef RTE_LIBRTE_MEMPOOL_DEBUG @@ -981,9 +1004,8 @@ cn9k_nix_prepare_mseg_vec_list(struct cn9k_eth_txq *txq, } static __rte_always_inline uint8_t -cn9k_nix_prepare_mseg_vec(struct cn9k_eth_txq *txq, - struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, - uint64x2_t *cmd1, const uint32_t flags) +cn9k_nix_prepare_mseg_vec(struct cn9k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm, + uint64_t *cmd, uint64x2_t *cmd0, uint64x2_t *cmd1, const uint32_t flags) { struct nix_send_hdr_s send_hdr; struct rte_mbuf *cookie; @@ -998,7 +1020,7 @@ cn9k_nix_prepare_mseg_vec(struct cn9k_eth_txq *txq, send_hdr.w1.u = vgetq_lane_u64(cmd0[0], 1); sg.u = vgetq_lane_u64(cmd1[0], 0); aura = send_hdr.w0.aura; - sg.u |= (cn9k_nix_prefree_seg(m, txq, &send_hdr, &aura) << 55); + sg.u |= (cn9k_nix_prefree_seg(m, extm, txq, &send_hdr, &aura) << 55); send_hdr.w0.aura = aura; cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0); cmd0[0] = vsetq_lane_u64(send_hdr.w0.u, cmd0[0], 0); @@ -1021,7 +1043,7 @@ cn9k_nix_prepare_mseg_vec(struct cn9k_eth_txq *txq, send_hdr.w1.u = vgetq_lane_u64(cmd0[0], 1); sg.u = vgetq_lane_u64(cmd1[0], 0); - ret = cn9k_nix_prepare_mseg_vec_list(txq, m, cmd, &send_hdr, &sg, flags); + ret = cn9k_nix_prepare_mseg_vec_list(txq, m, extm, cmd, &send_hdr, &sg, flags); cmd0[0] = vsetq_lane_u64(send_hdr.w0.u, cmd0[0], 0); cmd0[0] = vsetq_lane_u64(send_hdr.w1.u, cmd0[0], 1); @@ -1168,6 +1190,7 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint64_t *lmt_addr = txq->lmt_addr; rte_iova_t io_addr = txq->io_addr; uint64x2_t ltypes01, ltypes23; + struct rte_mbuf *extm = NULL; uint64x2_t xtmp128, ytmp128; uint64x2_t xmask01, xmask23; uint64_t lmt_status, i; @@ -1933,8 +1956,8 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_MULTI_SEG_F)) { /* Set don't free bit if reference count > 1 */ - cn9k_nix_prefree_seg_vec(tx_pkts, txq, &senddesc01_w0, &senddesc23_w0, - &senddesc01_w1, &senddesc23_w1); + cn9k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0, + &senddesc23_w0, &senddesc01_w1, &senddesc23_w1); /* Ensuring mbuf fields which got updated in * cnxk_nix_prefree_seg are written before LMTST. */ @@ -1995,7 +2018,7 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, /* Build mseg list for each packet individually. 
*/ for (j = 0; j < NIX_DESCS_PER_LOOP; j++) segdw[j] = cn9k_nix_prepare_mseg_vec(txq, - tx_pkts[j], + tx_pkts[j], &extm, seg_list[j], &cmd0[j], &cmd1[j], flags); segdw[4] = 8; @@ -2070,6 +2093,9 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP; } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) + cn9k_nix_free_extmbuf(extm); + if (unlikely(pkts_left)) { if (flags & NIX_TX_MULTI_SEG_F) pkts += cn9k_nix_xmit_pkts_mseg(tx_queue, tx_pkts, -- 2.34.1 --- Diff of the applied patch vs upstream commit (please double-check if non-empty: --- --- - 2024-04-13 20:43:06.384004321 +0800 +++ 0043-net-cnxk-improve-Tx-performance-for-SW-mbuf-free.patch 2024-04-13 20:43:04.957753984 +0800 @@ -1 +1 @@ -From f3d7cf8a4c7eedbf2bdfc19370d49bd2557717e6 Mon Sep 17 00:00:00 2001 +From 630dbc8a928ba12e93c534df9d7dfdd6ad4af371 Mon Sep 17 00:00:00 2001 @@ -4,0 +5,3 @@ +Cc: Xueming Li + +[ upstream commit f3d7cf8a4c7eedbf2bdfc19370d49bd2557717e6 ] @@ -16 +18,0 @@ -Cc: stable@dpdk.org @@ -20,6 +22,5 @@ - doc/guides/rel_notes/release_24_03.rst | 1 + - drivers/event/cnxk/cn10k_tx_worker.h | 8 ++- - drivers/event/cnxk/cn9k_worker.h | 9 ++- - drivers/net/cnxk/cn10k_tx.h | 97 ++++++++++++++++++-------- - drivers/net/cnxk/cn9k_tx.h | 88 +++++++++++++++-------- - 5 files changed, 136 insertions(+), 67 deletions(-) + drivers/event/cnxk/cn10k_tx_worker.h | 8 ++- + drivers/event/cnxk/cn9k_worker.h | 9 ++- + drivers/net/cnxk/cn10k_tx.h | 97 +++++++++++++++++++--------- + drivers/net/cnxk/cn9k_tx.h | 88 ++++++++++++++++--------- + 4 files changed, 135 insertions(+), 67 deletions(-) @@ -27,12 +27,0 @@ -diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst -index 6f6cc06dbb..639de93b79 100644 ---- a/doc/guides/rel_notes/release_24_03.rst -+++ b/doc/guides/rel_notes/release_24_03.rst -@@ -115,6 +115,7 @@ New Features - * Added support for ``RTE_FLOW_ITEM_TYPE_PPPOES`` flow item. - * Added support for ``RTE_FLOW_ACTION_TYPE_SAMPLE`` flow item. - * Added support for Rx inject. -+ * Optimized SW external mbuf free for better performance and avoid SQ corruption. 
- - * **Updated Marvell OCTEON EP driver.** - @@ -80 +69 @@ -index e8863e42fc..a8e998951c 100644 +index 0451157812..107265d54b 100644 @@ -83 +72 @@ -@@ -749,7 +749,7 @@ static __rte_always_inline uint16_t +@@ -746,7 +746,7 @@ static __rte_always_inline uint16_t @@ -92 +81 @@ -@@ -770,7 +770,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, +@@ -767,7 +767,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, @@ -101 +90 @@ -@@ -792,7 +792,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, +@@ -789,7 +789,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, @@ -110 +99 @@ -@@ -822,6 +822,9 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, +@@ -819,6 +819,9 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, @@ -121 +110 @@ -index 266c899a05..5c4b9e559e 100644 +index cc480d24e8..5dff578ba4 100644 @@ -124 +113 @@ -@@ -733,8 +733,19 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, +@@ -784,8 +784,19 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, @@ -145 +134 @@ -@@ -742,7 +753,8 @@ cn10k_nix_prefree_seg(struct rte_mbuf *m, struct cn10k_eth_txq *txq, +@@ -793,7 +804,8 @@ cn10k_nix_prefree_seg(struct rte_mbuf *m, struct cn10k_eth_txq *txq, @@ -155 +144 @@ -@@ -766,7 +778,8 @@ cn10k_nix_prefree_seg(struct rte_mbuf *m, struct cn10k_eth_txq *txq, +@@ -817,7 +829,8 @@ cn10k_nix_prefree_seg(struct rte_mbuf *m, struct cn10k_eth_txq *txq, @@ -165 +154 @@ -@@ -790,7 +803,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, +@@ -841,7 +854,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, @@ -175 +164 @@ -@@ -820,7 +834,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, +@@ -871,7 +885,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, @@ -185 +174 @@ -@@ -850,7 +865,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, +@@ -901,7 +916,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, @@ -195 +184 @@ -@@ -880,7 +896,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, +@@ -931,7 +947,8 @@ cn10k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct cn10k_eth_txq *txq, @@ -205 +194 @@ -@@ -962,9 +979,9 @@ cn10k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags) +@@ -1013,9 +1030,9 @@ cn10k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags) @@ -218 +207 @@ -@@ -1164,7 +1181,7 @@ cn10k_nix_xmit_prepare(struct cn10k_eth_txq *txq, +@@ -1215,7 +1232,7 @@ cn10k_nix_xmit_prepare(struct cn10k_eth_txq *txq, @@ -227 +216 @@ -@@ -1240,8 +1257,8 @@ cn10k_nix_xmit_prepare_tstamp(struct cn10k_eth_txq *txq, uintptr_t lmt_addr, +@@ -1291,8 +1308,8 @@ cn10k_nix_xmit_prepare_tstamp(struct cn10k_eth_txq *txq, uintptr_t lmt_addr, @@ -238 +227 @@ -@@ -1284,7 +1301,7 @@ cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, +@@ -1335,7 +1352,7 @@ cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, @@ -247 +236 @@ -@@ -1331,7 +1348,7 @@ cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, +@@ -1382,7 +1399,7 @@ cn10k_nix_prepare_mseg(struct cn10k_eth_txq *txq, @@ -256 +245 @@ -@@ -1425,6 +1442,7 @@ cn10k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, +@@ -1476,6 +1493,7 @@ cn10k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, @@ -264 
+253 @@ -@@ -1479,7 +1497,7 @@ again: +@@ -1530,7 +1548,7 @@ again: @@ -273 +262 @@ -@@ -1554,6 +1572,11 @@ again: +@@ -1605,6 +1623,11 @@ again: @@ -285 +274 @@ -@@ -1569,6 +1592,7 @@ cn10k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, +@@ -1620,6 +1643,7 @@ cn10k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, @@ -293 +282 @@ -@@ -1630,7 +1654,7 @@ again: +@@ -1681,7 +1705,7 @@ again: @@ -302 +291 @@ -@@ -1644,7 +1668,7 @@ again: +@@ -1695,7 +1719,7 @@ again: @@ -311 +300 @@ -@@ -1717,6 +1741,11 @@ again: +@@ -1768,6 +1792,11 @@ again: @@ -323 +312 @@ -@@ -1767,7 +1796,7 @@ cn10k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, +@@ -1818,7 +1847,7 @@ cn10k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, @@ -332 +321 @@ -@@ -1782,7 +1811,7 @@ cn10k_nix_prepare_mseg_vec_noff(struct cn10k_eth_txq *txq, +@@ -1833,7 +1862,7 @@ cn10k_nix_prepare_mseg_vec_noff(struct cn10k_eth_txq *txq, @@ -341 +330 @@ -@@ -1892,7 +1921,7 @@ cn10k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, +@@ -1943,7 +1972,7 @@ cn10k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, @@ -350 +339 @@ -@@ -1910,7 +1939,7 @@ cn10k_nix_prep_lmt_mseg_vector(struct cn10k_eth_txq *txq, +@@ -1961,7 +1990,7 @@ cn10k_nix_prep_lmt_mseg_vector(struct cn10k_eth_txq *txq, @@ -359 +348 @@ -@@ -2063,14 +2092,14 @@ cn10k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, +@@ -2114,14 +2143,14 @@ cn10k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, @@ -376 +365 @@ -@@ -2154,6 +2183,7 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, +@@ -2205,6 +2234,7 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, @@ -384 +373 @@ -@@ -3003,8 +3033,8 @@ again: +@@ -3050,8 +3080,8 @@ again: @@ -395 +384 @@ -@@ -3076,7 +3106,7 @@ again: +@@ -3123,7 +3153,7 @@ again: @@ -404 +393 @@ -@@ -3092,7 +3122,7 @@ again: +@@ -3139,7 +3169,7 @@ again: @@ -413 +402 @@ -@@ -3108,7 +3138,7 @@ again: +@@ -3155,7 +3185,7 @@ again: @@ -422 +411 @@ -@@ -3124,7 +3154,7 @@ again: +@@ -3171,7 +3201,7 @@ again: @@ -431 +420 @@ -@@ -3132,7 +3162,7 @@ again: +@@ -3179,7 +3209,7 @@ again: @@ -440 +429 @@ -@@ -3282,6 +3312,11 @@ again: +@@ -3329,6 +3359,11 @@ again:
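
---

For reference, below is a minimal C sketch of the deferred external-mbuf free pattern this patch introduces, assuming tx_compl.ena is disabled. It is not the driver code itself: nix_free_extmbuf_sketch() mirrors the cn9k/cn10k_nix_free_extmbuf() helpers added above, while tx_burst_sketch(), its tx_compl_ena parameter and the empty "prepare descriptor / LMTST" step are illustrative placeholders; the real prefree path additionally updates the send descriptor DF bit and handles reference counts.

#include <stdbool.h>
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_io.h>

/* Walk the software chain built during descriptor preparation and free
 * each external-buffer segment only after the LMTST has been issued.
 * Same shape as the cn9k/cn10k_nix_free_extmbuf() helpers in the patch.
 */
static inline void
nix_free_extmbuf_sketch(struct rte_mbuf *m)
{
	struct rte_mbuf *m_next;

	while (m != NULL) {
		m_next = m->next;
		rte_pktmbuf_free_seg(m);
		m = m_next;
	}
}

/* Illustrative burst loop: instead of freeing an external-buffer segment
 * immediately before the LMTST (as the old code did), link it into 'extm'
 * by reusing m->next, then free the whole chain once the stores are done.
 */
static inline uint16_t
tx_burst_sketch(struct rte_mbuf **tx_pkts, uint16_t nb_pkts, bool tx_compl_ena)
{
	struct rte_mbuf *extm = NULL;
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		struct rte_mbuf *m = tx_pkts[i];

		if (!tx_compl_ena && RTE_MBUF_HAS_EXTBUF(m)) {
			m->next = extm;   /* defer the free */
			extm = m;
		}
		/* ... prepare send descriptor and issue LMTST for m ... */
	}

	rte_io_wmb();                     /* LMTST stores must complete first */
	if (!tx_compl_ena)
		nix_free_extmbuf_sketch(extm);

	return nb_pkts;
}

Deferring the frees keeps each mbuf (and its external buffer) valid until the send descriptors have actually been written out, which appears to be how the SQ corruption mentioned in the commit message is avoided, and a single pass over the chain after the barrier is cheaper than per-segment frees inside the hot path.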