From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id BB94142CA8; Tue, 13 Jun 2023 11:26:09 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id DF0F242D12; Tue, 13 Jun 2023 11:26:02 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id CDF0442D16 for ; Tue, 13 Jun 2023 11:26:01 +0200 (CEST) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D563Ih013028 for ; Tue, 13 Jun 2023 02:26:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=Ip2EjRx2+1IAUgtpUrZXU9i4sD0/vH6Fq0jKVccpDeM=; b=i42adTjgrKhsv81iYGzPihVNfnnV5M9i2VYe7QoFncKd0oqgf/zy+/xGMG0mg8a9BGFs yEon6R9FQzGqB5iYtUt12+rErwwLtbs10JJkId8v5lKnld05vxMJSJ3Vvp7u2ly+4Jy9 zG3gHHxy4pr07kuaGMqohITZpHjlZc0lpopF+FM3+fE0dEfZmE+F/D+hw6kiJqoUkJvn cUI9XdVaskfE20HksqTY23GBLuVOBEXKjhclxl8wNOBaG2xJjud9Dd/+HFbJk7kfZDtb E7Qo6YrcUdK6uyNP2srsCDji5gJMHkuv5yYp7rc8pSmwgWg2f4lklzEms6yVklnrV7m2 Kg== Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0b-0016f401.pphosted.com (PPS) with ESMTPS id 3r4rpkfj1g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Tue, 13 Jun 2023 02:26:01 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.48; Tue, 13 Jun 2023 02:25:59 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.48 via Frontend Transport; Tue, 13 Jun 2023 02:25:59 -0700 Received: from MININT-80QBFE8.corp.innovium.com (unknown [10.28.164.122]) by maili.marvell.com (Postfix) with ESMTP id 59BEC3F708C; Tue, 13 Jun 2023 02:25:56 -0700 (PDT) From: To: , Pavan Nikhilesh , "Shijith Thotton" , Nithin Dabilpuram , Kiran Kumar K , Sunil Kumar Kori , Satha Rao CC: Subject: [PATCH v2 3/3] event/cnxk: use WFE in Tx fc wait Date: Tue, 13 Jun 2023 14:55:48 +0530 Message-ID: <20230613092548.1315-3-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230613092548.1315-1-pbhagavatula@marvell.com> References: <20230516143752.4941-1-pbhagavatula@marvell.com> <20230613092548.1315-1-pbhagavatula@marvell.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Proofpoint-ORIG-GUID: 9NleZBztNpsG5CrYPyyp7XIwowU6fJdO X-Proofpoint-GUID: 9NleZBztNpsG5CrYPyyp7XIwowU6fJdO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-13_04,2023-06-12_02,2023-05-22_02 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Use WFE is Tx path when waiting for space in the Tx queue. Depending upon the Tx queue contention and size, WFE will reduce the cache pressure and power consumption. In multi-core scenarios we have observed up to 8W power reduction. Signed-off-by: Pavan Nikhilesh --- drivers/event/cnxk/cn10k_tx_worker.h | 18 ++++ drivers/net/cnxk/cn10k_tx.h | 152 +++++++++++++++++++++++---- 2 files changed, 147 insertions(+), 23 deletions(-) diff --git a/drivers/event/cnxk/cn10k_tx_worker.h b/drivers/event/cnxk/cn10k_tx_worker.h index b6c9bb1d26..dea6cdcde2 100644 --- a/drivers/event/cnxk/cn10k_tx_worker.h +++ b/drivers/event/cnxk/cn10k_tx_worker.h @@ -24,9 +24,27 @@ cn10k_sso_hws_xtract_meta(struct rte_mbuf *m, const uint64_t *txq_data) static __rte_always_inline void cn10k_sso_txq_fc_wait(const struct cn10k_eth_txq *txq) { +#ifdef RTE_ARCH_ARM64 + uint64_t space; + + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[space], [%[addr]] \n" + " cmp %[adj], %[space] \n" + " b.hi .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[space], [%[addr]] \n" + " cmp %[adj], %[space] \n" + " b.ls .Lrty%= \n" + ".Ldne%=: \n" + : [space] "=&r"(space) + : [adj] "r"(txq->nb_sqb_bufs_adj), [addr] "r"(txq->fc_mem) + : "memory"); +#else while ((uint64_t)txq->nb_sqb_bufs_adj <= __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED)) ; +#endif } static __rte_always_inline int32_t diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index a365cbe0ee..d0e8350ce2 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -102,27 +102,72 @@ cn10k_nix_tx_mbuf_validate(struct rte_mbuf *m, const uint32_t flags) } static __plt_always_inline void -cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, int64_t req) +cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, uint16_t req) { int64_t cached, refill; + int64_t pkts; retry: +#ifdef RTE_ARCH_ARM64 + + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[pkts], [%[addr]] \n" + " tbz %[pkts], 63, .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[pkts], [%[addr]] \n" + " tbnz %[pkts], 63, .Lrty%= \n" + ".Ldne%=: \n" + : [pkts] "=&r"(pkts) + : [addr] "r"(&txq->fc_cache_pkts) + : "memory"); +#else + RTE_SET_USED(pkts); while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0) ; +#endif cached = __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_ACQUIRE) - req; /* Check if there is enough space, else update and retry. */ - if (cached < 0) { - /* Check if we have space else retry. */ - do { - refill = txq->nb_sqb_bufs_adj - - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED); - refill = (refill << txq->sqes_per_sqb_log2) - refill; - } while (refill <= 0); - __atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, - 0, __ATOMIC_RELEASE, - __ATOMIC_RELAXED); + if (cached >= 0) + return; + + /* Check if we have space else retry. */ +#ifdef RTE_ARCH_ARM64 + int64_t val; + + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[val], [%[addr]] \n" + " sub %[val], %[adj], %[val] \n" + " lsl %[refill], %[val], %[shft] \n" + " sub %[refill], %[refill], %[val] \n" + " sub %[refill], %[refill], %[sub] \n" + " cmp %[refill], #0x0 \n" + " b.ge .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[val], [%[addr]] \n" + " sub %[val], %[adj], %[val] \n" + " lsl %[refill], %[val], %[shft] \n" + " sub %[refill], %[refill], %[val] \n" + " sub %[refill], %[refill], %[sub] \n" + " cmp %[refill], #0x0 \n" + " b.lt .Lrty%= \n" + ".Ldne%=: \n" + : [refill] "=&r"(refill), [val] "=&r" (val) + : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_bufs_adj), + [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req) + : "memory"); +#else + do { + refill = (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED)); + refill = (refill << txq->sqes_per_sqb_log2) - refill; + refill -= req; + } while (refill < 0); +#endif + if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, + 0, __ATOMIC_RELEASE, + __ATOMIC_RELAXED)) goto retry; - } } /* Function to determine no of tx subdesc required in case ext @@ -283,10 +328,27 @@ static __rte_always_inline void cn10k_nix_sec_fc_wait_one(struct cn10k_eth_txq *txq) { uint64_t nb_desc = txq->cpt_desc; - uint64_t *fc = txq->cpt_fc; - - while (nb_desc <= __atomic_load_n(fc, __ATOMIC_RELAXED)) + uint64_t fc; + +#ifdef RTE_ARCH_ARM64 + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[space], [%[addr]] \n" + " cmp %[nb_desc], %[space] \n" + " b.hi .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[space], [%[addr]] \n" + " cmp %[nb_desc], %[space] \n" + " b.ls .Lrty%= \n" + ".Ldne%=: \n" + : [space] "=&r"(fc) + : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc) + : "memory"); +#else + RTE_SET_USED(fc); + while (nb_desc <= __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED)) ; +#endif } static __rte_always_inline void @@ -294,7 +356,7 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts) { int32_t nb_desc, val, newval; int32_t *fc_sw; - volatile uint64_t *fc; + uint64_t *fc; /* Check if there is any CPT instruction to submit */ if (!nb_pkts) @@ -302,21 +364,59 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts) again: fc_sw = txq->cpt_fc_sw; - val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_RELAXED) - nb_pkts; +#ifdef RTE_ARCH_ARM64 + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %w[pkts], [%[addr]] \n" + " tbz %w[pkts], 31, .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %w[pkts], [%[addr]] \n" + " tbnz %w[pkts], 31, .Lrty%= \n" + ".Ldne%=: \n" + : [pkts] "=&r"(val) + : [addr] "r"(fc_sw) + : "memory"); +#else + /* Wait for primary core to refill FC. */ + while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0) + ; +#endif + + val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb_pkts; if (likely(val >= 0)) return; nb_desc = txq->cpt_desc; fc = txq->cpt_fc; +#ifdef RTE_ARCH_ARM64 + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[refill], [%[addr]] \n" + " sub %[refill], %[desc], %[refill] \n" + " sub %[refill], %[refill], %[pkts] \n" + " cmp %[refill], #0x0 \n" + " b.ge .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[refill], [%[addr]] \n" + " sub %[refill], %[desc], %[refill] \n" + " sub %[refill], %[refill], %[pkts] \n" + " cmp %[refill], #0x0 \n" + " b.lt .Lrty%= \n" + ".Ldne%=: \n" + : [refill] "=&r"(newval) + : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb_pkts) + : "memory"); +#else while (true) { newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED); newval -= nb_pkts; if (newval >= 0) break; } +#endif - if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, - __ATOMIC_RELAXED, __ATOMIC_RELAXED)) + if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __ATOMIC_RELEASE, + __ATOMIC_RELAXED)) goto again; } @@ -3033,10 +3133,16 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, wd.data[1] |= ((uint64_t)(lnum - 17)) << 12; wd.data[1] |= (uint64_t)(lmt_id + 16); - if (flags & NIX_TX_VWQE_F) - cn10k_nix_vwqe_wait_fc(txq, - burst - (cn10k_nix_pkts_per_vec_brst(flags) >> - 1)); + if (flags & NIX_TX_VWQE_F) { + if (flags & NIX_TX_MULTI_SEG_F) { + if (burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1) > 0) + cn10k_nix_vwqe_wait_fc(txq, + burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1)); + } else { + cn10k_nix_vwqe_wait_fc(txq, + burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1)); + } + } /* STEOR1 */ roc_lmt_submit_steorl(wd.data[1], pa); } else if (lnum) { -- 2.25.1