From: Jerin Jacob
Date: Wed, 14 Jun 2023 16:03:00 +0530
Subject: Re: [PATCH v2 3/3] event/cnxk: use WFE in Tx fc wait
To: pbhagavatula@marvell.com
Cc: jerinj@marvell.com, Shijith Thotton, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao, dev@dpdk.org
In-Reply-To: <20230613092548.1315-3-pbhagavatula@marvell.com>
References: <20230516143752.4941-1-pbhagavatula@marvell.com> <20230613092548.1315-1-pbhagavatula@marvell.com> <20230613092548.1315-3-pbhagavatula@marvell.com>
List-Id: DPDK patches and discussions

On Tue, Jun 13, 2023 at 2:56 PM wrote:
>
> From: Pavan Nikhilesh
>
> Use WFE in the Tx path when waiting for space in the Tx queue.
> Depending upon the Tx queue contention and size, WFE will
> reduce the cache pressure and power consumption.
> In multi-core scenarios we have observed up to 8W power reduction.
>
> Signed-off-by: Pavan Nikhilesh

Series applied to dpdk-next-net-eventdev/for-main.
Thanks

> ---
>  drivers/event/cnxk/cn10k_tx_worker.h |  18 ++++
>  drivers/net/cnxk/cn10k_tx.h          | 152 +++++++++++++++++++++++----
>  2 files changed, 147 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/event/cnxk/cn10k_tx_worker.h b/drivers/event/cnxk/cn10k_tx_worker.h
> index b6c9bb1d26..dea6cdcde2 100644
> --- a/drivers/event/cnxk/cn10k_tx_worker.h
> +++ b/drivers/event/cnxk/cn10k_tx_worker.h
> @@ -24,9 +24,27 @@ cn10k_sso_hws_xtract_meta(struct rte_mbuf *m, const uint64_t *txq_data)
>  static __rte_always_inline void
>  cn10k_sso_txq_fc_wait(const struct cn10k_eth_txq *txq)
>  {
> +#ifdef RTE_ARCH_ARM64
> +        uint64_t space;
> +
> +        asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                     "          ldxr %[space], [%[addr]] \n"
> +                     "          cmp %[adj], %[space] \n"
> +                     "          b.hi .Ldne%= \n"
> +                     "          sevl \n"
> +                     ".Lrty%=:  wfe \n"
> +                     "          ldxr %[space], [%[addr]] \n"
> +                     "          cmp %[adj], %[space] \n"
> +                     "          b.ls .Lrty%= \n"
> +                     ".Ldne%=: \n"
> +                     : [space] "=&r"(space)
> +                     : [adj] "r"(txq->nb_sqb_bufs_adj), [addr] "r"(txq->fc_mem)
> +                     : "memory");
> +#else
>          while ((uint64_t)txq->nb_sqb_bufs_adj <=
>                 __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED))
>                  ;
> +#endif
>  }
>
>  static __rte_always_inline int32_t
> diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
> index a365cbe0ee..d0e8350ce2 100644
> --- a/drivers/net/cnxk/cn10k_tx.h
> +++ b/drivers/net/cnxk/cn10k_tx.h
> @@ -102,27 +102,72 @@ cn10k_nix_tx_mbuf_validate(struct rte_mbuf *m, const uint32_t flags)
>  }
>
>  static __plt_always_inline void
> -cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, int64_t req)
> +cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, uint16_t req)
>  {
>          int64_t cached, refill;
> +        int64_t pkts;
>
>  retry:
> +#ifdef RTE_ARCH_ARM64
> +
> +        asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                     "          ldxr %[pkts], [%[addr]] \n"
> +                     "          tbz %[pkts], 63, .Ldne%= \n"
> +                     "          sevl \n"
> +                     ".Lrty%=:  wfe \n"
> +                     "          ldxr %[pkts], [%[addr]] \n"
> +                     "          tbnz %[pkts], 63, .Lrty%= \n"
> +                     ".Ldne%=: \n"
> +                     : [pkts] "=&r"(pkts)
> +                     : [addr] "r"(&txq->fc_cache_pkts)
> +                     : "memory");
> +#else
> +        RTE_SET_USED(pkts);
>          while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0)
>                  ;
> +#endif
>          cached = __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_ACQUIRE) - req;
>          /* Check if there is enough space, else update and retry. */
> -        if (cached < 0) {
> -                /* Check if we have space else retry. */
> -                do {
> -                        refill = txq->nb_sqb_bufs_adj -
> -                                 __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED);
> -                        refill = (refill << txq->sqes_per_sqb_log2) - refill;
> -                } while (refill <= 0);
> -                __atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill,
> -                                          0, __ATOMIC_RELEASE,
> -                                          __ATOMIC_RELAXED);
> +        if (cached >= 0)
> +                return;
> +
> +        /* Check if we have space else retry. */
> +#ifdef RTE_ARCH_ARM64
> +        int64_t val;
> +
> +        asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                     "          ldxr %[val], [%[addr]] \n"
> +                     "          sub %[val], %[adj], %[val] \n"
> +                     "          lsl %[refill], %[val], %[shft] \n"
> +                     "          sub %[refill], %[refill], %[val] \n"
> +                     "          sub %[refill], %[refill], %[sub] \n"
> +                     "          cmp %[refill], #0x0 \n"
> +                     "          b.ge .Ldne%= \n"
> +                     "          sevl \n"
> +                     ".Lrty%=:  wfe \n"
> +                     "          ldxr %[val], [%[addr]] \n"
> +                     "          sub %[val], %[adj], %[val] \n"
> +                     "          lsl %[refill], %[val], %[shft] \n"
> +                     "          sub %[refill], %[refill], %[val] \n"
> +                     "          sub %[refill], %[refill], %[sub] \n"
> +                     "          cmp %[refill], #0x0 \n"
> +                     "          b.lt .Lrty%= \n"
> +                     ".Ldne%=: \n"
> +                     : [refill] "=&r"(refill), [val] "=&r" (val)
> +                     : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_bufs_adj),
> +                       [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req)
> +                     : "memory");
> +#else
> +        do {
> +                refill = (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED));
> +                refill = (refill << txq->sqes_per_sqb_log2) - refill;
> +                refill -= req;
> +        } while (refill < 0);
> +#endif
> +        if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill,
> +                                       0, __ATOMIC_RELEASE,
> +                                       __ATOMIC_RELAXED))
>                  goto retry;
> -        }
>  }
>
>  /* Function to determine no of tx subdesc required in case ext
> @@ -283,10 +328,27 @@ static __rte_always_inline void
>  cn10k_nix_sec_fc_wait_one(struct cn10k_eth_txq *txq)
>  {
>          uint64_t nb_desc = txq->cpt_desc;
> -        uint64_t *fc = txq->cpt_fc;
> -
> -        while (nb_desc <= __atomic_load_n(fc, __ATOMIC_RELAXED))
> +        uint64_t fc;
> +
> +#ifdef RTE_ARCH_ARM64
> +        asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                     "          ldxr %[space], [%[addr]] \n"
> +                     "          cmp %[nb_desc], %[space] \n"
> +                     "          b.hi .Ldne%= \n"
> +                     "          sevl \n"
> +                     ".Lrty%=:  wfe \n"
> +                     "          ldxr %[space], [%[addr]] \n"
> +                     "          cmp %[nb_desc], %[space] \n"
> +                     "          b.ls .Lrty%= \n"
> +                     ".Ldne%=: \n"
> +                     : [space] "=&r"(fc)
> +                     : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc)
> +                     : "memory");
> +#else
> +        RTE_SET_USED(fc);
> +        while (nb_desc <= __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED))
>                  ;
> +#endif
>  }
>
>  static __rte_always_inline void
> @@ -294,7 +356,7 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts)
>  {
>          int32_t nb_desc, val, newval;
>          int32_t *fc_sw;
> -        volatile uint64_t *fc;
> +        uint64_t *fc;
>
>          /* Check if there is any CPT instruction to submit */
>          if (!nb_pkts)
> @@ -302,21 +364,59 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts)
>
>  again:
>          fc_sw = txq->cpt_fc_sw;
> -        val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_RELAXED) - nb_pkts;
> +#ifdef RTE_ARCH_ARM64
> +        asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                     "          ldxr %w[pkts], [%[addr]] \n"
> +                     "          tbz %w[pkts], 31, .Ldne%= \n"
> +                     "          sevl \n"
> +                     ".Lrty%=:  wfe \n"
> +                     "          ldxr %w[pkts], [%[addr]] \n"
> +                     "          tbnz %w[pkts], 31, .Lrty%= \n"
> +                     ".Ldne%=: \n"
> +                     : [pkts] "=&r"(val)
> +                     : [addr] "r"(fc_sw)
> +                     : "memory");
> +#else
> +        /* Wait for primary core to refill FC. */
> +        while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0)
> +                ;
> +#endif
> +
> +        val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb_pkts;
>          if (likely(val >= 0))
>                  return;
>
>          nb_desc = txq->cpt_desc;
>          fc = txq->cpt_fc;
> +#ifdef RTE_ARCH_ARM64
> +        asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                     "          ldxr %[refill], [%[addr]] \n"
> +                     "          sub %[refill], %[desc], %[refill] \n"
> +                     "          sub %[refill], %[refill], %[pkts] \n"
> +                     "          cmp %[refill], #0x0 \n"
> +                     "          b.ge .Ldne%= \n"
> +                     "          sevl \n"
> +                     ".Lrty%=:  wfe \n"
> +                     "          ldxr %[refill], [%[addr]] \n"
> +                     "          sub %[refill], %[desc], %[refill] \n"
> +                     "          sub %[refill], %[refill], %[pkts] \n"
> +                     "          cmp %[refill], #0x0 \n"
> +                     "          b.lt .Lrty%= \n"
> +                     ".Ldne%=: \n"
> +                     : [refill] "=&r"(newval)
> +                     : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb_pkts)
> +                     : "memory");
> +#else
>          while (true) {
>                  newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED);
>                  newval -= nb_pkts;
>                  if (newval >= 0)
>                          break;
>          }
> +#endif
>
> -        if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false,
> -                                         __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +        if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __ATOMIC_RELEASE,
> +                                         __ATOMIC_RELAXED))
>                  goto again;
>  }
>
> @@ -3033,10 +3133,16 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws,
>                          wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
>                          wd.data[1] |= (uint64_t)(lmt_id + 16);
>
> -                        if (flags & NIX_TX_VWQE_F)
> -                                cn10k_nix_vwqe_wait_fc(txq,
> -                                        burst - (cn10k_nix_pkts_per_vec_brst(flags) >>
> -                                                 1));
> +                        if (flags & NIX_TX_VWQE_F) {
> +                                if (flags & NIX_TX_MULTI_SEG_F) {
> +                                        if (burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1) > 0)
> +                                                cn10k_nix_vwqe_wait_fc(txq,
> +                                                        burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1));
> +                                } else {
> +                                        cn10k_nix_vwqe_wait_fc(txq,
> +                                                burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1));
> +                                }
> +                        }
>                          /* STEOR1 */
>                          roc_lmt_submit_steorl(wd.data[1], pa);
>                  } else if (lnum) {
> --
> 2.25.1
>
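
For readers following the thread, the non-ARM64 fallback of cn10k_nix_vwqe_wait_fc() in the quoted patch can be modelled in portable C11 roughly as below. This is only a sketch under stated assumptions: struct txq_model, model_refill() and model_wait_fc() are hypothetical names, C11 atomics stand in for the __atomic builtins, and a multiplication replaces the shift-and-subtract so that a negative intermediate stays well defined. It does not attempt to reproduce the WFE path.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the Tx queue fields the patch touches. */
struct txq_model {
	_Atomic int64_t fc_cache_pkts; /* software credit cache shared by cores */
	_Atomic uint64_t fc_mem;       /* SQBs in use (hardware-updated in the driver) */
	int64_t nb_sqb_bufs_adj;       /* usable SQB count */
	uint16_t sqes_per_sqb_log2;    /* log2 of entries per SQB */
};

/*
 * Credits left after reserving req packets.
 * n free SQBs give (n << log2) - n = n * (2^log2 - 1) entries.
 */
static int64_t
model_refill(struct txq_model *txq, uint16_t req)
{
	int64_t n = txq->nb_sqb_bufs_adj -
		    (int64_t)atomic_load_explicit(&txq->fc_mem, memory_order_relaxed);

	return n * (((int64_t)1 << txq->sqes_per_sqb_log2) - 1) - req;
}

static void
model_wait_fc(struct txq_model *txq, uint16_t req)
{
	int64_t cached, refill;

	for (;;) {
		/* A negative cache means another core is refilling; spin. */
		while (atomic_load_explicit(&txq->fc_cache_pkts,
					    memory_order_relaxed) < 0)
			;

		/* Reserve req credits; fetch_sub returns the old value. */
		cached = atomic_fetch_sub_explicit(&txq->fc_cache_pkts, req,
						   memory_order_acquire) - req;
		if (cached >= 0)
			return;

		/* Spin until the queue has room for this request. */
		do {
			refill = model_refill(txq, req);
		} while (refill < 0);

		/* Publish the refill only if nobody changed the cache meanwhile. */
		if (atomic_compare_exchange_strong_explicit(&txq->fc_cache_pkts,
							    &cached, refill,
							    memory_order_release,
							    memory_order_relaxed))
			return;
	}
}
```

With nb_sqb_bufs_adj = 10, fc_mem = 4, sqes_per_sqb_log2 = 5 and a cache of 5, a request for 8 drives the cache to -3, and the refill then publishes 6 * 31 - 8 = 178.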