From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 9F90542CB7;
	Wed, 14 Jun 2023 12:33:28 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 7AF4A40E0F;
	Wed, 14 Jun 2023 12:33:28 +0200 (CEST)
Received: from mail-vs1-f47.google.com (mail-vs1-f47.google.com
 [209.85.217.47]) by mails.dpdk.org (Postfix) with ESMTP id 7473840DDB
 for <dev@dpdk.org>; Wed, 14 Jun 2023 12:33:27 +0200 (CEST)
Received: by mail-vs1-f47.google.com with SMTP id
 ada2fe7eead31-43b56039611so571365137.1
 for <dev@dpdk.org>; Wed, 14 Jun 2023 03:33:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20221208; t=1686738807; x=1689330807;
 h=content-transfer-encoding:cc:to:subject:message-id:date:from
 :in-reply-to:references:mime-version:from:to:cc:subject:date
 :message-id:reply-to;
 bh=mc8reDEJkwhpnriWMNcWE9M3L/EW0MLFAy6owytYjBA=;
 b=mz9Mp1zU/5XjPN0lb67D6kSP0/MQDvLZO1wwynt1T6Hn2Y6Y57SHDTK5e3KCVvpiP5
 6oqcL62j3Uy8gMuV5OXR2nIwRfAFMS1g6onwmK9A9Kmuh5Rkxq0LED8CzR01E8tTNDB1
 SftQ5gLFZFxyNTYoSxPt7XkOzRgnMxa+rr/Y2pAq54nmZtAVbcu9xipMIvB9L1P0Ve7P
 DcK5Wcapoy0xy2254OOjmEQcrztF9KI8Ysg0hrUIjXKMRAMtcm731zCzIqToaqW+Fj5g
 55H+YWB1jGQGa2xvOZHB2lW6ny+CTgyTshRsAhtpFoa0rSAy88NgwFrLKeu41o+uwSaf
 dG8g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20221208; t=1686738807; x=1689330807;
 h=content-transfer-encoding:cc:to:subject:message-id:date:from
 :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=mc8reDEJkwhpnriWMNcWE9M3L/EW0MLFAy6owytYjBA=;
 b=VYGaYi7U39/0de08EfRccpp6oNknTyf8QFwLm5qHkmReZw53le4FqLuG1mewRRca0c
 +ln7P0K0OO1x8So/IWXppHp1KRTnMPqEY2OzLOV30B5lkDNi/ro7YJoLbtZukiwZhL47
 Lqf3oqbUTTkbhWUzNiSaM3YaRE32I/nw4YRwciKWpnmVbh9vHUgsuTzyxend3eiz0+pQ
 sb6AOgTp2TnNkWvRMkNLbdnCYcr1f4BnKupwMl2GrNpm8UKJzMIXLRb5QDAN6cO53vNn
 ZcJ8L7J6FmmqB+yNPTjMfRlUGcl7YRRGFM4IpESN78QQhQOlBb0HYCLyRUybs1Mh/C5+
 8MaQ==
X-Gm-Message-State: AC+VfDypCvtuHzP4SClVIWfLkjP7k6+lvwEpFzPqnliK0j34+hvUTNA4
 HicDN6GqqAnIwp0JhTiT7xkGM5UlQHUYsz6rn9z82YMiRfO765tC
X-Google-Smtp-Source: ACHHUZ5U8+q3SYFTBb5Ugaeg9MM+B6z3dDBAuDoKGsqQiGnwiTBHSo8wEqYl8XHHEfHnEuVi73a5oTbIGhknWcE0Lpo=
X-Received: by 2002:a67:ef95:0:b0:43f:4f93:3d33 with SMTP id
 r21-20020a67ef95000000b0043f4f933d33mr171494vsp.5.1686738806601; Wed, 14 Jun
 2023 03:33:26 -0700 (PDT)
MIME-Version: 1.0
References: <20230516143752.4941-1-pbhagavatula@marvell.com>
 <20230613092548.1315-1-pbhagavatula@marvell.com>
 <20230613092548.1315-3-pbhagavatula@marvell.com>
In-Reply-To: <20230613092548.1315-3-pbhagavatula@marvell.com>
From: Jerin Jacob <jerinjacobk@gmail.com>
Date: Wed, 14 Jun 2023 16:03:00 +0530
Message-ID: <CALBAE1MiiRbEm77PZ16uXSzpH9s7BA3KTEbrQeg2n+xaNhnqFw@mail.gmail.com>
Subject: Re: [PATCH v2 3/3] event/cnxk: use WFE in Tx fc wait
To: pbhagavatula@marvell.com
Cc: jerinj@marvell.com, Shijith Thotton <sthotton@marvell.com>, 
 Nithin Dabilpuram <ndabilpuram@marvell.com>,
 Kiran Kumar K <kirankumark@marvell.com>, 
 Sunil Kumar Kori <skori@marvell.com>, Satha Rao <skoteshwar@marvell.com>,
 dev@dpdk.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On Tue, Jun 13, 2023 at 2:56=E2=80=AFPM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Use WFE in Tx path when waiting for space in the Tx queue.
> Depending upon the Tx queue contention and size, WFE will
> reduce the cache pressure and power consumption.
> In multi-core scenarios we have observed up to 8W power reduction.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

Series applied to dpdk-next-net-eventdev/for-main. Thanks.

> ---
>  drivers/event/cnxk/cn10k_tx_worker.h |  18 ++++
>  drivers/net/cnxk/cn10k_tx.h          | 152 +++++++++++++++++++++++----
>  2 files changed, 147 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/event/cnxk/cn10k_tx_worker.h b/drivers/event/cnxk/cn=
10k_tx_worker.h
> index b6c9bb1d26..dea6cdcde2 100644
> --- a/drivers/event/cnxk/cn10k_tx_worker.h
> +++ b/drivers/event/cnxk/cn10k_tx_worker.h
> @@ -24,9 +24,27 @@ cn10k_sso_hws_xtract_meta(struct rte_mbuf *m, const ui=
nt64_t *txq_data)
>  static __rte_always_inline void
>  cn10k_sso_txq_fc_wait(const struct cn10k_eth_txq *txq)
>  {
> +#ifdef RTE_ARCH_ARM64
> +       uint64_t space;
> +
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[space], [%[addr]]                \=
n"
> +                    "          cmp %[adj], %[space]                    \=
n"
> +                    "          b.hi .Ldne%=3D                           =
 \n"
> +                    "          sevl                                    \=
n"
> +                    ".Lrty%=3D:  wfe                                    =
 \n"
> +                    "          ldxr %[space], [%[addr]]                \=
n"
> +                    "          cmp %[adj], %[space]                    \=
n"
> +                    "          b.ls .Lrty%=3D                           =
 \n"
> +                    ".Ldne%=3D:                                         =
 \n"
> +                    : [space] "=3D&r"(space)
> +                    : [adj] "r"(txq->nb_sqb_bufs_adj), [addr] "r"(txq->f=
c_mem)
> +                    : "memory");
> +#else
>         while ((uint64_t)txq->nb_sqb_bufs_adj <=3D
>                __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED))
>                 ;
> +#endif
>  }
>
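
For readers following the hunk above, here is a minimal standalone sketch of
the same wait pattern: one exclusive load and compare up front, then a
SEVL/WFE loop only if the queue is still full. The helper name
fc_wait_below() and its parameters are hypothetical and not part of the
patch. It uses numeric local labels instead of the .Ldne/.Lrty labels, and it
lists "cc" as a clobber because cmp updates the condition flags. Treat it as
an illustration under those assumptions, not as the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): return once *fc_mem < limit. */
static inline void
fc_wait_below(const uint64_t *fc_mem, uint64_t limit)
{
	assert(fc_mem != NULL);
#if defined(__aarch64__)
	uint64_t space;

	__asm__ volatile(
		"	ldxr %[space], [%[addr]]	\n" /* exclusive load arms the monitor */
		"	cmp %[limit], %[space]		\n"
		"	b.hi 2f				\n" /* limit > space: room is available */
		"	sevl				\n" /* set local event so the first wfe falls through */
		"1:	wfe				\n" /* sleep until an event, e.g. a write to the monitored line */
		"	ldxr %[space], [%[addr]]	\n" /* reload and re-arm the monitor */
		"	cmp %[limit], %[space]		\n"
		"	b.ls 1b				\n" /* still full: wait again */
		"2:					\n"
		: [space] "=&r"(space)
		: [limit] "r"(limit), [addr] "r"(fc_mem)
		: "memory", "cc");
#else
	/* Generic fallback: plain busy-wait, as in the #else branch of the patch. */
	while (limit <= __atomic_load_n(fc_mem, __ATOMIC_RELAXED))
		;
#endif
}
```

With a counter that is already below the limit, for example
fc_wait_below(&fc, 8) with fc == 3, the call returns immediately on either
path without executing WFE.
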
>  static __rte_always_inline int32_t
> diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
> index a365cbe0ee..d0e8350ce2 100644
> --- a/drivers/net/cnxk/cn10k_tx.h
> +++ b/drivers/net/cnxk/cn10k_tx.h
> @@ -102,27 +102,72 @@ cn10k_nix_tx_mbuf_validate(struct rte_mbuf *m, cons=
t uint32_t flags)
>  }
>
>  static __plt_always_inline void
> -cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, int64_t req)
> +cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, uint16_t req)
>  {
>         int64_t cached, refill;
> +       int64_t pkts;
>
>  retry:
> +#ifdef RTE_ARCH_ARM64
> +
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[pkts], [%[addr]]                 \=
n"
> +                    "          tbz %[pkts], 63, .Ldne%=3D               =
 \n"
> +                    "          sevl                                    \=
n"
> +                    ".Lrty%=3D:  wfe                                    =
 \n"
> +                    "          ldxr %[pkts], [%[addr]]                 \=
n"
> +                    "          tbnz %[pkts], 63, .Lrty%=3D              =
 \n"
> +                    ".Ldne%=3D:                                         =
 \n"
> +                    : [pkts] "=3D&r"(pkts)
> +                    : [addr] "r"(&txq->fc_cache_pkts)
> +                    : "memory");
> +#else
> +       RTE_SET_USED(pkts);
>         while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0=
)
>                 ;
> +#endif
>         cached =3D __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_=
ACQUIRE) - req;
>         /* Check if there is enough space, else update and retry. */
> -       if (cached < 0) {
> -               /* Check if we have space else retry. */
> -               do {
> -                       refill =3D txq->nb_sqb_bufs_adj -
> -                                __atomic_load_n(txq->fc_mem, __ATOMIC_RE=
LAXED);
> -                       refill =3D (refill << txq->sqes_per_sqb_log2) - r=
efill;
> -               } while (refill <=3D 0);
> -               __atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &=
refill,
> -                                         0, __ATOMIC_RELEASE,
> -                                         __ATOMIC_RELAXED);
> +       if (cached >=3D 0)
> +               return;
> +
> +       /* Check if we have space else retry. */
> +#ifdef RTE_ARCH_ARM64
> +       int64_t val;
> +
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[val], [%[addr]]                  \=
n"
> +                    "          sub %[val], %[adj], %[val]              \=
n"
> +                    "          lsl %[refill], %[val], %[shft]          \=
n"
> +                    "          sub %[refill], %[refill], %[val]        \=
n"
> +                    "          sub %[refill], %[refill], %[sub]        \=
n"
> +                    "          cmp %[refill], #0x0                     \=
n"
> +                    "          b.ge .Ldne%=3D                           =
 \n"
> +                    "          sevl                                    \=
n"
> +                    ".Lrty%=3D:  wfe                                    =
 \n"
> +                    "          ldxr %[val], [%[addr]]                  \=
n"
> +                    "          sub %[val], %[adj], %[val]              \=
n"
> +                    "          lsl %[refill], %[val], %[shft]          \=
n"
> +                    "          sub %[refill], %[refill], %[val]        \=
n"
> +                    "          sub %[refill], %[refill], %[sub]        \=
n"
> +                    "          cmp %[refill], #0x0                     \=
n"
> +                    "          b.lt .Lrty%=3D                           =
 \n"
> +                    ".Ldne%=3D:                                         =
 \n"
> +                    : [refill] "=3D&r"(refill), [val] "=3D&r" (val)
> +                    : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_buf=
s_adj),
> +                      [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req)
> +                    : "memory");
> +#else
> +       do {
> +               refill =3D (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->f=
c_mem, __ATOMIC_RELAXED));
> +               refill =3D (refill << txq->sqes_per_sqb_log2) - refill;
> +               refill -=3D req;
> +       } while (refill < 0);
> +#endif
> +       if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &ref=
ill,
> +                                 0, __ATOMIC_RELEASE,
> +                                 __ATOMIC_RELAXED))
>                 goto retry;
> -       }
>  }
>
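
The refill arithmetic in the hunk above is easier to check in portable form.
The sketch below models the generic (#else) path with C11 atomics. struct
txq_model, refill_credits() and wait_fc_model() are hypothetical stand-ins
for illustration, and the field types are assumptions rather than the
driver's definitions. It relies on (x << n) - x being equal to
x * (2^n - 1), written as a multiplication so that a negative x stays well
defined in C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the txq fields used by the wait; types are assumed. */
struct txq_model {
	_Atomic int64_t fc_cache_pkts; /* software credit cache */
	_Atomic uint64_t *fc_mem;      /* SQBs currently in use */
	int64_t nb_sqb_bufs_adj;       /* usable SQB limit */
	uint16_t sqes_per_sqb_log2;
};

/* Credits left after reserving req: free_sqbs * (2^log2 - 1) - req. */
static inline int64_t
refill_credits(int64_t nb_sqb_bufs_adj, uint64_t in_use,
	       uint16_t sqes_per_sqb_log2, uint16_t req)
{
	int64_t free_sqbs = nb_sqb_bufs_adj - (int64_t)in_use;

	assert(sqes_per_sqb_log2 < 32);
	return free_sqbs * (((int64_t)1 << sqes_per_sqb_log2) - 1) - req;
}

static inline void
wait_fc_model(struct txq_model *txq, uint16_t req)
{
	int64_t cached, refill;

retry:
	/* Wait while another thread is refilling (cache is negative). */
	while (atomic_load_explicit(&txq->fc_cache_pkts,
				    memory_order_relaxed) < 0)
		;
	cached = atomic_fetch_sub_explicit(&txq->fc_cache_pkts, req,
					   memory_order_acquire) - req;
	if (cached >= 0)
		return;

	/* Out of cached credits: recompute until req fits. */
	do {
		refill = refill_credits(
			txq->nb_sqb_bufs_adj,
			atomic_load_explicit(txq->fc_mem, memory_order_relaxed),
			txq->sqes_per_sqb_log2, req);
	} while (refill < 0);

	/* Publish the refill only if nobody else changed the cache. */
	if (!atomic_compare_exchange_strong_explicit(
		    &txq->fc_cache_pkts, &cached, refill,
		    memory_order_release, memory_order_relaxed))
		goto retry;
}
```

For example, with nb_sqb_bufs_adj = 16, 4 SQBs in use, sqes_per_sqb_log2 = 5
and req = 8, refill_credits() gives 12 * 31 - 8 = 364.
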
>  /* Function to determine no of tx subdesc required in case ext
> @@ -283,10 +328,27 @@ static __rte_always_inline void
>  cn10k_nix_sec_fc_wait_one(struct cn10k_eth_txq *txq)
>  {
>         uint64_t nb_desc =3D txq->cpt_desc;
> -       uint64_t *fc =3D txq->cpt_fc;
> -
> -       while (nb_desc <=3D __atomic_load_n(fc, __ATOMIC_RELAXED))
> +       uint64_t fc;
> +
> +#ifdef RTE_ARCH_ARM64
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[space], [%[addr]]                \=
n"
> +                    "          cmp %[nb_desc], %[space]                \=
n"
> +                    "          b.hi .Ldne%=3D                           =
 \n"
> +                    "          sevl                                    \=
n"
> +                    ".Lrty%=3D:  wfe                                    =
 \n"
> +                    "          ldxr %[space], [%[addr]]                \=
n"
> +                    "          cmp %[nb_desc], %[space]                \=
n"
> +                    "          b.ls .Lrty%=3D                           =
 \n"
> +                    ".Ldne%=3D:                                         =
 \n"
> +                    : [space] "=3D&r"(fc)
> +                    : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc)
> +                    : "memory");
> +#else
> +       RTE_SET_USED(fc);
> +       while (nb_desc <=3D __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED=
))
>                 ;
> +#endif
>  }
>
>  static __rte_always_inline void
> @@ -294,7 +356,7 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint=
16_t nb_pkts)
>  {
>         int32_t nb_desc, val, newval;
>         int32_t *fc_sw;
> -       volatile uint64_t *fc;
> +       uint64_t *fc;
>
>         /* Check if there is any CPT instruction to submit */
>         if (!nb_pkts)
> @@ -302,21 +364,59 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, ui=
nt16_t nb_pkts)
>
>  again:
>         fc_sw =3D txq->cpt_fc_sw;
> -       val =3D __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_RELAXED) - nb=
_pkts;
> +#ifdef RTE_ARCH_ARM64
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %w[pkts], [%[addr]]                \=
n"
> +                    "          tbz %w[pkts], 31, .Ldne%=3D              =
 \n"
> +                    "          sevl                                    \=
n"
> +                    ".Lrty%=3D:  wfe                                    =
 \n"
> +                    "          ldxr %w[pkts], [%[addr]]                \=
n"
> +                    "          tbnz %w[pkts], 31, .Lrty%=3D             =
 \n"
> +                    ".Ldne%=3D:                                         =
 \n"
> +                    : [pkts] "=3D&r"(val)
> +                    : [addr] "r"(fc_sw)
> +                    : "memory");
> +#else
> +       /* Wait for primary core to refill FC. */
> +       while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0)
> +               ;
> +#endif
> +
> +       val =3D __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb=
_pkts;
>         if (likely(val >=3D 0))
>                 return;
>
>         nb_desc =3D txq->cpt_desc;
>         fc =3D txq->cpt_fc;
> +#ifdef RTE_ARCH_ARM64
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[refill], [%[addr]]               \=
n"
> +                    "          sub %[refill], %[desc], %[refill]       \=
n"
> +                    "          sub %[refill], %[refill], %[pkts]       \=
n"
> +                    "          cmp %[refill], #0x0                     \=
n"
> +                    "          b.ge .Ldne%=3D                           =
 \n"
> +                    "          sevl                                    \=
n"
> +                    ".Lrty%=3D:  wfe                                    =
 \n"
> +                    "          ldxr %[refill], [%[addr]]               \=
n"
> +                    "          sub %[refill], %[desc], %[refill]       \=
n"
> +                    "          sub %[refill], %[refill], %[pkts]       \=
n"
> +                    "          cmp %[refill], #0x0                     \=
n"
> +                    "          b.lt .Lrty%=3D                           =
 \n"
> +                    ".Ldne%=3D:                                         =
 \n"
> +                    : [refill] "=3D&r"(newval)
> +                    : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb=
_pkts)
> +                    : "memory");
> +#else
>         while (true) {
>                 newval =3D nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED=
);
>                 newval -=3D nb_pkts;
>                 if (newval >=3D 0)
>                         break;
>         }
> +#endif
>
> -       if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false,
> -                                        __ATOMIC_RELAXED, __ATOMIC_RELAX=
ED))
> +       if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __AT=
OMIC_RELEASE,
> +                                        __ATOMIC_RELAXED))
>                 goto again;
>  }
>
> @@ -3033,10 +3133,16 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64=
_t *ws,
>                 wd.data[1] |=3D ((uint64_t)(lnum - 17)) << 12;
>                 wd.data[1] |=3D (uint64_t)(lmt_id + 16);
>
> -               if (flags & NIX_TX_VWQE_F)
> -                       cn10k_nix_vwqe_wait_fc(txq,
> -                               burst - (cn10k_nix_pkts_per_vec_brst(flag=
s) >>
> -                                        1));
> +               if (flags & NIX_TX_VWQE_F) {
> +                       if (flags & NIX_TX_MULTI_SEG_F) {
> +                               if (burst - (cn10k_nix_pkts_per_vec_brst(=
flags) >> 1) > 0)
> +                                       cn10k_nix_vwqe_wait_fc(txq,
> +                                               burst - (cn10k_nix_pkts_p=
er_vec_brst(flags) >> 1));
> +                       } else {
> +                               cn10k_nix_vwqe_wait_fc(txq,
> +                                               burst - (cn10k_nix_pkts_p=
er_vec_brst(flags) >> 1));
> +                       }
> +               }
>                 /* STEOR1 */
>                 roc_lmt_submit_steorl(wd.data[1], pa);
>         } else if (lnum) {
> --
> 2.25.1
>
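
A note on the last hunk: the commit message does not say why the multi-seg
path now checks that the difference is positive before calling
cn10k_nix_vwqe_wait_fc(). One possible reading, assuming the difference can
be zero or negative there, is that the req parameter is now uint16_t, so a
negative int argument would wrap to a large unsigned value. The helpers
below are hypothetical and only demonstrate that C conversion rule.

```c
#include <assert.h>
#include <stdint.h>

/* Converting a negative int to uint16_t wraps modulo 65536. */
static_assert((uint16_t)-3 == 65533, "negative int wraps as uint16_t");

/* Hypothetical helper: the value a uint16_t parameter would receive. */
static inline uint16_t
as_req(int burst, int half_vec_burst)
{
	return (uint16_t)(burst - half_vec_burst);
}

/* Hypothetical guard mirroring the "> 0" check added in the hunk. */
static inline int
needs_fc_wait(int burst, int half_vec_burst)
{
	return burst - half_vec_burst > 0;
}
```

With burst = 1 and half_vec_burst = 4, as_req() yields 65533 while
needs_fc_wait() returns 0, so the guarded call would be skipped.
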