DPDK patches and discussions
 help / color / mirror / Atom feed
From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
To: Patrick Robb <probb@iol.unh.edu>, Jerin Jacob <jerinjacobk@gmail.com>
Cc: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>,
	Shijith Thotton <sthotton@marvell.com>,
	Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>,
	Kiran Kumar Kokkilagadda <kirankumark@marvell.com>,
	Sunil Kumar Kori <skori@marvell.com>,
	Satha Koteswara Rao Kottidi <skoteshwar@marvell.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [EXT] Re: [PATCH v2 3/3] event/cnxk: use WFE in Tx fc wait
Date: Thu, 15 Jun 2023 05:49:25 +0000	[thread overview]
Message-ID: <BY3PR18MB47854174768613E48C1392BAC85BA@BY3PR18MB4785.namprd18.prod.outlook.com> (raw)
In-Reply-To: <CAJvnSUBscXeChXT028=RQzE6L6F_raRQvFDG9EX+gsKCtD+sqA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 16814 bytes --]

Fix merged.
________________________________
From: Patrick Robb <probb@iol.unh.edu>
Sent: Wednesday, June 14, 2023 11:57 PM
To: Jerin Jacob <jerinjacobk@gmail.com>
Cc: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Shijith Thotton <sthotton@marvell.com>; Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Sunil Kumar Kori <skori@marvell.com>; Satha Koteswara Rao Kottidi <skoteshwar@marvell.com>; dev@dpdk.org <dev@dpdk.org>
Subject: [EXT] Re: [PATCH v2 3/3] event/cnxk: use WFE in Tx fc wait

External Email
________________________________
Hello Pavan and Jerin,

The Community Lab's CI testing failed this patchseries on clang compile on ARM systems. That wasn't properly reported to Patchwork due to issues with our reporting process, which we are resolving currently. This updated report show the failed compile. Apologies for the incomplete initial report.

http://mails.dpdk.org/archives/test-report/2023-June/411303.html<https://urldefense.proofpoint.com/v2/url?u=http-3A__mails.dpdk.org_archives_test-2Dreport_2023-2DJune_411303.html&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=9FXCmqxHZTcV3Z4Di2iBAheusgkMC5cGx7UfGrOTS_M&e=>

On Wed, Jun 14, 2023 at 6:33 AM Jerin Jacob <jerinjacobk@gmail.com<mailto:jerinjacobk@gmail.com>> wrote:
On Tue, Jun 13, 2023 at 2:56 PM <pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>>
>
> Use WFE is Tx path when waiting for space in the Tx queue.
> Depending upon the Tx queue contention and size, WFE will
> reduce the cache pressure and power consumption.
> In multi-core scenarios we have observed up to 8W power reduction.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>>

Series Applied to dpdk-next-net-eventdev/for-main. Thanks

> ---
>  drivers/event/cnxk/cn10k_tx_worker.h |  18 ++++
>  drivers/net/cnxk/cn10k_tx.h          | 152 +++++++++++++++++++++++----
>  2 files changed, 147 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/event/cnxk/cn10k_tx_worker.h b/drivers/event/cnxk/cn10k_tx_worker.h
> index b6c9bb1d26..dea6cdcde2 100644
> --- a/drivers/event/cnxk/cn10k_tx_worker.h
> +++ b/drivers/event/cnxk/cn10k_tx_worker.h
> @@ -24,9 +24,27 @@ cn10k_sso_hws_xtract_meta(struct rte_mbuf *m, const uint64_t *txq_data)
>  static __rte_always_inline void
>  cn10k_sso_txq_fc_wait(const struct cn10k_eth_txq *txq)
>  {
> +#ifdef RTE_ARCH_ARM64
> +       uint64_t space;
> +
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[space], [%[addr]]                \n"
> +                    "          cmp %[adj], %[space]                    \n"
> +                    "          b.hi .Ldne%=                            \n"
> +                    "          sevl                                    \n"
> +                    ".Lrty%=:  wfe                                     \n"
> +                    "          ldxr %[space], [%[addr]]                \n"
> +                    "          cmp %[adj], %[space]                    \n"
> +                    "          b.ls<https://urldefense.proofpoint.com/v2/url?u=http-3A__b.ls&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=7_L6grMaTkrE6l15-rPboGrijiBzvmc0jeDUK1qWJt0&e=> .Lrty%=                            \n"
> +                    ".Ldne%=:                                          \n"
> +                    : [space] "=&r"(space)
> +                    : [adj] "r"(txq->nb_sqb_bufs_adj), [addr] "r"(txq->fc_mem)
> +                    : "memory");
> +#else
>         while ((uint64_t)txq->nb_sqb_bufs_adj <=
>                __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED))
>                 ;
> +#endif
>  }
>
>  static __rte_always_inline int32_t
> diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
> index a365cbe0ee..d0e8350ce2 100644
> --- a/drivers/net/cnxk/cn10k_tx.h
> +++ b/drivers/net/cnxk/cn10k_tx.h
> @@ -102,27 +102,72 @@ cn10k_nix_tx_mbuf_validate(struct rte_mbuf *m, const uint32_t flags)
>  }
>
>  static __plt_always_inline void
> -cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, int64_t req)
> +cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, uint16_t req)
>  {
>         int64_t cached, refill;
> +       int64_t pkts;
>
>  retry:
> +#ifdef RTE_ARCH_ARM64
> +
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[pkts], [%[addr]]                 \n"
> +                    "          tbz %[pkts], 63, .Ldne%=                \n"
> +                    "          sevl                                    \n"
> +                    ".Lrty%=:  wfe                                     \n"
> +                    "          ldxr %[pkts], [%[addr]]                 \n"
> +                    "          tbnz %[pkts], 63, .Lrty%=               \n"
> +                    ".Ldne%=:                                          \n"
> +                    : [pkts] "=&r"(pkts)
> +                    : [addr] "r"(&txq->fc_cache_pkts)
> +                    : "memory");
> +#else
> +       RTE_SET_USED(pkts);
>         while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0)
>                 ;
> +#endif
>         cached = __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_ACQUIRE) - req;
>         /* Check if there is enough space, else update and retry. */
> -       if (cached < 0) {
> -               /* Check if we have space else retry. */
> -               do {
> -                       refill = txq->nb_sqb_bufs_adj -
> -                                __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED);
> -                       refill = (refill << txq->sqes_per_sqb_log2) - refill;
> -               } while (refill <= 0);
> -               __atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill,
> -                                         0, __ATOMIC_RELEASE,
> -                                         __ATOMIC_RELAXED);
> +       if (cached >= 0)
> +               return;
> +
> +       /* Check if we have space else retry. */
> +#ifdef RTE_ARCH_ARM64
> +       int64_t val;
> +
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[val], [%[addr]]                  \n"
> +                    "          sub %[val], %[adj], %[val]              \n"
> +                    "          lsl %[refill], %[val], %[shft]          \n"
> +                    "          sub %[refill], %[refill], %[val]        \n"
> +                    "          sub %[refill], %[refill], %[sub]        \n"
> +                    "          cmp %[refill], #0x0                     \n"
> +                    "          b.ge<https://urldefense.proofpoint.com/v2/url?u=http-3A__b.ge&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=poGRidtbMlekMuju7dkSIMPVgX_NvzmkSh4kyoMwmj4&e=> .Ldne%=                            \n"
> +                    "          sevl                                    \n"
> +                    ".Lrty%=:  wfe                                     \n"
> +                    "          ldxr %[val], [%[addr]]                  \n"
> +                    "          sub %[val], %[adj], %[val]              \n"
> +                    "          lsl %[refill], %[val], %[shft]          \n"
> +                    "          sub %[refill], %[refill], %[val]        \n"
> +                    "          sub %[refill], %[refill], %[sub]        \n"
> +                    "          cmp %[refill], #0x0                     \n"
> +                    "          b.lt<https://urldefense.proofpoint.com/v2/url?u=http-3A__b.lt&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=Hn_UpnhODUQpAWw-9ZlxrzNy2MyxX_kJS6d2z99kqao&e=> .Lrty%=                            \n"
> +                    ".Ldne%=:                                          \n"
> +                    : [refill] "=&r"(refill), [val] "=&r" (val)
> +                    : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_bufs_adj),
> +                      [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req)
> +                    : "memory");
> +#else
> +       do {
> +               refill = (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED));
> +               refill = (refill << txq->sqes_per_sqb_log2) - refill;
> +               refill -= req;
> +       } while (refill < 0);
> +#endif
> +       if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill,
> +                                 0, __ATOMIC_RELEASE,
> +                                 __ATOMIC_RELAXED))
>                 goto retry;
> -       }
>  }
>
>  /* Function to determine no of tx subdesc required in case ext
> @@ -283,10 +328,27 @@ static __rte_always_inline void
>  cn10k_nix_sec_fc_wait_one(struct cn10k_eth_txq *txq)
>  {
>         uint64_t nb_desc = txq->cpt_desc;
> -       uint64_t *fc = txq->cpt_fc;
> -
> -       while (nb_desc <= __atomic_load_n(fc, __ATOMIC_RELAXED))
> +       uint64_t fc;
> +
> +#ifdef RTE_ARCH_ARM64
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[space], [%[addr]]                \n"
> +                    "          cmp %[nb_desc], %[space]                \n"
> +                    "          b.hi .Ldne%=                            \n"
> +                    "          sevl                                    \n"
> +                    ".Lrty%=:  wfe                                     \n"
> +                    "          ldxr %[space], [%[addr]]                \n"
> +                    "          cmp %[nb_desc], %[space]                \n"
> +                    "          b.ls<https://urldefense.proofpoint.com/v2/url?u=http-3A__b.ls&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=7_L6grMaTkrE6l15-rPboGrijiBzvmc0jeDUK1qWJt0&e=> .Lrty%=                            \n"
> +                    ".Ldne%=:                                          \n"
> +                    : [space] "=&r"(fc)
> +                    : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc)
> +                    : "memory");
> +#else
> +       RTE_SET_USED(fc);
> +       while (nb_desc <= __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED))
>                 ;
> +#endif
>  }
>
>  static __rte_always_inline void
> @@ -294,7 +356,7 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts)
>  {
>         int32_t nb_desc, val, newval;
>         int32_t *fc_sw;
> -       volatile uint64_t *fc;
> +       uint64_t *fc;
>
>         /* Check if there is any CPT instruction to submit */
>         if (!nb_pkts)
> @@ -302,21 +364,59 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts)
>
>  again:
>         fc_sw = txq->cpt_fc_sw;
> -       val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_RELAXED) - nb_pkts;
> +#ifdef RTE_ARCH_ARM64
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %w[pkts], [%[addr]]                \n"
> +                    "          tbz %w[pkts], 31, .Ldne%=               \n"
> +                    "          sevl                                    \n"
> +                    ".Lrty%=:  wfe                                     \n"
> +                    "          ldxr %w[pkts], [%[addr]]                \n"
> +                    "          tbnz %w[pkts], 31, .Lrty%=              \n"
> +                    ".Ldne%=:                                          \n"
> +                    : [pkts] "=&r"(val)
> +                    : [addr] "r"(fc_sw)
> +                    : "memory");
> +#else
> +       /* Wait for primary core to refill FC. */
> +       while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0)
> +               ;
> +#endif
> +
> +       val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb_pkts;
>         if (likely(val >= 0))
>                 return;
>
>         nb_desc = txq->cpt_desc;
>         fc = txq->cpt_fc;
> +#ifdef RTE_ARCH_ARM64
> +       asm volatile(PLT_CPU_FEATURE_PREAMBLE
> +                    "          ldxr %[refill], [%[addr]]               \n"
> +                    "          sub %[refill], %[desc], %[refill]       \n"
> +                    "          sub %[refill], %[refill], %[pkts]       \n"
> +                    "          cmp %[refill], #0x0                     \n"
> +                    "          b.ge<https://urldefense.proofpoint.com/v2/url?u=http-3A__b.ge&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=poGRidtbMlekMuju7dkSIMPVgX_NvzmkSh4kyoMwmj4&e=> .Ldne%=                            \n"
> +                    "          sevl                                    \n"
> +                    ".Lrty%=:  wfe                                     \n"
> +                    "          ldxr %[refill], [%[addr]]               \n"
> +                    "          sub %[refill], %[desc], %[refill]       \n"
> +                    "          sub %[refill], %[refill], %[pkts]       \n"
> +                    "          cmp %[refill], #0x0                     \n"
> +                    "          b.lt<https://urldefense.proofpoint.com/v2/url?u=http-3A__b.lt&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=Hn_UpnhODUQpAWw-9ZlxrzNy2MyxX_kJS6d2z99kqao&e=> .Lrty%=                            \n"
> +                    ".Ldne%=:                                          \n"
> +                    : [refill] "=&r"(newval)
> +                    : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb_pkts)
> +                    : "memory");
> +#else
>         while (true) {
>                 newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED);
>                 newval -= nb_pkts;
>                 if (newval >= 0)
>                         break;
>         }
> +#endif
>
> -       if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false,
> -                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> +       if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __ATOMIC_RELEASE,
> +                                        __ATOMIC_RELAXED))
>                 goto again;
>  }
>
> @@ -3033,10 +3133,16 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws,
>                 wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
>                 wd.data[1] |= (uint64_t)(lmt_id + 16);
>
> -               if (flags & NIX_TX_VWQE_F)
> -                       cn10k_nix_vwqe_wait_fc(txq,
> -                               burst - (cn10k_nix_pkts_per_vec_brst(flags) >>
> -                                        1));
> +               if (flags & NIX_TX_VWQE_F) {
> +                       if (flags & NIX_TX_MULTI_SEG_F) {
> +                               if (burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1) > 0)
> +                                       cn10k_nix_vwqe_wait_fc(txq,
> +                                               burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1));
> +                       } else {
> +                               cn10k_nix_vwqe_wait_fc(txq,
> +                                               burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1));
> +                       }
> +               }
>                 /* STEOR1 */
>                 roc_lmt_submit_steorl(wd.data[1], pa);
>         } else if (lnum) {
> --
> 2.25.1
>


--

Patrick Robb

Technical Service Manager

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

www.iol.unh.edu<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.iol.unh.edu_&d=DwMGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=3-k4hoFpHV-mJao0lg_yOPeBgGXugZwXaQ7lE17jZ2zwns7YlXt6exJ2gfSLoFGW&s=Y8ZpK3xtg8c-YvTVjw-jOWVGSQ0BWQbtdrl0Asxorfo&e=>


[https://lh4.googleusercontent.com/7sTY8VswXadak_YT0J13osh5ockNVRX2BuYaRsKoTTpkpilBokA0WlocYHLB4q7XUgXNHka6-ns47S8R_am0sOt7MYQQ1ILQ3S-P5aezsrjp3-IsJMmMrErHWmTARNgZhpAx06n2]

[-- Attachment #2: Type: text/html, Size: 36199 bytes --]

  parent reply	other threads:[~2023-06-15  5:49 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-16 14:37 [PATCH 1/3] event/cnxk: align TX queue buffer adjustment pbhagavatula
2023-05-16 14:37 ` [PATCH 2/3] event/cnxk: use local labels in asm intrinsic pbhagavatula
2023-05-16 14:37 ` [PATCH 3/3] event/cnxk: use WFE in Tx fc wait pbhagavatula
2023-06-12 15:52 ` [PATCH 1/3] event/cnxk: align TX queue buffer adjustment Jerin Jacob
2023-06-13  9:25 ` [PATCH v2 " pbhagavatula
2023-06-13  9:25   ` [PATCH v2 2/3] event/cnxk: use local labels in asm intrinsic pbhagavatula
2023-06-13  9:25   ` [PATCH v2 3/3] event/cnxk: use WFE in Tx fc wait pbhagavatula
2023-06-14 10:33     ` Jerin Jacob
2023-06-14 18:27       ` Patrick Robb
2023-06-14 20:24         ` [EXT] " Pavan Nikhilesh Bhagavatula
2023-06-15  5:49         ` Jerin Jacob Kollanukkaran [this message]
2023-06-15 15:28     ` Stephen Hemminger
2023-06-16  6:46       ` [EXT] " Pavan Nikhilesh Bhagavatula

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BY3PR18MB47854174768613E48C1392BAC85BA@BY3PR18MB4785.namprd18.prod.outlook.com \
    --to=jerinj@marvell.com \
    --cc=dev@dpdk.org \
    --cc=jerinjacobk@gmail.com \
    --cc=kirankumark@marvell.com \
    --cc=ndabilpuram@marvell.com \
    --cc=pbhagavatula@marvell.com \
    --cc=probb@iol.unh.edu \
    --cc=skori@marvell.com \
    --cc=skoteshwar@marvell.com \
    --cc=sthotton@marvell.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).