From: Bruce Richardson <bruce.richardson@intel.com>
To: "Stanisław Kardach" <kda@semihalf.com>
Cc: "Morten Brørup" <mb@smartsharesystems.com>,
"Stephen Hemminger" <stephen@networkplumber.org>,
"Vladimir Medvedkin" <vladimir.medvedkin@intel.com>,
"Michal Mazurek" <maz@semihalf.com>, dev <dev@dpdk.org>,
"Frank Zhao" <Frank.Zhao@starfivetech.com>,
"Sam Grove" <sam.grove@sifive.com>,
"Marcin Wojtas" <mw@semihalf.com>,
upstream@semihalf.com
Subject: Re: [PATCH v2 2/2] lpm: add a scalar version of lookupx4 function
Date: Mon, 30 May 2022 13:46:16 +0100 [thread overview]
Message-ID: <YpS8mAvsABXAKFgz@bricha3-MOBL.ger.corp.intel.com> (raw)
In-Reply-To: <CALVGJWJMDPcao8xWxwjt9ABLDmNbxFE1MrCE14N=xRkHs3gx8A@mail.gmail.com>
On Mon, May 30, 2022 at 01:20:50PM +0200, Stanisław Kardach wrote:
> On Mon, May 30, 2022 at 12:42 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Mon, May 30, 2022 at 10:00:34AM +0200, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Monday, 30 May 2022 09.52
> > > >
> > > > On Fri, May 27, 2022 at 01:15:20PM -0700, Stephen Hemminger wrote:
> > > > > On Fri, 27 May 2022 20:18:22 +0200
> > > > > Stanislaw Kardach <kda@semihalf.com> wrote:
> > > > >
> > > > > > +static inline void
> > > > > > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t
> > > > hop[4],
> > > > > > + uint32_t defv)
> > > > > > +{
> > > > > > + uint32_t nh;
> > > > > > + int i, ret;
> > > > > > +
> > > > > > + for (i = 0; i < 4; i++) {
> > > > > > + ret = rte_lpm_lookup(lpm, ((rte_xmm_t)ip).u32[i], &nh);
> > > > > > + hop[i] = (ret == 0) ? nh : defv;
> > > > > > + }
> > > > > > +}
> > > > >
> > > > > For performance, manually unroll the loop.
> > > >
> > > > Given a constant 4x iterations, will compilers not unroll this
> > > > automatically. I think the loop is a little clearer if it can be kept
> > > >
> > > > /Bruce
> > >
> > > If in doubt, add this and look at the assembler output:
> > >
> > > #define REVIEW_INLINE_FUNCTIONS 1
> > >
> > > #if REVIEW_INLINE_FUNCTIONS /* For compiler output review purposes only. */
> > > #pragma GCC diagnostic push
> > > #pragma GCC diagnostic ignored "-Wmissing-prototypes"
> > > void review_rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], uint32_t defv)
> > > {
> > > rte_lpm_lookupx4(lpm, ip, hop, defv);
> > > }
> > > #pragma GCC diagnostic pop
> > > #endif /* REVIEW_INLINE_FUNCTIONS */
> > >
> >
> > Used godbolt.org to check and indeed the function is not unrolled.
> > (Gcc 11.2, with flags "-O3 -march=icelake-server").
> >
> > Manually unrolling changes the assembly generated in interesting ways. For
> > example, it appears to generate more cmov-type instructions for the
> > miss/default-value case rather than using branches as in the looped
> > version. Whether this is better or not may depend upon usecase - if one
> > expects most lpm lookup entries to hit, then having (predictable) branches
> > may well be cheaper.
> >
> > In any case, I'll withdraw any object to unrolling, but I'm still not
> > convinced it's necessary.
> >
> > /Bruce
> Interestingly enough until I've defined unlikely() in godbolt, I did
> not get any automatic unrolling on godbolt (either with x86 or RISC-V
> GCC). Did you get any compilation warnings?
That matches what I saw. I then just used manual unrolling i.e. copy-paste
the 2 lines 4 times, to see what the output was like then.
> That said it only happens on O3 since it implies -fpeel-loops. O3 is
> the default for DPDK.
next prev parent reply other threads:[~2022-05-30 12:46 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-10 11:58 [PATCH 1/1] " Stanislaw Kardach
2022-05-19 17:02 ` Medvedkin, Vladimir
2022-05-24 16:28 ` Stanisław Kardach
2022-05-27 11:16 ` Stanisław Kardach
2022-05-27 13:16 ` Medvedkin, Vladimir
2022-05-27 18:18 ` [PATCH v2 1/2] lpm: add const to lpm arg of rte_lpm_lookup Stanislaw Kardach
2022-05-27 18:18 ` [PATCH v2 2/2] lpm: add a scalar version of lookupx4 function Stanislaw Kardach
2022-05-27 20:15 ` Stephen Hemminger
2022-05-30 7:52 ` Bruce Richardson
2022-05-30 8:00 ` Morten Brørup
2022-05-30 10:42 ` Bruce Richardson
2022-05-30 11:20 ` Stanisław Kardach
2022-05-30 12:46 ` Bruce Richardson [this message]
2022-05-30 18:24 ` [PATCH v3 1/2] lpm: add const to lpm arg of rte_lpm_lookup Stanislaw Kardach
2022-05-30 18:24 ` [PATCH v3 2/2] lpm: add a scalar version of lookupx4 function Stanislaw Kardach
2022-06-01 9:41 ` Medvedkin, Vladimir
2022-06-01 10:32 ` Stanisław Kardach
2022-05-30 20:38 ` [PATCH v3 1/2] lpm: add const to lpm arg of rte_lpm_lookup Stephen Hemminger
2022-06-01 9:35 ` Medvedkin, Vladimir
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YpS8mAvsABXAKFgz@bricha3-MOBL.ger.corp.intel.com \
--to=bruce.richardson@intel.com \
--cc=Frank.Zhao@starfivetech.com \
--cc=dev@dpdk.org \
--cc=kda@semihalf.com \
--cc=maz@semihalf.com \
--cc=mb@smartsharesystems.com \
--cc=mw@semihalf.com \
--cc=sam.grove@sifive.com \
--cc=stephen@networkplumber.org \
--cc=upstream@semihalf.com \
--cc=vladimir.medvedkin@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).