From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 7B57DA04FD;
	Mon, 30 May 2022 13:21:28 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 5A12742B75;
	Mon, 30 May 2022 13:21:28 +0200 (CEST)
Received: from mail-lf1-f47.google.com (mail-lf1-f47.google.com
 [209.85.167.47]) by mails.dpdk.org (Postfix) with ESMTP id CA02A400D6
 for <dev@dpdk.org>; Mon, 30 May 2022 13:21:26 +0200 (CEST)
Received: by mail-lf1-f47.google.com with SMTP id bf44so667945lfb.0
 for <dev@dpdk.org>; Mon, 30 May 2022 04:21:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=semihalf-com.20210112.gappssmtp.com; s=20210112;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc:content-transfer-encoding;
 bh=tG14eQc9CQIiqOwHXiOMfltF5OVfqXtsRa68m5JbtQc=;
 b=awwjN9LuzYu8jrdU9M17wryM3XzfWGg423CZc5PdPjsE4xNzyYsRd9VIDxoP0HpoyA
 5zOQQng6Bl49pRNtFVQg2gqxYIJD412CDgtx9kl0YgtD/R0UOtbn6s8CwAtvj6QBvyJL
 N+8vw28Okr+2Y4aPDKWS6QWY23+THF+vQB7k1xScKwFPTu2qnu5cR5qjBlIyjo+0mgfw
 u4xoQ33v34K+AUpfMB7XUcCFwECnve5necN7PjliyE45u5UX35/IHIrqvxZs0esR2h2G
 AEB2u8n7NT/oRY8hTZZxjcjb7AIml+zw1KRkXdiHa2nFwTWheObTVlpaJQTx1Eiuw0Pk
 sL5A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=tG14eQc9CQIiqOwHXiOMfltF5OVfqXtsRa68m5JbtQc=;
 b=zT96XHTLmpX2Fxp4rwKJZbAmpgGol11gmxHFnFG4edtjOMwZG+YIqbtnVOoiIjbuWt
 gqih43TxhwKcyLQpWV3jsG694HEKadNwQIlAY0/4Y676sz2t6+H5RFa5zxw1VvJ2f5hn
 OQRISdQZ4babzuFYa+I/idIFnCDVt2KSKI6BJP9k3+QtQQVTYC62hhD5yhc+aSPyXsJT
 al68Uwk9e065+StqaT4YReMxJh7Jyw9kXz69/8qeJrlKHZjj5SMcVREMmT9X73JjOsM8
 ZGaytP6X3Hi8iZp3psSgtce73mgD7zTYqDeW2hVGEHfhOYw9mYd1SWLJfpxhjSi6IhCv
 tMzg==
X-Gm-Message-State: AOAM531+QPZmwEVcQgCfxXVlD2hPzltEnGOrkNLA4VvbzKposgc3mLv6
 V0MtJsw+RBp01fqlJufOsngNblJrk5pK4yoFbfB+4w==
X-Google-Smtp-Source: ABdhPJxUqRtbiNo64P1kd+lE4EPb9iIfdONGOzMoQnGzCpMWFmR0PxJneygMZqzuT7vtB02hcZ2yXaGhJciXs/4yuxo=
X-Received: by 2002:a05:6512:e9c:b0:478:e289:a911 with SMTP id
 bi28-20020a0565120e9c00b00478e289a911mr2211471lfb.589.1653909686329; Mon, 30
 May 2022 04:21:26 -0700 (PDT)
MIME-Version: 1.0
References: <20220510115824.457885-1-kda@semihalf.com>
 <20220527181822.716758-1-kda@semihalf.com>
 <20220527181822.716758-2-kda@semihalf.com>
 <20220527131520.23d9f544@hermes.local>
 <YpR3tu6B0gOwCcZj@bricha3-MOBL.ger.corp.intel.com>
 <98CBD80474FA8B44BF855DF32C47DC35D870C0@smartserver.smartshare.dk>
 <YpSfeORa9Y3lLXxx@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <YpSfeORa9Y3lLXxx@bricha3-MOBL.ger.corp.intel.com>
From: =?UTF-8?Q?Stanis=C5=82aw_Kardach?= <kda@semihalf.com>
Date: Mon, 30 May 2022 13:20:50 +0200
Message-ID: <CALVGJWJMDPcao8xWxwjt9ABLDmNbxFE1MrCE14N=xRkHs3gx8A@mail.gmail.com>
Subject: Re: [PATCH v2 2/2] lpm: add a scalar version of lookupx4 function
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: =?UTF-8?Q?Morten_Br=C3=B8rup?= <mb@smartsharesystems.com>, 
 Stephen Hemminger <stephen@networkplumber.org>, 
 Vladimir Medvedkin <vladimir.medvedkin@intel.com>,
 Michal Mazurek <maz@semihalf.com>, dev <dev@dpdk.org>, 
 Frank Zhao <Frank.Zhao@starfivetech.com>, Sam Grove <sam.grove@sifive.com>, 
 Marcin Wojtas <mw@semihalf.com>, upstream@semihalf.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On Mon, May 30, 2022 at 12:42 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Mon, May 30, 2022 at 10:00:34AM +0200, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Monday, 30 May 2022 09.52
> > >
> > > On Fri, May 27, 2022 at 01:15:20PM -0700, Stephen Hemminger wrote:
> > > > On Fri, 27 May 2022 20:18:22 +0200
> > > > Stanislaw Kardach <kda@semihalf.com> wrote:
> > > >
> > > > > +static inline void
> > > > > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > > > > +               uint32_t defv)
> > > > > +{
> > > > > +       uint32_t nh;
> > > > > +       int i, ret;
> > > > > +
> > > > > +       for (i = 0; i < 4; i++) {
> > > > > +               ret = rte_lpm_lookup(lpm, ((rte_xmm_t)ip).u32[i], &nh);
> > > > > +               hop[i] = (ret == 0) ? nh : defv;
> > > > > +       }
> > > > > +}
> > > >
> > > > For performance, manually unroll the loop.
> > >
> > > Given a constant 4x iterations, will compilers not unroll this
> > > automatically? I think the loop is a little clearer if it can be kept.
> > >
> > > /Bruce
> >
> > If in doubt, add this and look at the assembler output:
> >
> > #define REVIEW_INLINE_FUNCTIONS 1
> >
> > #if REVIEW_INLINE_FUNCTIONS /* For compiler output review purposes only. */
> > #pragma GCC diagnostic push
> > #pragma GCC diagnostic ignored "-Wmissing-prototypes"
> > void review_rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], uint32_t defv)
> > {
> >       rte_lpm_lookupx4(lpm, ip, hop, defv);
> > }
> > #pragma GCC diagnostic pop
> > #endif /* REVIEW_INLINE_FUNCTIONS */
> >
>
> Used godbolt.org to check and indeed the function is not unrolled.
> (GCC 11.2, with flags "-O3 -march=icelake-server").
>
> Manually unrolling changes the assembly generated in interesting ways. For
> example, it appears to generate more cmov-type instructions for the
> miss/default-value case rather than using branches as in the looped
> version. Whether this is better or not may depend upon the use case - if one
> expects most lpm lookup entries to hit, then having (predictable) branches
> may well be cheaper.
>
> In any case, I'll withdraw any objection to unrolling, but I'm still not
> convinced it's necessary.
>
> /Bruce
Interestingly enough, until I defined unlikely() on godbolt, I did not
get any automatic unrolling there (with either x86 or RISC-V GCC). Did
you get any compilation warnings?
That said, the unrolling only happens at -O3, since -O3 implies
-fpeel-loops. -O3 is the default for DPDK.
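
For anyone who wants to repeat the comparison, below is a minimal
standalone sketch that can be pasted into godbolt. It is not the DPDK
code: toy_lookup(), TOY_VALID and the 256-entry table are hypothetical
stand-ins for rte_lpm_lookup() and the real LPM tables, and a plain
uint32_t[4] replaces xmm_t so that no DPDK headers are needed. The
unlikely() macro is defined the same way DPDK defines it; leaving it
out makes the compiler treat unlikely() as an undeclared function,
which is the situation described above.

```c
#include <stdint.h>

/* Same definition as in DPDK's rte_branch_prediction.h. */
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Hypothetical "entry is valid" flag for the toy table below. */
#define TOY_VALID 0x01000000u

/*
 * Hypothetical stand-in for rte_lpm_lookup(): returns 0 and sets *nh
 * on a hit, -1 on a miss. The real function walks tbl24/tbl8.
 */
static inline int
toy_lookup(const uint32_t *tbl, uint32_t ip, uint32_t *nh)
{
	uint32_t entry = tbl[ip & 0xff]; /* toy table has 256 entries */

	if (unlikely((entry & TOY_VALID) == 0))
		return -1;
	*nh = entry & 0x00ffffff;
	return 0;
}

/* Prototypes, so that -Wmissing-prototypes stays quiet. */
void toy_lookupx4_loop(const uint32_t *tbl, const uint32_t ip[4],
		uint32_t hop[4], uint32_t defv);
void toy_lookupx4_unrolled(const uint32_t *tbl, const uint32_t ip[4],
		uint32_t hop[4], uint32_t defv);

/* Looped form, mirroring the shape of the patch under review. */
void
toy_lookupx4_loop(const uint32_t *tbl, const uint32_t ip[4],
		uint32_t hop[4], uint32_t defv)
{
	uint32_t nh = 0;
	int i, ret;

	for (i = 0; i < 4; i++) {
		ret = toy_lookup(tbl, ip[i], &nh);
		hop[i] = (ret == 0) ? nh : defv;
	}
}

/* Manually unrolled form of the same four lookups. */
void
toy_lookupx4_unrolled(const uint32_t *tbl, const uint32_t ip[4],
		uint32_t hop[4], uint32_t defv)
{
	uint32_t nh = 0;

	hop[0] = (toy_lookup(tbl, ip[0], &nh) == 0) ? nh : defv;
	hop[1] = (toy_lookup(tbl, ip[1], &nh) == 0) ? nh : defv;
	hop[2] = (toy_lookup(tbl, ip[2], &nh) == 0) ? nh : defv;
	hop[3] = (toy_lookup(tbl, ip[3], &nh) == 0) ? nh : defv;
}
```

Both functions should return identical results; only the generated
assembly is of interest. Building with -O2 and then -O3 (or -O2
-fpeel-loops) is one way to check whether the loop in
toy_lookupx4_loop() gets peeled, though the outcome on this toy lookup
may differ from what the real rte_lpm_lookup() produces.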