From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 31A5EA04FD;
	Mon, 30 May 2022 12:42:08 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 24B9A42B75;
	Mon, 30 May 2022 12:42:08 +0200 (CEST)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by mails.dpdk.org (Postfix) with ESMTP id 6B035400D6
 for <dev@dpdk.org>; Mon, 30 May 2022 12:42:06 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1653907326; x=1685443326;
 h=date:from:to:cc:subject:message-id:references:
 mime-version:content-transfer-encoding:in-reply-to;
 bh=kUZ45ZT0QA0Jmpj7uxBd8fRittBV49QqnVLoBPaog1M=;
 b=chqoNNBIWAqjp/vB4qteE+thjgVXN7SD9piI2jHc+ZgQCVRuqLeA23sQ
 0NeqDGrk8Kkz3cz5acv7wvdktlKPUiEJq3nBRgBDJTyI2nrcHROknivyR
 L5L93Dej3A1DzPq90H+qeEQDSwPn/dGkfppe0FP4h532tsPPhCA88xK4E
 iXU8I437+3kjaSwuztJ21gyxGsFFYS0pZ/AamUKSpZvPNXnQ4h1Ro9RG/
 8phtwDSHh2KC4dlq5K473CLfjaJvfJw11NIEe7QBIZWtG5MOqZ+C7Qrsh
 5zWw/7sEbJzcZvhb3m6nG26ePv1nAixtsrx2kMJhCVdRDrN7cIZs/6+yK Q==;
X-IronPort-AV: E=McAfee;i="6400,9594,10362"; a="254840715"
X-IronPort-AV: E=Sophos;i="5.91,262,1647327600"; d="scan'208";a="254840715"
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 30 May 2022 03:42:05 -0700
X-IronPort-AV: E=Sophos;i="5.91,262,1647327600"; d="scan'208";a="632567112"
Received: from bricha3-mobl.ger.corp.intel.com ([10.252.6.54])
 by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA;
 30 May 2022 03:42:03 -0700
Date: Mon, 30 May 2022 11:42:00 +0100
From: Bruce Richardson <bruce.richardson@intel.com>
To: Morten =?iso-8859-1?Q?Br=F8rup?= <mb@smartsharesystems.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
 Stanislaw Kardach <kda@semihalf.com>,
 Vladimir Medvedkin <vladimir.medvedkin@intel.com>,
 Michal Mazurek <maz@semihalf.com>, dev@dpdk.org,
 Frank Zhao <Frank.Zhao@starfivetech.com>,
 Sam Grove <sam.grove@sifive.com>, mw@semihalf.com, upstream@semihalf.com
Subject: Re: [PATCH v2 2/2] lpm: add a scalar version of lookupx4 function
Message-ID: <YpSfeORa9Y3lLXxx@bricha3-MOBL.ger.corp.intel.com>
References: <20220510115824.457885-1-kda@semihalf.com>
 <20220527181822.716758-1-kda@semihalf.com>
 <20220527181822.716758-2-kda@semihalf.com>
 <20220527131520.23d9f544@hermes.local>
 <YpR3tu6B0gOwCcZj@bricha3-MOBL.ger.corp.intel.com>
 <98CBD80474FA8B44BF855DF32C47DC35D870C0@smartserver.smartshare.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D870C0@smartserver.smartshare.dk>
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On Mon, May 30, 2022 at 10:00:34AM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Monday, 30 May 2022 09.52
> > 
> > On Fri, May 27, 2022 at 01:15:20PM -0700, Stephen Hemminger wrote:
> > > On Fri, 27 May 2022 20:18:22 +0200
> > > Stanislaw Kardach <kda@semihalf.com> wrote:
> > >
> > > > +static inline void
> > > > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t
> > hop[4],
> > > > +		uint32_t defv)
> > > > +{
> > > > +	uint32_t nh;
> > > > +	int i, ret;
> > > > +
> > > > +	for (i = 0; i < 4; i++) {
> > > > +		ret = rte_lpm_lookup(lpm, ((rte_xmm_t)ip).u32[i], &nh);
> > > > +		hop[i] = (ret == 0) ? nh : defv;
> > > > +	}
> > > > +}
> > >
> > > For performance, manually unroll the loop.
> > 
> > Given a constant 4x iterations, will compilers not unroll this
> > automatically. I think the loop is a little clearer if it can be kept
> > 
> > /Bruce
> 
> If in doubt, add this and look at the assembler output:
> 
> #define REVIEW_INLINE_FUNCTIONS 1
> 
> #if REVIEW_INLINE_FUNCTIONS /* For compiler output review purposes only. */
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wmissing-prototypes"
> void review_rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], uint32_t defv)
> {
> 	rte_lpm_lookupx4(lpm, ip, hop, defv);
> }
> #pragma GCC diagnostic pop
> #endif /* REVIEW_INLINE_FUNCTIONS */
> 

Used godbolt.org to check and indeed the function is not unrolled.
(Gcc 11.2, with flags "-O3 -march=icelake-server").

Manually unrolling changes the assembly generated in interesting ways. For
example, it appears to generate more cmov-type instructions for the
miss/default-value case rather than using branches as in the looped
version. Whether this is better or not may depend upon usecase - if one
expects most lpm lookup entries to hit, then having (predictable) branches
may well be cheaper.

In any case, I'll withdraw any object to unrolling, but I'm still not
convinced it's necessary.

/Bruce