From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 31A5EA04FD; Mon, 30 May 2022 12:42:08 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 24B9A42B75; Mon, 30 May 2022 12:42:08 +0200 (CEST) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by mails.dpdk.org (Postfix) with ESMTP id 6B035400D6 for ; Mon, 30 May 2022 12:42:06 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1653907326; x=1685443326; h=date:from:to:cc:subject:message-id:references: mime-version:content-transfer-encoding:in-reply-to; bh=kUZ45ZT0QA0Jmpj7uxBd8fRittBV49QqnVLoBPaog1M=; b=chqoNNBIWAqjp/vB4qteE+thjgVXN7SD9piI2jHc+ZgQCVRuqLeA23sQ 0NeqDGrk8Kkz3cz5acv7wvdktlKPUiEJq3nBRgBDJTyI2nrcHROknivyR L5L93Dej3A1DzPq90H+qeEQDSwPn/dGkfppe0FP4h532tsPPhCA88xK4E iXU8I437+3kjaSwuztJ21gyxGsFFYS0pZ/AamUKSpZvPNXnQ4h1Ro9RG/ 8phtwDSHh2KC4dlq5K473CLfjaJvfJw11NIEe7QBIZWtG5MOqZ+C7Qrsh 5zWw/7sEbJzcZvhb3m6nG26ePv1nAixtsrx2kMJhCVdRDrN7cIZs/6+yK Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10362"; a="254840715" X-IronPort-AV: E=Sophos;i="5.91,262,1647327600"; d="scan'208";a="254840715" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 May 2022 03:42:05 -0700 X-IronPort-AV: E=Sophos;i="5.91,262,1647327600"; d="scan'208";a="632567112" Received: from bricha3-mobl.ger.corp.intel.com ([10.252.6.54]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA; 30 May 2022 03:42:03 -0700 Date: Mon, 30 May 2022 11:42:00 +0100 From: Bruce Richardson To: Morten =?iso-8859-1?Q?Br=F8rup?= Cc: Stephen Hemminger , Stanislaw Kardach , Vladimir Medvedkin , Michal Mazurek , dev@dpdk.org, Frank Zhao , Sam Grove , mw@semihalf.com, upstream@semihalf.com Subject: Re: [PATCH v2 2/2] lpm: add a scalar version of lookupx4 function Message-ID: References: <20220510115824.457885-1-kda@semihalf.com> <20220527181822.716758-1-kda@semihalf.com> <20220527181822.716758-2-kda@semihalf.com> <20220527131520.23d9f544@hermes.local> <98CBD80474FA8B44BF855DF32C47DC35D870C0@smartserver.smartshare.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D870C0@smartserver.smartshare.dk> X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Mon, May 30, 2022 at 10:00:34AM +0200, Morten Brørup wrote: > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > Sent: Monday, 30 May 2022 09.52 > > > > On Fri, May 27, 2022 at 01:15:20PM -0700, Stephen Hemminger wrote: > > > On Fri, 27 May 2022 20:18:22 +0200 > > > Stanislaw Kardach wrote: > > > > > > > +static inline void > > > > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t > > hop[4], > > > > + uint32_t defv) > > > > +{ > > > > + uint32_t nh; > > > > + int i, ret; > > > > + > > > > + for (i = 0; i < 4; i++) { > > > > + ret = rte_lpm_lookup(lpm, ((rte_xmm_t)ip).u32[i], &nh); > > > > + hop[i] = (ret == 0) ? nh : defv; > > > > + } > > > > +} > > > > > > For performance, manually unroll the loop. > > > > Given a constant 4x iterations, will compilers not unroll this > > automatically. I think the loop is a little clearer if it can be kept > > > > /Bruce > > If in doubt, add this and look at the assembler output: > > #define REVIEW_INLINE_FUNCTIONS 1 > > #if REVIEW_INLINE_FUNCTIONS /* For compiler output review purposes only. */ > #pragma GCC diagnostic push > #pragma GCC diagnostic ignored "-Wmissing-prototypes" > void review_rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], uint32_t defv) > { > rte_lpm_lookupx4(lpm, ip, hop, defv); > } > #pragma GCC diagnostic pop > #endif /* REVIEW_INLINE_FUNCTIONS */ > Used godbolt.org to check and indeed the function is not unrolled. (Gcc 11.2, with flags "-O3 -march=icelake-server"). Manually unrolling changes the assembly generated in interesting ways. For example, it appears to generate more cmov-type instructions for the miss/default-value case rather than using branches as in the looped version. Whether this is better or not may depend upon usecase - if one expects most lpm lookup entries to hit, then having (predictable) branches may well be cheaper. In any case, I'll withdraw any object to unrolling, but I'm still not convinced it's necessary. /Bruce