From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 184AFA00C4; Mon, 30 May 2022 14:46:26 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id C644640694; Mon, 30 May 2022 14:46:25 +0200 (CEST) Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by mails.dpdk.org (Postfix) with ESMTP id 6B2DD400D6 for ; Mon, 30 May 2022 14:46:23 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1653914783; x=1685450783; h=date:from:to:cc:subject:message-id:references: mime-version:content-transfer-encoding:in-reply-to; bh=4DL078NVFoCXKKpQVscqdPDexegCqW8rEDL8fHJMUws=; b=THvHl13y34KnA9h8lTZePwF7raGqlavRAwX1IeM5bFDsPJjqF3bvXgLB S5DXikj4SJAKxieiDdIykCW/adC0Igszxd1wilawgXX6BK8uaTZjEe9YW nPo6aBIha0QfEunIg6s/XZbvUrHRgKj+qT1fKpFLH+UqW6xRUj2dTo4g1 GWgk/ZMSe510hVZB6E939+OtNHMoCMMO8vo+tOLhmTkZnOZObFv2eMCUP cmvggSV8vBHq0zJnhMc3HqjTM14pBKh5Pgqc50Vk7PVSYB8+ay4tbzQgo vEgH06G7nyjykQRpcxUMF1fqFBjix3+xijAmd5Pb1iqgwlV7WbU2BHTeF A==; X-IronPort-AV: E=McAfee;i="6400,9594,10362"; a="335640288" X-IronPort-AV: E=Sophos;i="5.91,262,1647327600"; d="scan'208";a="335640288" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 May 2022 05:46:22 -0700 X-IronPort-AV: E=Sophos;i="5.91,262,1647327600"; d="scan'208";a="605173471" Received: from bricha3-mobl.ger.corp.intel.com ([10.252.6.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA; 30 May 2022 05:46:19 -0700 Date: Mon, 30 May 2022 13:46:16 +0100 From: Bruce Richardson To: =?utf-8?Q?Stanis=C5=82aw?= Kardach Cc: Morten =?iso-8859-1?Q?Br=F8rup?= , Stephen Hemminger , Vladimir Medvedkin , Michal Mazurek , dev , Frank Zhao , Sam Grove , Marcin Wojtas , upstream@semihalf.com Subject: Re: [PATCH v2 2/2] lpm: add a scalar version of lookupx4 function Message-ID: References: <20220510115824.457885-1-kda@semihalf.com> <20220527181822.716758-1-kda@semihalf.com> <20220527181822.716758-2-kda@semihalf.com> <20220527131520.23d9f544@hermes.local> <98CBD80474FA8B44BF855DF32C47DC35D870C0@smartserver.smartshare.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Mon, May 30, 2022 at 01:20:50PM +0200, Stanisław Kardach wrote: > On Mon, May 30, 2022 at 12:42 PM Bruce Richardson > wrote: > > > > On Mon, May 30, 2022 at 10:00:34AM +0200, Morten Brørup wrote: > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > > > Sent: Monday, 30 May 2022 09.52 > > > > > > > > On Fri, May 27, 2022 at 01:15:20PM -0700, Stephen Hemminger wrote: > > > > > On Fri, 27 May 2022 20:18:22 +0200 > > > > > Stanislaw Kardach wrote: > > > > > > > > > > > +static inline void > > > > > > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t > > > > hop[4], > > > > > > + uint32_t defv) > > > > > > +{ > > > > > > + uint32_t nh; > > > > > > + int i, ret; > > > > > > + > > > > > > + for (i = 0; i < 4; i++) { > > > > > > + ret = rte_lpm_lookup(lpm, ((rte_xmm_t)ip).u32[i], &nh); > > > > > > + hop[i] = (ret == 0) ? nh : defv; > > > > > > + } > > > > > > +} > > > > > > > > > > For performance, manually unroll the loop. > > > > > > > > Given a constant 4x iterations, will compilers not unroll this > > > > automatically. I think the loop is a little clearer if it can be kept > > > > > > > > /Bruce > > > > > > If in doubt, add this and look at the assembler output: > > > > > > #define REVIEW_INLINE_FUNCTIONS 1 > > > > > > #if REVIEW_INLINE_FUNCTIONS /* For compiler output review purposes only. */ > > > #pragma GCC diagnostic push > > > #pragma GCC diagnostic ignored "-Wmissing-prototypes" > > > void review_rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], uint32_t defv) > > > { > > > rte_lpm_lookupx4(lpm, ip, hop, defv); > > > } > > > #pragma GCC diagnostic pop > > > #endif /* REVIEW_INLINE_FUNCTIONS */ > > > > > > > Used godbolt.org to check and indeed the function is not unrolled. > > (Gcc 11.2, with flags "-O3 -march=icelake-server"). > > > > Manually unrolling changes the assembly generated in interesting ways. For > > example, it appears to generate more cmov-type instructions for the > > miss/default-value case rather than using branches as in the looped > > version. Whether this is better or not may depend upon usecase - if one > > expects most lpm lookup entries to hit, then having (predictable) branches > > may well be cheaper. > > > > In any case, I'll withdraw any object to unrolling, but I'm still not > > convinced it's necessary. > > > > /Bruce > Interestingly enough until I've defined unlikely() in godbolt, I did > not get any automatic unrolling on godbolt (either with x86 or RISC-V > GCC). Did you get any compilation warnings? That matches what I saw. I then just used manual unrolling i.e. copy-paste the 2 lines 4 times, to see what the output was like then. > That said it only happens on O3 since it implies -fpeel-loops. O3 is > the default for DPDK.