From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 221395F22 for ; Thu, 29 Mar 2018 22:41:54 +0200 (CEST) X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Mar 2018 13:41:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.48,378,1517904000"; d="scan'208";a="30032121" Received: from bricha3-mobl.ger.corp.intel.com ([10.252.18.84]) by orsmga006.jf.intel.com with SMTP; 29 Mar 2018 13:41:49 -0700 Received: by (sSMTP sendmail emulation); Thu, 29 Mar 2018 21:41:49 +0100 Date: Thu, 29 Mar 2018 21:41:49 +0100 From: Bruce Richardson To: Vladimir Medvedkin Cc: dev@dpdk.org Message-ID: <20180329204148.GA2060@bricha3-MOBL.ger.corp.intel.com> References: <1519249495-16594-1-git-send-email-medvedkinv@gmail.com> <1519249495-16594-2-git-send-email-medvedkinv@gmail.com> <20180329102746.GA4004@bricha3-MOBL.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Organization: Intel Research and Development Ireland Ltd. User-Agent: Mutt/1.9.4 (2018-02-28) Subject: Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Mar 2018 20:41:56 -0000 On Thu, Mar 29, 2018 at 11:11:20PM +0300, Vladimir Medvedkin wrote: > 2018-03-29 13:27 GMT+03:00 Bruce Richardson : > > > On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote: > > > RIB is an alternative to current LPM library. > > > +#define LOOKUP_FUNC(suffix, type, bulk_prefetch) \ > > > +int rte_dir24_8_lookup_bulk_##suffix(void *fib_p, const uint32_t > > *ips, \ > > > + uint64_t *next_hops, const unsigned n) \ > > > +{ \ > > > + struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p; \ > > > + uint64_t tmp; \ > > > + uint32_t i; \ > > > + uint32_t prefetch_offset = RTE_MIN((unsigned)bulk_prefetch, n); \ > > > + \ > > > + RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ips == NULL) || \ > > > + (next_hops == NULL)), -EINVAL); \ > > > + \ > > > + for (i = 0; i < prefetch_offset; i++) \ > > > + rte_prefetch0(get_tbl24_p(fib, ips[i])); \ > > > + for (i = 0; i < (n - prefetch_offset); i++) { \ > > > + rte_prefetch0(get_tbl24_p(fib, ips[i + prefetch_offset])); > > \ > > > + tmp = ((type *)fib->tbl24)[ips[i] >> 8]; \ > > > + if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) == \ > > > + RTE_DIR24_8_VALID_EXT_ENT)) { \ > > > + tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] + \ > > > + ((tmp >> 1) * > > RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \ > > > + } \ > > > + next_hops[i] = tmp >> 1; \ > > > + } \ > > > + for (; i < n; i++) { \ > > > + tmp = ((type *)fib->tbl24)[ips[i] >> 8]; \ > > > + if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) == \ > > > + RTE_DIR24_8_VALID_EXT_ENT)) { \ > > > + tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] + \ > > > + ((tmp >> 1) * > > RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \ > > > + } \ > > > + next_hops[i] = tmp >> 1; \ > > > + } \ > > > + return 0; \ > > > +} \ > > > > What is the advantage of doing this as a macro? Unless I'm missing > > something "suffix" is never actually used in the function at all, and you > > reference the size of the data from fix->nh_sz. Therefore there can be no > > performance benefit from having such a lookup function, that I can see. > > > suffix is to create 4 different function names, look at the end of > rte_dir24_8.c, there are > LOOKUP_FUNC(1b, uint8_t, 5) > LOOKUP_FUNC(2b, uint16_t, 6) > LOOKUP_FUNC(4b, uint32_t, 15) > LOOKUP_FUNC(8b, uint64_t, 12) > > Now I made single lookup function that references the size of the data from > fib->nh_sz instead of static casting to passed type in macro. > was: > BULK RIB Lookup: 24.2 cycles (fails = 0.0%) > become: > BULK RIB Lookup: 26.1 cycles (fails = 0.0%) > proc E3-1230v1@3.6Ghz > > So you are saying that it turned out to be faster to do a lookup of the size rather than hardcoding it. Seems strange, but ok. I'm still confused why the four functions with four different names. What is different between the four implementations. Just the amount of prefetching done? It could still be done by a single call with a compile-time constant parameter. It's whats used a lot in the ring library and works well there. > > > > Therefore, if performance is ok, I suggest just making a single lookup_bulk > > function that works with all sizes - as the inlined lookup function does in > > the header. > > > > Alternatively, if you do want specific functions for each > > entry size, you still don't need macros. Write a single function that takes > > as a final parameter the entry-size and use that in calculations rather > > than nh_sz. Then wrap that function in the set of public ones, passing in > > the final size parameter explicitly as "1", "2", "4" or "8". The compiler > > will then know that as a compile-time constant and generate the correct > > code for each size. However, for this path I suggest you check for any > > resulting performance improvement, e.g. with l3fwd, as I think it's not > > likely to be significant. > > > > > + > > > > > +{ > > > + uint64_t res; > > > + struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p; > > > + > > > + RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ip == NULL) || > > > + (next_hop == NULL)), -EINVAL); > > > + > > > + res = RTE_DIR24_8_GET_TBL24(fib, ip); > > > + if (unlikely((res & RTE_DIR24_8_VALID_EXT_ENT) == > > > + RTE_DIR24_8_VALID_EXT_ENT)) { > > > + res = RTE_DIR24_8_GET_TBL8(fib, res, ip); > > > + } > > > + *next_hop = res >> 1; > > > + return 0; > > > +} > > > > Do we need this static inline function? Can the bulk functions do on their > > own? If we can remove this, we can move the most of the header file > > contents, especially the structures, out of the public header. That would > > greatly improve the ease with which ABI can be maintained. > > > It was done in some manner to LPM. There was separate single lookup and > bulk versions. > Of course it is possible to remove this function at all and use bulk > version to lookup single packet. But I thought maybe somebody could use it. > Yes, I understand that. However, if it's unlikely to be used, I would prioritize having ABI-friendliness over having it. /Bruce