From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id A8189A04B5; Wed, 30 Sep 2020 15:54:41 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id EEA471D669; Wed, 30 Sep 2020 15:54:38 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 286E61D62E for ; Wed, 30 Sep 2020 15:54:36 +0200 (CEST) IronPort-SDR: 35Rz8vArD6kZ9171Jfu/4O6I/W2j3zjZmiFq6aDZaLsTHauoZgvHzWVJluWvu/PEt5AAW8kWgi HmJCpIHdhX5Q== X-IronPort-AV: E=McAfee;i="6000,8403,9759"; a="162505969" X-IronPort-AV: E=Sophos;i="5.77,322,1596524400"; d="scan'208";a="162505969" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2020 06:54:34 -0700 IronPort-SDR: KATYhuB6NLydyRfYA2rjDhuz+603f1pcNsGx/sn/6+z5Zu22P5/aXf4YQqGmsVBbwejyWeVh9I ZQ/khit9kurw== X-IronPort-AV: E=Sophos;i="5.77,322,1596524400"; d="scan'208";a="457663061" Received: from vmedvedk-mobl.ger.corp.intel.com (HELO [10.252.51.169]) ([10.252.51.169]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2020 06:54:33 -0700 To: Ciara Power , dev@dpdk.org Cc: Bruce Richardson , Jerin Jacob , Ruifeng Wang References: <20200807155859.63888-1-ciara.power@intel.com> <20200930130415.11211-1-ciara.power@intel.com> <20200930130415.11211-19-ciara.power@intel.com> From: "Medvedkin, Vladimir" Message-ID: <10075e1e-36de-604d-3dd5-12d1eeca5635@intel.com> Date: Wed, 30 Sep 2020 14:54:30 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.2.2 MIME-Version: 1.0 In-Reply-To: <20200930130415.11211-19-ciara.power@intel.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v3 18/18] lpm: choose vector path at runtime X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Ciara, On 30/09/2020 14:04, Ciara Power wrote: > When choosing the vector path, max SIMD bitwidth is now checked to > ensure a vector path is allowable. To do this, rather than the vector > lookup functions being called directly from apps, a generic lookup > function is called which will call the vector functions if suitable. > > Signed-off-by: Ciara Power > --- > lib/librte_lpm/rte_lpm.h | 57 ++++++++++++++++++++++++++------ > lib/librte_lpm/rte_lpm_altivec.h | 2 +- > lib/librte_lpm/rte_lpm_neon.h | 2 +- > lib/librte_lpm/rte_lpm_sse.h | 2 +- > 4 files changed, 50 insertions(+), 13 deletions(-) > > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h > index 03da2d37e0..edba7cafd5 100644 > --- a/lib/librte_lpm/rte_lpm.h > +++ b/lib/librte_lpm/rte_lpm.h > @@ -397,8 +397,18 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t *ips, > /* Mask four results. */ > #define RTE_LPM_MASKX4_RES UINT64_C(0x00ffffff00ffffff) > > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) > +#include "rte_lpm_neon.h" > +#elif defined(RTE_ARCH_PPC_64) > +#include "rte_lpm_altivec.h" > +#else > +#include "rte_lpm_sse.h" > +#endif > + > /** > - * Lookup four IP addresses in an LPM table. > + * Lookup four IP addresses in an LPM table individually by calling the > + * lookup function for each ip. This is used when lookupx4 is called but > + * the vector path is not suitable. > * > * @param lpm > * LPM object handle > @@ -417,16 +427,43 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t *ips, > * if lookup would fail. > */ > static inline void > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > - uint32_t defv); > +rte_lpm_lookupx4_scalar(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > + uint32_t defv) > +{ > + int i; > + for (i = 0; i < 4; i++) > + if (rte_lpm_lookup(lpm, ((uint32_t *) &ip)[i], &hop[i]) < 0) > + hop[i] = defv; /* lookupx4 expected to set on failure */ > +} > > -#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) > -#include "rte_lpm_neon.h" > -#elif defined(RTE_ARCH_PPC_64) > -#include "rte_lpm_altivec.h" > -#else > -#include "rte_lpm_sse.h" > -#endif > +/** > + * Lookup four IP addresses in an LPM table. > + * > + * @param lpm > + * LPM object handle > + * @param ip > + * Four IPs to be looked up in the LPM table > + * @param hop > + * Next hop of the most specific rule found for IP (valid on lookup hit only). > + * This is an 4 elements array of two byte values. > + * If the lookup was successful for the given IP, then least significant byte > + * of the corresponding element is the actual next hop and the most > + * significant byte is zero. > + * If the lookup for the given IP failed, then corresponding element would > + * contain default value, see description of then next parameter. > + * @param defv > + * Default value to populate into corresponding element of hop[] array, > + * if lookup would fail. > + */ > +static inline void > +rte_lpm_lookupx4(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > + uint32_t defv) > +{ > + if (rte_get_max_simd_bitwidth() >= RTE_MAX_128_SIMD) > + rte_lpm_lookupx4_vec(lpm, ip, hop, defv); > + else > + rte_lpm_lookupx4_scalar(lpm, ip, hop, defv); > +} I'm afraid this will lead to a drop in performance. rte_lpm_lookupx4 is used in the hot path, and a bulk size is too small to amortize the cost of adding this extra logic. > > #ifdef __cplusplus > } > diff --git a/lib/librte_lpm/rte_lpm_altivec.h b/lib/librte_lpm/rte_lpm_altivec.h > index 228c41b38e..82142d3351 100644 > --- a/lib/librte_lpm/rte_lpm_altivec.h > +++ b/lib/librte_lpm/rte_lpm_altivec.h > @@ -16,7 +16,7 @@ extern "C" { > #endif > > static inline void > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > uint32_t defv) > { > vector signed int i24; > diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h > index 6c131d3125..14b184515d 100644 > --- a/lib/librte_lpm/rte_lpm_neon.h > +++ b/lib/librte_lpm/rte_lpm_neon.h > @@ -16,7 +16,7 @@ extern "C" { > #endif > > static inline void > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > uint32_t defv) > { > uint32x4_t i24; > diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h > index 44770b6ff8..cb5477c6cf 100644 > --- a/lib/librte_lpm/rte_lpm_sse.h > +++ b/lib/librte_lpm/rte_lpm_sse.h > @@ -15,7 +15,7 @@ extern "C" { > #endif > > static inline void > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], > uint32_t defv) > { > __m128i i24; > -- Regards, Vladimir