To: Ciara Power <ciara.power@intel.com>, dev@dpdk.org
Cc: Bruce Richardson <bruce.richardson@intel.com>,
 Jerin Jacob <jerinj@marvell.com>, Ruifeng Wang <ruifeng.wang@arm.com>
References: <20200807155859.63888-1-ciara.power@intel.com>
 <20200930130415.11211-1-ciara.power@intel.com>
 <20200930130415.11211-19-ciara.power@intel.com>
From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
Message-ID: <10075e1e-36de-604d-3dd5-12d1eeca5635@intel.com>
Date: Wed, 30 Sep 2020 14:54:30 +0100
In-Reply-To: <20200930130415.11211-19-ciara.power@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 18/18] lpm: choose vector path at runtime

Hi Ciara,


On 30/09/2020 14:04, Ciara Power wrote:
> When choosing the vector path, the max SIMD bitwidth is now checked to
> ensure a vector path is allowed. To do this, rather than apps calling
> the vector lookup functions directly, a generic lookup function is
> called, which dispatches to the vector functions when suitable.
> 
> Signed-off-by: Ciara Power <ciara.power@intel.com>
> ---
>   lib/librte_lpm/rte_lpm.h         | 57 ++++++++++++++++++++++++++------
>   lib/librte_lpm/rte_lpm_altivec.h |  2 +-
>   lib/librte_lpm/rte_lpm_neon.h    |  2 +-
>   lib/librte_lpm/rte_lpm_sse.h     |  2 +-
>   4 files changed, 50 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 03da2d37e0..edba7cafd5 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -397,8 +397,18 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t *ips,
>   /* Mask four results. */
>   #define	 RTE_LPM_MASKX4_RES	UINT64_C(0x00ffffff00ffffff)
>   
> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> +#include "rte_lpm_neon.h"
> +#elif defined(RTE_ARCH_PPC_64)
> +#include "rte_lpm_altivec.h"
> +#else
> +#include "rte_lpm_sse.h"
> +#endif
> +
>   /**
> - * Lookup four IP addresses in an LPM table.
> + * Lookup four IP addresses in an LPM table individually by calling the
> + * lookup function for each IP. This is used when lookupx4 is called but
> + * the vector path is not suitable.
>    *
>    * @param lpm
>    *   LPM object handle
> @@ -417,16 +427,43 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t *ips,
>    *   if lookup would fail.
>    */
>   static inline void
> -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> -	uint32_t defv);
> +rte_lpm_lookupx4_scalar(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> +	uint32_t defv)
> +{
> +	int i;
> +	for (i = 0; i < 4; i++)
> +		if (rte_lpm_lookup(lpm, ((uint32_t *) &ip)[i], &hop[i]) < 0)
> +			hop[i] = defv; /* lookupx4 sets the default value on failure */
> +}
>   
> -#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> -#include "rte_lpm_neon.h"
> -#elif defined(RTE_ARCH_PPC_64)
> -#include "rte_lpm_altivec.h"
> -#else
> -#include "rte_lpm_sse.h"
> -#endif
> +/**
> + * Lookup four IP addresses in an LPM table.
> + *
> + * @param lpm
> + *   LPM object handle
> + * @param ip
> + *   Four IPs to be looked up in the LPM table
> + * @param hop
> + *   Next hop of the most specific rule found for IP (valid on lookup hit only).
> + *   This is a 4-element array of four-byte values.
> + *   If the lookup was successful for the given IP, then the least
> + *   significant three bytes of the corresponding element hold the actual
> + *   next hop and the most significant byte is zero.
> + *   If the lookup for the given IP failed, then the corresponding element
> + *   contains the default value; see the description of the next parameter.
> + * @param defv
> + *   Default value to populate into the corresponding element of the hop[]
> + *   array if the lookup fails.
> + */
> +static inline void
> +rte_lpm_lookupx4(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> +	uint32_t defv)
> +{
> +	if (rte_get_max_simd_bitwidth() >= RTE_MAX_128_SIMD)
> +		rte_lpm_lookupx4_vec(lpm, ip, hop, defv);
> +	else
> +		rte_lpm_lookupx4_scalar(lpm, ip, hop, defv);
> +}

I'm afraid this will lead to a drop in performance. rte_lpm_lookupx4 is
used in the hot path, and a bulk size of four is too small to amortize
the cost of this extra logic.
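
One way to keep the check off the per-lookup path (just a sketch, not
tested; the names are the ones from this patch set) would be to cache
the decision once instead of re-evaluating it for every batch of four:

static inline void
rte_lpm_lookupx4(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
	uint32_t defv)
{
	/*
	 * Resolve the SIMD capability once (per translation unit, since
	 * this lives in a header); afterwards the hot path pays only a
	 * well-predicted branch instead of a function call per 4 lookups.
	 */
	static int use_vec = -1;

	if (unlikely(use_vec < 0))
		use_vec = rte_get_max_simd_bitwidth() >= RTE_MAX_128_SIMD;

	if (use_vec)
		rte_lpm_lookupx4_vec(lpm, ip, hop, defv);
	else
		rte_lpm_lookupx4_scalar(lpm, ip, hop, defv);
}

Resolving the choice at rte_lpm_create() time and keeping a flag (or a
function pointer) in struct rte_lpm would avoid even that branch, though
it would change the ABI.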

>   
>   #ifdef __cplusplus
>   }
> diff --git a/lib/librte_lpm/rte_lpm_altivec.h b/lib/librte_lpm/rte_lpm_altivec.h
> index 228c41b38e..82142d3351 100644
> --- a/lib/librte_lpm/rte_lpm_altivec.h
> +++ b/lib/librte_lpm/rte_lpm_altivec.h
> @@ -16,7 +16,7 @@ extern "C" {
>   #endif
>   
>   static inline void
> -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
>   	uint32_t defv)
>   {
>   	vector signed int i24;
> diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h
> index 6c131d3125..14b184515d 100644
> --- a/lib/librte_lpm/rte_lpm_neon.h
> +++ b/lib/librte_lpm/rte_lpm_neon.h
> @@ -16,7 +16,7 @@ extern "C" {
>   #endif
>   
>   static inline void
> -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
>   	uint32_t defv)
>   {
>   	uint32x4_t i24;
> diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h
> index 44770b6ff8..cb5477c6cf 100644
> --- a/lib/librte_lpm/rte_lpm_sse.h
> +++ b/lib/librte_lpm/rte_lpm_sse.h
> @@ -15,7 +15,7 @@ extern "C" {
>   #endif
>   
>   static inline void
> -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
>   	uint32_t defv)
>   {
>   	__m128i i24;
> 

-- 
Regards,
Vladimir