From: Stanisław Kardach
Date: Tue, 24 May 2022 18:28:02 +0200
Subject: Re: [PATCH 1/1] lpm: add a scalar version of lookupx4 function
To: "Medvedkin, Vladimir"
Cc: Michal Mazurek, dev, Frank Zhao, Sam Grove, Marcin Wojtas,
 upstream@semihalf.com, Bruce Richardson
References: <20220510115824.457885-1-kda@semihalf.com>
List-Id: DPDK patches and discussions

On Thu, May 19, 2022 at 7:04 PM Medvedkin, Vladimir wrote:
>
> Hi Stanislaw, Michal,
>
> As far as I can see, this implementation almost completely repeats other
> lookupx4() implementations, except for the use of vector instructions.
>
> On my board (x86_64) in lpm_perf_autotest your implementation takes about:
> LPM LookupX4: 29.5 cycles (fails = 12.5%)
>
> replacing this code with a simple loop with rte_lpm_lookup():
>
> uint32_t nh;
> int i, ret;
>
> for (i = 0; i < 4; i++) {
>	ret = rte_lpm_lookup((struct rte_lpm *)lpm, ((rte_xmm_t)ip).u32[i], &nh);
>	hop[i] = (ret == 0) ? nh : defv;
> }
>
> works faster:
> LPM LookupX4: 22.2 cycles (fails = 12.5%)
>
> I'm wondering if this will work faster on your board (I assume it is
> RISC-V arch)?

Hi Vladimir,

On my HiFive Unmatched RISC-V board there is a marginal difference (~ -1.56%):
  Our version: 210.5 cycles (fails = 12.5%)
  rte_lpm_lookup version: 213.8 cycles (fails = 12.5%)
Given that x86 is faster with rte_lpm_lookup, I'll change to this
implementation in the next version.

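For illustration only, here is a minimal self-contained sketch of the
"four lookups built from a single lookup" approach quoted above. It does
not use DPDK: struct toy_table, toy_lookup() and toy_lookupx4() are
hypothetical stand-ins (one next hop per /8 prefix), chosen so the snippet
compiles on its own. Only the loop shape and the "ret == 0 means hit,
otherwise use defv" convention are taken from the quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a routing table: one next hop per /8 prefix.
 * valid[] marks which slots hold a route. This is NOT struct rte_lpm.
 */
struct toy_table {
	uint32_t next_hop[256];
	uint8_t valid[256];
};

/*
 * Single lookup. Returns 0 on a hit and -1 on a miss, mirroring the
 * convention in the quoted loop where ret == 0 means success.
 * The table is only read, so the pointer is const.
 */
static int
toy_lookup(const struct toy_table *t, uint32_t ip, uint32_t *next_hop)
{
	uint32_t idx = ip >> 24; /* top 8 bits select the slot */

	assert(t != NULL && next_hop != NULL);
	if (!t->valid[idx])
		return -1;
	*next_hop = t->next_hop[idx];
	return 0;
}

/* Four lookups built from the single lookup; misses receive defv. */
static void
toy_lookupx4(const struct toy_table *t, const uint32_t ip[4],
	uint32_t hop[4], uint32_t defv)
{
	uint32_t nh;
	int i, ret;

	for (i = 0; i < 4; i++) {
		ret = toy_lookup(t, ip[i], &nh);
		hop[i] = (ret == 0) ? nh : defv;
	}
}
```

Because the single lookup in this sketch already takes a const table
pointer, the x4 wrapper needs no cast, unlike the quoted loop, which casts
lpm to a non-const pointer before calling rte_lpm_lookup().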
That said, I wonder why we have different const requirements for
rte_lpm_lookup() and rte_lpm_lookupx4():

static inline int
rte_lpm_lookup(struct rte_lpm *lpm, uint32_t ip, uint32_t *next_hop)

static inline void
rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
	uint32_t defv);

I think both should be const.

>
> Thanks!
>
> On 10/05/2022 12:58, Stanislaw Kardach wrote:
> > From: Michal Mazurek
> >
> > Add an implementation of the rte_lpm_lookupx4() function for platforms
> > without support for vector operations.
> >
> > This will be useful in the upcoming RISC-V port as well as any platform
> > which may want to start with a basic level of LPM support.
> >
> > Signed-off-by: Michal Mazurek
> > Signed-off-by: Stanislaw Kardach
> > ---
> >  doc/guides/rel_notes/release_22_07.rst |   5 +
> >  lib/lpm/meson.build                    |   1 +
> >  lib/lpm/rte_lpm.h                      |   4 +-
> >  lib/lpm/rte_lpm_scalar.h               | 122 +++++++++++++++++++++++++
> >  4 files changed, 131 insertions(+), 1 deletion(-)
> >  create mode 100644 lib/lpm/rte_lpm_scalar.h
> >
> > diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
> > index 4ae91dd94d..73e8d632f2 100644
> > --- a/doc/guides/rel_notes/release_22_07.rst
> > +++ b/doc/guides/rel_notes/release_22_07.rst
> > @@ -70,6 +70,11 @@ New Features
> >    * Added AH mode support in lookaside protocol (IPsec) for CN9K & CN10K.
> >    * Added AES-GMAC support in lookaside protocol (IPsec) for CN9K & CN10K.
> >
> > +* **Added scalar version of the LPM library.**
> > +
> > +  * Added scalar implementation of ``rte_lpm_lookupx4``. This is a fall-back
> > +    implementation for platforms that don't support vector operations.
> > +
> >
> >  Removed Items
> >  -------------
> > diff --git a/lib/lpm/meson.build b/lib/lpm/meson.build
> > index 78d91d3421..6b47361fce 100644
> > --- a/lib/lpm/meson.build
> > +++ b/lib/lpm/meson.build
> > @@ -14,6 +14,7 @@ headers = files('rte_lpm.h', 'rte_lpm6.h')
> >  indirect_headers += files(
> >          'rte_lpm_altivec.h',
> >          'rte_lpm_neon.h',
> > +        'rte_lpm_scalar.h',
> >          'rte_lpm_sse.h',
> >          'rte_lpm_sve.h',
> >  )
> > diff --git a/lib/lpm/rte_lpm.h b/lib/lpm/rte_lpm.h
> > index eb91960e81..b5db6a353a 100644
> > --- a/lib/lpm/rte_lpm.h
> > +++ b/lib/lpm/rte_lpm.h
> > @@ -405,8 +405,10 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> >  #endif
> >  #elif defined(RTE_ARCH_PPC_64)
> >  #include "rte_lpm_altivec.h"
> > -#else
> > +#elif defined(RTE_ARCH_X86)
> >  #include "rte_lpm_sse.h"
> > +#else
> > +#include "rte_lpm_scalar.h"
> >  #endif
> >
> >  #ifdef __cplusplus
> > diff --git a/lib/lpm/rte_lpm_scalar.h b/lib/lpm/rte_lpm_scalar.h
> > new file mode 100644
> > index 0000000000..991b94e687
> > --- /dev/null
> > +++ b/lib/lpm/rte_lpm_scalar.h
> > @@ -0,0 +1,122 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2022 StarFive
> > + * Copyright(c) 2022 SiFive
> > + * Copyright(c) 2022 Semihalf
> > + */
> > +
> > +#ifndef _RTE_LPM_SCALAR_H_
> > +#define _RTE_LPM_SCALAR_H_
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +static inline void
> > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > +		uint32_t defv)
> > +{
> > +	rte_xmm_t i24;
> > +	rte_xmm_t i8;
> > +	uint32_t tbl[4];
> > +	uint64_t pt, pt2;
> > +	const uint32_t *ptbl;
> > +
> > +	const rte_xmm_t mask8 = {
> > +		.u32 = {UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX}};
> > +
> > +	/*
> > +	 * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 2 LPM entries
> > +	 * as one 64-bit value (0x0300000003000000).
> > +	 */
> > +	const uint64_t mask_xv =
> > +		((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK |
> > +		(uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32);
> > +
> > +	/*
> > +	 * RTE_LPM_LOOKUP_SUCCESS for 2 LPM entries
> > +	 * as one 64-bit value (0x0100000001000000).
> > +	 */
> > +	const uint64_t mask_v =
> > +		((uint64_t)RTE_LPM_LOOKUP_SUCCESS |
> > +		(uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32);
> > +
> > +	/* get 4 indexes for tbl24[]. */
> > +	i24.x = ip;
> > +	i24.u32[0] >>= CHAR_BIT;
> > +	i24.u32[1] >>= CHAR_BIT;
> > +	i24.u32[2] >>= CHAR_BIT;
> > +	i24.u32[3] >>= CHAR_BIT;
> > +
> > +	/* extract values from tbl24[] */
> > +	ptbl = (const uint32_t *)&lpm->tbl24[i24.u32[0]];
> > +	tbl[0] = *ptbl;
> > +	ptbl = (const uint32_t *)&lpm->tbl24[i24.u32[1]];
> > +	tbl[1] = *ptbl;
> > +	ptbl = (const uint32_t *)&lpm->tbl24[i24.u32[2]];
> > +	tbl[2] = *ptbl;
> > +	ptbl = (const uint32_t *)&lpm->tbl24[i24.u32[3]];
> > +	tbl[3] = *ptbl;
> > +
> > +	/* get 4 indexes for tbl8[]. */
> > +	i8.x = ip;
> > +	i8.u64[0] &= mask8.u64[0];
> > +	i8.u64[1] &= mask8.u64[1];
> > +
> > +	pt = (uint64_t)tbl[0] |
> > +		(uint64_t)tbl[1] << 32;
> > +	pt2 = (uint64_t)tbl[2] |
> > +		(uint64_t)tbl[3] << 32;
> > +
> > +	/* search successfully finished for all 4 IP addresses. */
> > +	if (likely((pt & mask_xv) == mask_v) &&
> > +			likely((pt2 & mask_xv) == mask_v)) {
> > +		*(uint64_t *)hop = pt & RTE_LPM_MASKX4_RES;
> > +		*(uint64_t *)(hop + 2) = pt2 & RTE_LPM_MASKX4_RES;
> > +		return;
> > +	}
> > +
> > +	if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
> > +			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
> > +		i8.u32[0] = i8.u32[0] +
> > +			(tbl[0] & 0x00FFFFFF) * RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> > +		ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[0]];
> > +		tbl[0] = *ptbl;
> > +	}
> > +	if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
> > +			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
> > +		i8.u32[1] = i8.u32[1] +
> > +			(tbl[1] & 0x00FFFFFF) * RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> > +		ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[1]];
> > +		tbl[1] = *ptbl;
> > +	}
> > +	if (unlikely((pt2 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
> > +			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
> > +		i8.u32[2] = i8.u32[2] +
> > +			(tbl[2] & 0x00FFFFFF) * RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> > +		ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[2]];
> > +		tbl[2] = *ptbl;
> > +	}
> > +	if (unlikely((pt2 >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
> > +			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
> > +		i8.u32[3] = i8.u32[3] +
> > +			(tbl[3] & 0x00FFFFFF) * RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> > +		ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[3]];
> > +		tbl[3] = *ptbl;
> > +	}
> > +
> > +	hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[0] & 0x00FFFFFF : defv;
> > +	hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[1] & 0x00FFFFFF : defv;
> > +	hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[2] & 0x00FFFFFF : defv;
> > +	hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] & 0x00FFFFFF : defv;
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_LPM_SCALAR_H_ */
>
> --
> Regards,
> Vladimir
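
For readers following the fast path in the quoted patch (pt and pt2
compared against mask_xv and mask_v), here is a minimal stand-alone sketch
of that check. The TOY_* constants and toy_* helpers are hypothetical
names introduced for this example. Their values are taken from the patch
comments (0x0100000001000000 and 0x0300000003000000 for two packed
entries) and from the 0x00FFFFFF next-hop mask used in the patch. It
assumes nothing about the real struct rte_lpm layout beyond that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-entry bits, matching the values in the patch comments. */
#define TOY_VALID     UINT32_C(0x01000000) /* entry holds a result */
#define TOY_VALID_EXT UINT32_C(0x03000000) /* valid + continue in tbl8 */
#define TOY_NEXT_HOP  UINT32_C(0x00FFFFFF) /* low 24 bits: next hop */

/* Pack two 32-bit table entries into one 64-bit word (lo in bits 0..31). */
static inline uint64_t
toy_pack2(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | (uint64_t)hi << 32;
}

/*
 * True when both packed entries are valid and neither one redirects to
 * a second-level table, i.e. both lookups finished in the first table.
 */
static inline bool
toy_pair_resolved(uint64_t pair)
{
	const uint64_t mask_xv = toy_pack2(TOY_VALID_EXT, TOY_VALID_EXT);
	const uint64_t mask_v = toy_pack2(TOY_VALID, TOY_VALID);

	return (pair & mask_xv) == mask_v;
}

/* Extract the two 24-bit next hops from a packed pair. */
static inline void
toy_unpack_hops(uint64_t pair, uint32_t hop[2])
{
	hop[0] = (uint32_t)pair & TOY_NEXT_HOP;
	hop[1] = (uint32_t)(pair >> 32) & TOY_NEXT_HOP;
}
```

One comparison per pair covers two entries at once, which is why the
patch needs only two tests to decide whether all four addresses are done.
The sketch unpacks the hops with shifts instead of storing through a
uint64_t pointer into hop[], as the patch does. That is a choice made for
this example so it does not depend on byte order or alignment, not a
claim about what the final version should do.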