From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id EF76BA057B; Wed, 1 Apr 2020 07:55:11 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D09471BEBC; Wed, 1 Apr 2020 07:55:11 +0200 (CEST) Received: from relay0242.mxlogin.com (relay0242.mxlogin.com [199.181.239.242]) by dpdk.org (Postfix) with ESMTP id 96B4C1BE95 for ; Wed, 1 Apr 2020 07:55:10 +0200 (CEST) Received: from filter004.mxroute.com ([149.28.56.236] 149.28.56.236.vultr.com) (Authenticated sender: mN4UYu2MZsgR) by relay0242.mxlogin.com (ZoneMTA) with ESMTPSA id 171344ebd700000766.001 for (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES128-GCM-SHA256); Wed, 01 Apr 2020 05:55:07 +0000 X-Zone-Loop: f7461f6179be43da5f4c4b5afd9f55d2c3f1053c514a X-Originating-IP: [149.28.56.236] Received: from galaxy.mxroute.com (unknown [23.92.70.113]) by filter004.mxroute.com (Postfix) with ESMTPS id 9B71B3ED9B; Wed, 1 Apr 2020 05:55:01 +0000 (UTC) Received: from [134.191.227.39] by galaxy.mxroute.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.91) (envelope-from ) id 1jJVxB-0002Du-MV; Wed, 01 Apr 2020 01:30:18 -0400 To: Vladimir Medvedkin , dev@dpdk.org Cc: konstantin.ananyev@intel.com, bruce.richardson@intel.com References: <1583757826-375246-1-git-send-email-vladimir.medvedkin@intel.com> <1583757826-375246-4-git-send-email-vladimir.medvedkin@intel.com> From: Ray Kinsella Autocrypt: addr=mdr@ashroe.eu; keydata= mQINBFv8B3wBEAC+5ImcgbIvadt3axrTnt7Sxch3FsmWTTomXfB8YiuHT8KL8L/bFRQSL1f6 ASCHu3M89EjYazlY+vJUWLr0BhK5t/YI7bQzrOuYrl9K94vlLwzD19s/zB/g5YGGR5plJr0s JtJsFGEvF9LL3e+FKMRXveQxBB8A51nAHfwG0WSyx53d61DYz7lp4/Y4RagxaJoHp9lakn8j HV2N6rrnF+qt5ukj5SbbKWSzGg5HQF2t0QQ5tzWhCAKTfcPlnP0GymTBfNMGOReWivi3Qqzr S51Xo7hoGujUgNAM41sxpxmhx8xSwcQ5WzmxgAhJ/StNV9cb3HWIoE5StCwQ4uXOLplZNGnS uxNdegvKB95NHZjRVRChg/uMTGpg9PqYbTIFoPXjuk27sxZLRJRrueg4tLbb3HM39CJwSB++ YICcqf2N+GVD48STfcIlpp12/HI+EcDSThzfWFhaHDC0hyirHxJyHXjnZ8bUexI/5zATn/ux TpMbc/vicJxeN+qfaVqPkCbkS71cHKuPluM3jE8aNCIBNQY1/j87k5ELzg3qaesLo2n1krBH bKvFfAmQuUuJT84/IqfdVtrSCTabvDuNBDpYBV0dGbTwaRfE7i+LiJJclUr8lOvHUpJ4Y6a5 0cxEPxm498G12Z3NoY/mP5soItPIPtLR0rA0fage44zSPwp6cQARAQABtBxSYXkgS2luc2Vs bGEgPG1kckBhc2hyb2UuZXU+iQJUBBMBCAA+FiEEcDUDlKDJaDuJlfZfdJdaH/sCCpsFAlv8 B3wCGyMFCQlmAYAFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQdJdaH/sCCptdtRAAl0oE msa+djBVYLIsax+0f8acidtWg2l9f7kc2hEjp9h9aZCpPchQvhhemtew/nKavik3RSnLTAyn B3C/0GNlmvI1l5PFROOgPZwz4xhJKGN7jOsRrbkJa23a8ly5UXwF3Vqnlny7D3z+7cu1qq/f VRK8qFyWkAb+xgqeZ/hTcbJUWtW+l5Zb+68WGEp8hB7TuJLEWb4+VKgHTpQ4vElYj8H3Z94a 04s2PJMbLIZSgmKDASnyrKY0CzTpPXx5rSJ1q+B1FCsfepHLqt3vKSALa3ld6bJ8fSJtDUJ7 JLiU8dFZrywgDIVme01jPbjJtUScW6jONLvhI8Z2sheR71UoKqGomMHNQpZ03ViVWBEALzEt TcjWgJFn8yAmxqM4nBnZ+hE3LbMo34KCHJD4eg18ojDt3s9VrDLa+V9fNxUHPSib9FD9UX/1 +nGfU/ZABmiTuUDM7WZdXri7HaMpzDRJUKI6b+/uunF8xH/h/MHW16VuMzgI5dkOKKv1LejD dT5mA4R+2zBS+GsM0oa2hUeX9E5WwjaDzXtVDg6kYq8YvEd+m0z3M4e6diFeLS77/sAOgaYL 92UcoKD+Beym/fVuC6/55a0e12ksTmgk5/ZoEdoNQLlVgd2INtvnO+0k5BJcn66ZjKn3GbEC VqFbrnv1GnA58nEInRCTzR1k26h9nmS5Ag0EW/wHfAEQAMth1vHr3fOZkVOPfod3M6DkQir5 xJvUW5EHgYUjYCPIa2qzgIVVuLDqZgSCCinyooG5dUJONVHj3nCbITCpJp4eB3PI84RPfDcC hf/V34N/Gx5mTeoymSZDBmXT8YtvV/uJvn+LvHLO4ZJdvq5ZxmDyxfXFmkm3/lLw0+rrNdK5 pt6OnVlCqEU9tcDBezjUwDtOahyV20XqxtUttN4kQWbDRkhT+HrA9WN9l2HX91yEYC+zmF1S OhBqRoTPLrR6g4sCWgFywqztpvZWhyIicJipnjac7qL/wRS+wrWfsYy6qWLIV80beN7yoa6v ccnuy4pu2uiuhk9/edtlmFE4dNdoRf7843CV9k1yRASTlmPkU59n0TJbw+okTa9fbbQgbIb1 pWsAuicRHyLUIUz4f6kPgdgty2FgTKuPuIzJd1s8s6p2aC1qo+Obm2gnBTduB+/n1Jw+vKpt 07d+CKEKu4CWwvZZ8ktJJLeofi4hMupTYiq+oMzqH+V1k6QgNm0Da489gXllU+3EFC6W1qKj tkvQzg2rYoWeYD1Qn8iXcO4Fpk6wzylclvatBMddVlQ6qrYeTmSbCsk+m2KVrz5vIyja0o5Y yfeN29s9emXnikmNfv/dA5fpi8XCANNnz3zOfA93DOB9DBf0TQ2/OrSPGjB3op7RCfoPBZ7u AjJ9dM7VABEBAAGJAjwEGAEIACYWIQRwNQOUoMloO4mV9l90l1of+wIKmwUCW/wHfAIbDAUJ CWYBgAAKCRB0l1of+wIKm3KlD/9w/LOG5rtgtCUWPl4B3pZvGpNym6XdK8cop9saOnE85zWf u+sKWCrxNgYkYP7aZrYMPwqDvilxhbTsIJl5HhPgpTO1b0i+c0n1Tij3EElj5UCg3q8mEc17 c+5jRrY3oz77g7E3oPftAjaq1ybbXjY4K32o3JHFR6I8wX3m9wJZJe1+Y+UVrrjY65gZFxcA thNVnWKErarVQGjeNgHV4N1uF3pIx3kT1N4GSnxhoz4Bki91kvkbBhUgYfNflGURfZT3wIKK +d50jd7kqRouXUCzTdzmDh7jnYrcEFM4nvyaYu0JjSS5R672d9SK5LVIfWmoUGzqD4AVmUW8 pcv461+PXchuS8+zpltR9zajl72Q3ymlT4BTAQOlCWkD0snBoKNUB5d2EXPNV13nA0qlm4U2 GpROfJMQXjV6fyYRvttKYfM5xYKgRgtP0z5lTAbsjg9WFKq0Fndh7kUlmHjuAIwKIV4Tzo75 QO2zC0/NTaTjmrtiXhP+vkC4pcrOGNsbHuaqvsc/ZZ0siXyYsqbctj/sCd8ka2r94u+c7o4l BGaAm+FtwAfEAkXHu4y5Phuv2IRR+x1wTey1U1RaEPgN8xq0LQ1OitX4t2mQwjdPihZQBCnZ wzOrkbzlJMNrMKJpEgulmxAHmYJKgvZHXZXtLJSejFjR0GdHJcL5rwVOMWB8cg== Message-ID: <7309c8de-ea2a-f780-432b-8f60847f6733@ashroe.eu> Date: Wed, 1 Apr 2020 06:54:58 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: <1583757826-375246-4-git-send-email-vladimir.medvedkin@intel.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-AuthUser: mdr@ashroe.eu Subject: Re: [dpdk-dev] [PATCH 3/6] fib: introduce AVX512 lookup X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 09/03/2020 12:43, Vladimir Medvedkin wrote: > Add new lookup implementation for DIR24_8 algorithm using > AVX512 instruction set > > Signed-off-by: Vladimir Medvedkin > --- > lib/librte_fib/dir24_8.c | 71 ++++++++++++++++++++++++ > lib/librte_fib/dir24_8_avx512.h | 116 ++++++++++++++++++++++++++++++++++++++++ > lib/librte_fib/rte_fib.h | 3 +- > 3 files changed, 189 insertions(+), 1 deletion(-) > create mode 100644 lib/librte_fib/dir24_8_avx512.h > > diff --git a/lib/librte_fib/dir24_8.c b/lib/librte_fib/dir24_8.c > index 825d061..9f51dfc 100644 > --- a/lib/librte_fib/dir24_8.c > +++ b/lib/librte_fib/dir24_8.c > @@ -245,6 +245,62 @@ dir24_8_lookup_bulk_uni(void *p, const uint32_t *ips, > } > } > > +#ifdef __AVX512F__ > + > +#include "dir24_8_avx512.h" > + > +static void > +rte_dir24_8_vec_lookup_bulk_1b(void *p, const uint32_t *ips, > + uint64_t *next_hops, const unsigned int n) > +{ > + uint32_t i; > + for (i = 0; i < (n / 16); i++) > + dir24_8_vec_lookup_x16(p, ips + i * 16, next_hops + i * 16, > + sizeof(uint8_t)); > + > + dir24_8_lookup_bulk_1b(p, ips + i * 16, next_hops + i * 16, > + n - i * 16); > +} > + > +static void > +rte_dir24_8_vec_lookup_bulk_2b(void *p, const uint32_t *ips, > + uint64_t *next_hops, const unsigned int n) > +{ > + uint32_t i; > + for (i = 0; i < (n / 16); i++) > + dir24_8_vec_lookup_x16(p, ips + i * 16, next_hops + i * 16, > + sizeof(uint16_t)); > + > + dir24_8_lookup_bulk_2b(p, ips + i * 16, next_hops + i * 16, > + n - i * 16); > +} > + > +static void > +rte_dir24_8_vec_lookup_bulk_4b(void *p, const uint32_t *ips, > + uint64_t *next_hops, const unsigned int n) > +{ > + uint32_t i; > + for (i = 0; i < (n / 16); i++) > + dir24_8_vec_lookup_x16(p, ips + i * 16, next_hops + i * 16, > + sizeof(uint32_t)); > + > + dir24_8_lookup_bulk_4b(p, ips + i * 16, next_hops + i * 16, > + n - i * 16); > +} > + > +static void > +rte_dir24_8_vec_lookup_bulk_8b(void *p, const uint32_t *ips, > + uint64_t *next_hops, const unsigned int n) > +{ > + uint32_t i; > + for (i = 0; i < (n / 8); i++) > + dir24_8_vec_lookup_x8_8b(p, ips + i * 8, next_hops + i * 8); > + > + dir24_8_lookup_bulk_8b(p, ips + i * 8, next_hops + i * 8, n - i * 8); > +} > + > +#endif /* __AVX512F__ */ > + > rte_fib_lookup_fn_t > dir24_8_get_lookup_fn(void *p, enum rte_fib_dir24_8_lookup_type type) > { > @@ -285,6 +341,21 @@ dir24_8_get_lookup_fn(void *p, enum rte_fib_dir24_8_lookup_type type) > } > case RTE_FIB_DIR24_8_SCALAR_UNI: > return dir24_8_lookup_bulk_uni; > +#ifdef __AVX512F__ > + case RTE_FIB_DIR24_8_VECTOR: > + switch (nh_sz) { > + case RTE_FIB_DIR24_8_1B: > + return rte_dir24_8_vec_lookup_bulk_1b; > + case RTE_FIB_DIR24_8_2B: > + return rte_dir24_8_vec_lookup_bulk_2b; > + case RTE_FIB_DIR24_8_4B: > + return rte_dir24_8_vec_lookup_bulk_4b; > + case RTE_FIB_DIR24_8_8B: > + return rte_dir24_8_vec_lookup_bulk_8b; > + default: > + return NULL; > + } > +#endif > default: > return NULL; > } > diff --git a/lib/librte_fib/dir24_8_avx512.h b/lib/librte_fib/dir24_8_avx512.h > new file mode 100644 > index 0000000..3b6680c > --- /dev/null > +++ b/lib/librte_fib/dir24_8_avx512.h > @@ -0,0 +1,116 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2020 Intel Corporation > + */ > + > +#ifndef _DIR248_AVX512_H_ > +#define _DIR248_AVX512_H_ > + > +#include > + > +static __rte_always_inline void > +dir24_8_vec_lookup_x16(void *p, const uint32_t *ips, > + uint64_t *next_hops, int size) > +{ > + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p; > + __mmask16 msk_ext; > + __mmask16 exp_msk = 0x5555; > + __m512i ip_vec, idxes, res, bytes; > + const __m512i zero = _mm512_set1_epi32(0); > + const __m512i lsb = _mm512_set1_epi32(1); > + const __m512i lsbyte_msk = _mm512_set1_epi32(0xff); > + __m512i tmp1, tmp2, res_msk; > + __m256i tmp256; > + /* used to mask gather values if size is 1/2 (8/16 bit next hops) */ > + if (size == sizeof(uint8_t)) > + res_msk = _mm512_set1_epi32(UINT8_MAX); > + else if (size == sizeof(uint16_t)) > + res_msk = _mm512_set1_epi32(UINT16_MAX); > + > + ip_vec = _mm512_loadu_si512(ips); > + /* mask 24 most significant bits */ > + idxes = _mm512_srli_epi32(ip_vec, 8); > + > + /** > + * lookup in tbl24 > + * Put it inside branch to make compiller happy with -O0 > + */ typo on compiler. why not _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, size/sizeof(uint8_t)); presume compiler didn't like it for some reason? > + if (size == sizeof(uint8_t)) { > + res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 1); > + res = _mm512_and_epi32(res, res_msk); > + } else if (size == sizeof(uint16_t)) { > + res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 2); > + res = _mm512_and_epi32(res, res_msk); > + } else > + res = _mm512_i32gather_epi32(idxes, (const int *)dp->tbl24, 4); > + > + /* get extended entries indexes */ > + msk_ext = _mm512_test_epi32_mask(res, lsb); > + > + if (msk_ext != 0) { > + idxes = _mm512_srli_epi32(res, 1); > + idxes = _mm512_slli_epi32(idxes, 8); > + bytes = _mm512_and_epi32(ip_vec, lsbyte_msk); > + idxes = _mm512_maskz_add_epi32(msk_ext, idxes, bytes); > + if (size == sizeof(uint8_t)) { > + idxes = _mm512_mask_i32gather_epi32(zero, msk_ext, > + idxes, (const int *)dp->tbl8, 1); > + idxes = _mm512_and_epi32(idxes, res_msk); > + } else if (size == sizeof(uint16_t)) { > + idxes = _mm512_mask_i32gather_epi32(zero, msk_ext, > + idxes, (const int *)dp->tbl8, 2); > + idxes = _mm512_and_epi32(idxes, res_msk); > + } else > + idxes = _mm512_mask_i32gather_epi32(zero, msk_ext, > + idxes, (const int *)dp->tbl8, 4); > + > + res = _mm512_mask_blend_epi32(msk_ext, res, idxes); > + } > + > + res = _mm512_srli_epi32(res, 1); > + tmp1 = _mm512_maskz_expand_epi32(exp_msk, res); > + tmp256 = _mm512_extracti32x8_epi32(res, 1); > + tmp2 = _mm512_maskz_expand_epi32(exp_msk, > + _mm512_castsi256_si512(tmp256)); > + _mm512_storeu_si512(next_hops, tmp1); > + _mm512_storeu_si512(next_hops + 8, tmp2); > +} > + > +static __rte_always_inline void > +dir24_8_vec_lookup_x8_8b(void *p, const uint32_t *ips, > + uint64_t *next_hops) > +{ > + struct dir24_8_tbl *dp = (struct dir24_8_tbl *)p; > + const __m512i zero = _mm512_set1_epi32(0); > + const __m512i lsbyte_msk = _mm512_set1_epi64(0xff); > + const __m512i lsb = _mm512_set1_epi64(1); > + __m512i res, idxes, bytes; > + __m256i idxes_256, ip_vec; > + __mmask8 msk_ext; > + > + ip_vec = _mm256_loadu_si256((const void *)ips); > + /* mask 24 most significant bits */ > + idxes_256 = _mm256_srli_epi32(ip_vec, 8); > + > + /* lookup in tbl24 */ > + res = _mm512_i32gather_epi64(idxes_256, (const void *)dp->tbl24, 8); > + > + /* get extended entries indexes */ > + msk_ext = _mm512_test_epi64_mask(res, lsb); > + > + if (msk_ext != 0) { > + bytes = _mm512_cvtepi32_epi64(ip_vec); > + idxes = _mm512_srli_epi64(res, 1); > + idxes = _mm512_slli_epi64(idxes, 8); > + bytes = _mm512_and_epi64(bytes, lsbyte_msk); > + idxes = _mm512_maskz_add_epi64(msk_ext, idxes, bytes); > + idxes = _mm512_mask_i64gather_epi64(zero, msk_ext, idxes, > + (const void *)dp->tbl8, 8); > + > + res = _mm512_mask_blend_epi64(msk_ext, res, idxes); > + } > + > + res = _mm512_srli_epi64(res, 1); > + _mm512_storeu_si512(next_hops, res); > +} > + > +#endif /* _DIR248_AVX512_H_ */ > diff --git a/lib/librte_fib/rte_fib.h b/lib/librte_fib/rte_fib.h > index 0e98775..89d0f12 100644 > --- a/lib/librte_fib/rte_fib.h > +++ b/lib/librte_fib/rte_fib.h > @@ -50,7 +50,8 @@ enum rte_fib_dir24_8_nh_sz { > enum rte_fib_dir24_8_lookup_type { > RTE_FIB_DIR24_8_SCALAR_MACRO, > RTE_FIB_DIR24_8_SCALAR_INLINE, > - RTE_FIB_DIR24_8_SCALAR_UNI > + RTE_FIB_DIR24_8_SCALAR_UNI, > + RTE_FIB_DIR24_8_VECTOR > }; > > /** FIB configuration structure */ >