References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240430162743.1525484-1-yoan.picchi@arm.com> <20240430162743.1525484-5-yoan.picchi@arm.com>
In-Reply-To: <20240430162743.1525484-5-yoan.picchi@arm.com>
From: David Marchand
Date: Fri, 14 Jun 2024 15:42:37 +0200
Subject: Re: [PATCH v9 4/4] hash: add SVE support for bulk key lookup
To: Yoan Picchi
Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin, dev@dpdk.org, nd@arm.com, Harjot Singh, Nathan Brown, Ruifeng Wang
List-Id: DPDK patches and discussions

On Tue, Apr 30, 2024 at 6:28 PM Yoan Picchi wrote:
>
> - Implemented SVE code for comparing signatures in bulk lookup.
> - New SVE code is ~5% slower than optimized NEON for N2 processor for
> 128b vectors.
>
> Signed-off-by: Yoan Picchi
> Signed-off-by: Harjot Singh
> Reviewed-by: Nathan Brown
> Reviewed-by: Ruifeng Wang
> ---
>  lib/hash/arch/arm/compare_signatures.h | 58 ++++++++++++++++++++++++++
>  lib/hash/rte_cuckoo_hash.c             |  7 +++-
>  lib/hash/rte_cuckoo_hash.h             |  1 +
>  3 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
> index 72bd171484..b4b4cf04e9 100644
> --- a/lib/hash/arch/arm/compare_signatures.h
> +++ b/lib/hash/arch/arm/compare_signatures.h
> @@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
>  		*hitmask_buffer = vaddvq_u16(hit2);
>  	}
>  	break;
> +#endif
> +#if defined(RTE_HAS_SVE_ACLE)
> +	case RTE_HASH_COMPARE_SVE: {
> +		svuint16_t vsign, shift, sv_matches;
> +		svbool_t pred, match, bucket_wide_pred;
> +		int i = 0;
> +		uint64_t vl = svcnth();
> +
> +		vsign = svdup_u16(sig);
> +		shift = svindex_u16(0, 1);
> +
> +		if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
> +			svuint16_t primary_array_vect, secondary_array_vect;
> +			bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
> +			primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
> +			secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
> +
> +			/* We merged the two vectors so we can do both comparisons at once */
> +			primary_array_vect = svsplice_u16(bucket_wide_pred,
> +				primary_array_vect,
> +				secondary_array_vect);
> +			pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
> +
> +			/* Compare all signatures in the buckets */
> +			match = svcmpeq_u16(pred, vsign, primary_array_vect);
> +			if (svptest_any(svptrue_b16(), match)) {
> +				sv_matches = svdup_u16(1);
> +				sv_matches = svlsl_u16_z(match, sv_matches, shift);
> +				*hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
> +			}
> +		} else {
> +			do {
> +				pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
> +				uint16_t lower_half = 0;
> +				uint16_t upper_half = 0;
> +				/* Compare all signatures in the primary bucket */
> +				match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
> +							&prim_bucket_sigs[i]));
> +				if (svptest_any(svptrue_b16(), match)) {
> +					sv_matches = svdup_u16(1);
> +					sv_matches = svlsl_u16_z(match, sv_matches, shift);
> +					lower_half = svorv_u16(svptrue_b16(), sv_matches);
> +				}
> +				/* Compare all signatures in the secondary bucket */
> +				match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
> +							&sec_bucket_sigs[i]));
> +				if (svptest_any(svptrue_b16(), match)) {
> +					sv_matches = svdup_u16(1);
> +					sv_matches = svlsl_u16_z(match, sv_matches, shift);
> +					upper_half = svorv_u16(svptrue_b16(), sv_matches)
> +						<< RTE_HASH_BUCKET_ENTRIES;
> +				}
> +				hitmask_buffer[i / 8] = upper_half | lower_half;
> +				i += vl;
> +			} while (i < RTE_HASH_BUCKET_ENTRIES);
> +		}
> +	}
> +	break;
>  #endif
>  	default:
>  		for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index 0697743cdf..75f555ba2c 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -450,8 +450,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
>  	else
>  #elif defined(RTE_ARCH_ARM64)
> -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
>  		h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
> +#if defined(RTE_HAS_SVE_ACLE)
> +		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
> +			h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
> +#endif
> +	}
>  	else
>  #endif
>  		h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index a528f1d1a0..01ad01c258 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -139,6 +139,7 @@ enum rte_hash_sig_compare_function {
>  	RTE_HASH_COMPARE_SCALAR = 0,
>  	RTE_HASH_COMPARE_SSE,
>  	RTE_HASH_COMPARE_NEON,
> +	RTE_HASH_COMPARE_SVE,
>  	RTE_HASH_COMPARE_NUM
>  };

I am surprised the ABI check does not complain about this change.

RTE_HASH_COMPARE_NUM is not used, and knowing the number of compare
function implementations should not be of interest for an application.
But inserting RTE_HASH_COMPARE_SVE before it changes the value of
RTE_HASH_COMPARE_NUM (from 3 to 4), so this still seems like an ABI
breakage to me.
RTE_HASH_COMPARE_NUM can be removed in v24.11.

And ideally, sig_cmp_fn should be made opaque (or moved to an opaque
struct out of the rte_hash public struct).


-- 
David Marchand