From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C3F5643C93; Tue, 12 Mar 2024 16:08:14 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8EEA2402E0; Tue, 12 Mar 2024 16:08:14 +0100 (CET) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id EED6B402D8 for ; Tue, 12 Mar 2024 16:08:12 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A86A8FEC; Tue, 12 Mar 2024 08:08:49 -0700 (PDT) Received: from [10.57.69.190] (unknown [10.57.69.190]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 164983F762; Tue, 12 Mar 2024 08:08:09 -0700 (PDT) Message-ID: <6213e5dd-1a2d-42e3-b494-ca9865fecdce@foss.arm.com> Date: Tue, 12 Mar 2024 15:08:10 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6 4/4] hash: add SVE support for bulk key lookup Content-Language: en-US To: fengchengwen , Yoan Picchi , Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Harjot Singh , Nathan Brown , Ruifeng Wang References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240311232112.736613-1-yoan.picchi@arm.com> <20240311232112.736613-5-yoan.picchi@arm.com> <1ba747ce-eeab-5ad5-9e38-596584c73ad0@huawei.com> From: Yoan Picchi In-Reply-To: <1ba747ce-eeab-5ad5-9e38-596584c73ad0@huawei.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 3/12/24 03:57, fengchengwen wrote: > Hi Yoan, > > On 2024/3/12 7:21, Yoan Picchi wrote: >> - Implemented SVE code for comparing signatures in bulk lookup. >> - Added Defines in code for SVE code support. >> - Optimise NEON code > > This commit does not include this part. Pls only describe the content in this commit. Thank you. I forgot to edit that out after moving commit around. > >> - New SVE code is ~5% slower than optimized NEON for N2 processor. >> >> Signed-off-by: Yoan Picchi >> Signed-off-by: Harjot Singh >> Reviewed-by: Nathan Brown >> Reviewed-by: Ruifeng Wang >> --- >> lib/hash/arch/arm/compare_signatures.h | 58 ++++++++++++++++++++++++++ >> lib/hash/rte_cuckoo_hash.c | 2 + >> 2 files changed, 60 insertions(+) >> >> diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h >> index b5a457f936..8a0627e119 100644 >> --- a/lib/hash/arch/arm/compare_signatures.h >> +++ b/lib/hash/arch/arm/compare_signatures.h >> @@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer, >> *hitmask_buffer = vaddvq_u16(hit2); >> } >> break; >> +#endif >> +#if defined(RTE_HAS_SVE_ACLE) >> + case RTE_HASH_COMPARE_SVE: { > > ... > >> #endif >> default: >> for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { >> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c >> index e41f03270a..7a474267f0 100644 >> --- a/lib/hash/rte_cuckoo_hash.c >> +++ b/lib/hash/rte_cuckoo_hash.c >> @@ -452,6 +452,8 @@ rte_hash_create(const struct rte_hash_parameters *params) >> #elif defined(RTE_ARCH_ARM64) >> if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) { >> h->sig_cmp_fn = RTE_HASH_COMPARE_NEON; >> + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE)) >> + h->sig_cmp_fn = RTE_HASH_COMPARE_SVE; > > The RTE_HASH_COMPARE_SVE was defined in "PATCH v6 1/4] hash: pack the hitmask for hash in bulk lookup", > but its first use is in this commit, so I think it should defined in this commit. > > If RTE_CPUFLAG_SVE and RTE_HAS_SVE_ACLE both set, then SVE impl will be chosen. > If RTE_CPUFLAG_SVE defined, but RTE_HAS_SVE_ACLE was not, then scalar will be chosen. --- in this case we could back to NEON impl. > So I suggest direct use "#if defined(RTE_HAS_SVE_ACLE)" here. Sounds fair. I'll do it. > >> } >> else >> #endif >> > > Plus: > I notice the commit log said the SVE performance is slower than NEON. > > And I also notice other platform SVE also lower than NEON, > 1. b4ee9c07bd config/arm: disable SVE ACLE for CN10K > 2. 4eea7c6461 config/arm: add SVE ACLE control flag > > So maybe we should disable RTE_HAS_SVE_ACLE default by: > diff --git a/config/arm/meson.build b/config/arm/meson.build > index 9d6fb87d7f..a5b890d100 100644 > --- a/config/arm/meson.build > +++ b/config/arm/meson.build > @@ -875,7 +875,7 @@ endif > > if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != '' > compile_time_cpuflags += ['RTE_CPUFLAG_SVE'] > - if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', true)) > + if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', false)) > dpdk_conf.set('RTE_HAS_SVE_ACLE', 1) > endif > endif > > If the platform verify SVE has higher performance, then it could enable SVE by add "sve_acle: true" in soc_xxx config. > > Thanks Here I kinda disagree. In this particular instance, SVE is a bit slower with narrow vectors (128b), but could be faster with some wider vector sizes. Even in general SVE 128b is not just slower than neon. It's a case by case basis. Sometime it's slower, sometime it's faster, so I don't think we should just disable it by default. In any case, disabling it should be its own patch with much discussion, not just a offhand thing we include in the middle of this patch. This SVE version is still faster than the upstream neon version. I just happen to have improved the neon version even more.