From: Konstantin Ananyev
To: dev@dpdk.org
Cc: jerinj@marvell.com, ruifeng.wang@arm.com, vladimir.medvedkin@intel.com, Konstantin Ananyev
Date: Tue, 6 Oct 2020 16:03:02 +0100
Message-Id: <20201006150316.5776-1-konstantin.ananyev@intel.com>
In-Reply-To: <20201005184526.7465-1-konstantin.ananyev@intel.com>
Subject: [dpdk-dev] [PATCH v4 00/14] acl: introduce AVX512 classify methods

This patch series introduces support for AVX512-specific classify
implementations in the ACL library.
It adds two new algorithms:

- RTE_ACL_CLASSIFY_AVX512X16 - can process up to 16 flows in parallel.
  It uses only 256-bit wide instructions/registers (to avoid a CPU
  frequency level change).
  On my SKX box, test-acl shows a ~15-30% improvement (depending on the
  rule set and input burst size) when switching from the AVX2 to the
  AVX512X16 classify algorithm.

- RTE_ACL_CLASSIFY_AVX512X32 - can process up to 32 flows in parallel.
  It uses 512-bit wide instructions/registers and provides higher
  performance than AVX512X16, but can cause a CPU frequency level change.
  On my SKX box, test-acl shows a ~50-70% improvement (depending on the
  rule set and input burst size) when switching from the AVX2 to the
  AVX512X32 classify algorithm.

Testing on ICX and CLX showed a similar level of speedup.

The current AVX512 classify implementation is supported only on x86_64.
Note that this series introduces a formal ABI incompatibility with
previous versions of the ACL library.
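
For illustration only: an application that wants one of the new methods but must still run on CPUs without AVX512 could try algorithms in order of preference and fall back when one is rejected. The sketch below assumes the setter behaves like rte_acl_set_ctx_classify(), returning 0 on success and a negative errno such as -ENOTSUP when the algorithm is not available on the platform (check rte_acl.h of your DPDK version). The demo_* enum, the callback type and the fake setter are hypothetical stand-ins so the code builds without DPDK.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for enum rte_acl_classify_alg values, so this
 * sketch builds without DPDK. A real application would use the
 * RTE_ACL_CLASSIFY_* constants from <rte_acl.h> instead.
 */
enum demo_alg {
	DEMO_ALG_SCALAR,
	DEMO_ALG_SSE,
	DEMO_ALG_AVX2,
	DEMO_ALG_AVX512X16,
	DEMO_ALG_AVX512X32,
};

/*
 * Assumed setter contract, modelled on rte_acl_set_ctx_classify():
 * 0 on success, -ENOTSUP if the platform cannot run the algorithm,
 * any other negative errno for a hard error.
 */
typedef int (*demo_set_alg_fn)(void *ctx, int alg);

/*
 * Try each algorithm in 'prefs' in order and return the first one the
 * setter accepts. Returns -ENOTSUP if none is supported, or the
 * setter's error code on any other failure.
 */
static int
demo_select_alg(void *ctx, demo_set_alg_fn set, const int *prefs, size_t n)
{
	size_t i;

	for (i = 0; i != n; i++) {
		int rc = set(ctx, prefs[i]);

		if (rc == 0)
			return prefs[i];
		if (rc != -ENOTSUP)
			return rc; /* e.g. -EINVAL: do not keep trying */
	}
	return -ENOTSUP;
}

/*
 * Hypothetical setter simulating a CPU without AVX512 support: only the
 * scalar, SSE and AVX2 methods are accepted. 'ctx' points to an int
 * that records the method that was set.
 */
static int
demo_set_alg_no_avx512(void *ctx, int alg)
{
	int *current = ctx;

	if (current == NULL)
		return -EINVAL;

	switch (alg) {
	case DEMO_ALG_SCALAR:
	case DEMO_ALG_SSE:
	case DEMO_ALG_AVX2:
		*current = alg;
		return 0;
	default:
		return -ENOTSUP;
	}
}
```

In a DPDK application the setter would be a thin wrapper around rte_acl_set_ctx_classify() using the real RTE_ACL_CLASSIFY_* values; whether AVX512X32 is listed ahead of AVX512X16 depends on whether the possible frequency level change described above is acceptable for the workload.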

Depends-on: patch-79310 ("eal/x86: introduce AVX 512-bit type")

v3 -> v4:
  Fix problems with meson 0.47
  Updates to conform to the latest changes in mainline (removal of RTE_MACHINE_CPUFLAG_*)
  Fix checkpatch warnings

v2 -> v3:
  Fix checkpatch warnings
  Split AVX512 algorithm into two and deduplicate common code

v1 -> v2:
  Deduplicated 8/16 code paths as much as possible
  Updated default algorithm selection
  Removed library constructor to make it easier to integrate with
  https://patches.dpdk.org/project/dpdk/list/?series=11831
  Updated docs

Konstantin Ananyev (14):
  acl: fix x86 build when compiler doesn't support AVX2
  doc: fix missing classify methods in ACL guide
  acl: remove of unused enum value
  acl: remove library constructor
  app/acl: few small improvements
  test/acl: expand classify test coverage
  acl: add infrastructure to support AVX512 classify
  acl: introduce 256-bit width AVX512 classify implementation
  acl: update default classify algorithm selection
  acl: introduce 512-bit width AVX512 classify implementation
  acl: for AVX512 classify use 4B load whenever possible
  acl: deduplicate AVX512 code paths
  test/acl: add AVX512 classify support
  app/acl: add AVX512 classify support

 app/test-acl/main.c                           |  23 +-
 app/test/test_acl.c                           | 105 ++--
 config/x86/meson.build                        |   3 +-
 .../prog_guide/packet_classif_access_ctrl.rst |  20 +
 doc/guides/rel_notes/deprecation.rst          |   4 -
 doc/guides/rel_notes/release_20_11.rst        |  12 +
 lib/librte_acl/acl.h                          |  16 +
 lib/librte_acl/acl_bld.c                      |  34 ++
 lib/librte_acl/acl_gen.c                      |   2 +-
 lib/librte_acl/acl_run_avx512.c               | 164 ++++++
 lib/librte_acl/acl_run_avx512_common.h        | 477 ++++++++++++++++++
 lib/librte_acl/acl_run_avx512x16.h            | 341 +++++++++++++
 lib/librte_acl/acl_run_avx512x8.h             | 253 ++++++++++
 lib/librte_acl/meson.build                    |  48 ++
 lib/librte_acl/rte_acl.c                      | 212 ++++++--
 lib/librte_acl/rte_acl.h                      |   4 +-
 16 files changed, 1618 insertions(+), 100 deletions(-)
 create mode 100644 lib/librte_acl/acl_run_avx512.c
 create mode 100644 lib/librte_acl/acl_run_avx512_common.h
 create mode 100644 lib/librte_acl/acl_run_avx512x16.h
 create mode 100644 lib/librte_acl/acl_run_avx512x8.h

-- 
2.17.1