DPDK patches and discussions
 help / color / mirror / Atom feed
From: David Marchand <david.marchand@redhat.com>
To: dev@dpdk.org
Cc: viktorin@rehivetech.com, ruifeng.wang@arm.com,
	jerinj@marvell.com, drc@linux.vnet.ibm.com,
	bruce.richardson@intel.com, konstantin.ananyev@intel.com,
	ciara.power@intel.com
Subject: [dpdk-dev] [PATCH v10 18/18] acl: check max SIMD bitwidth
Date: Mon, 19 Oct 2020 15:48:58 +0200
Message-ID: <20201019134858.32507-19-david.marchand@redhat.com> (raw)
In-Reply-To: <20201019134858.32507-1-david.marchand@redhat.com>

From: Ciara Power <ciara.power@intel.com>

When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path. These checks are added in the check alg helper functions.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v10:
  - updated #ifdef magic for arm and ppc checks,
v8:
  - Added AVX512X32 into default options for get best alg function.
  - Updated doc to reflect the above change.
v7:
  - Removed global variable for max SIMD bitwidth.
  - Added helper function for checking AVX512 cpu flags.
  - Separated condition checking for the AVX512 algorithms to allow for
    checking 256/512 max SIMD bitwidth, respectively.
  - Added to docs to reflect the added changes in algorithm selection.
---
 .../prog_guide/packet_classif_access_ctrl.rst | 16 +++---
 lib/librte_acl/rte_acl.c                      | 53 +++++++++++++------
 lib/librte_acl/rte_acl.h                      |  1 +
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/doc/guides/prog_guide/packet_classif_access_ctrl.rst b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
index 7659af8eb5..1811db4618 100644
--- a/doc/guides/prog_guide/packet_classif_access_ctrl.rst
+++ b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
@@ -368,24 +368,27 @@ After rte_acl_build() over given AC context has finished successfully, it can be
 There are several implementations of classify algorithm:
 
 *   **RTE_ACL_CLASSIFY_SCALAR**: generic implementation, doesn't require any specific HW support.
+    Requires max SIMD bitwidth to be at least 64.
 
 *   **RTE_ACL_CLASSIFY_SSE**: vector implementation, can process up to 8 flows in parallel. Requires SSE 4.1 support.
+    Requires max SIMD bitwidth to be at least 128.
 
 *   **RTE_ACL_CLASSIFY_AVX2**: vector implementation, can process up to 16 flows in parallel. Requires AVX2 support.
+    Requires max SIMD bitwidth to be at least 256.
 
 *   **RTE_ACL_CLASSIFY_NEON**: vector implementation, can process up to 8 flows
-    in parallel. Requires NEON support.
+    in parallel. Requires NEON support. Requires max SIMD bitwidth to be at least 128.
 
 *   **RTE_ACL_CLASSIFY_ALTIVEC**: vector implementation, can process up to 8
-    flows in parallel. Requires ALTIVEC support.
+    flows in parallel. Requires ALTIVEC support. Requires max SIMD bitwidth to be at least 128.
 
 *   **RTE_ACL_CLASSIFY_AVX512X16**: vector implementation, can process up to 16
     flows in parallel. Uses 256-bit width SIMD registers.
-    Requires AVX512 support.
+    Requires AVX512 support. Requires max SIMD bitwidth to be at least 256.
 
 *   **RTE_ACL_CLASSIFY_AVX512X32**: vector implementation, can process up to 32
     flows in parallel. Uses 512-bit width SIMD registers.
-    Requires AVX512 support.
+    Requires AVX512 support. Requires max SIMD bitwidth to be at least 512.
 
 It is purely a runtime decision which method to choose, there is no build-time difference.
 All implementations operates over the same internal RT structures and use similar principles. The main difference is that vector implementations can manually exploit IA SIMD instructions and process several input data flows in parallel.
@@ -393,9 +396,8 @@ At startup ACL library determines the highest available classify method for the
 
 .. note::
 
-     Right now ``RTE_ACL_CLASSIFY_AVX512X32`` is not selected by default
-     (due to possible frequency level change), but it can be selected at
-     runtime by apps through the use of ACL API: ``rte_acl_set_ctx_classify``.
+     Runtime algorithm selection obeys EAL max SIMD bitwidth parameter.
+     For more details about expected behaviour please see :ref:`max_simd_bitwidth`
 
 Application Programming Interface (API) Usage
 ---------------------------------------------
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 7c2f60b2d6..4e693b2488 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -6,6 +6,7 @@
 #include <rte_string_fns.h>
 #include <rte_acl.h>
 #include <rte_tailq.h>
+#include <rte_vect.h>
 
 #include "acl.h"
 
@@ -114,14 +115,14 @@ acl_check_alg_arm(enum rte_acl_classify_alg alg)
 {
 	if (alg == RTE_ACL_CLASSIFY_NEON) {
 #if defined(RTE_ARCH_ARM64)
-		return 0;
+		if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128)
+			return 0;
 #elif defined(RTE_ARCH_ARM)
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON) &&
+				rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128)
 			return 0;
-		return -ENOTSUP;
-#else
-		return -ENOTSUP;
 #endif
+		return -ENOTSUP;
 	}
 
 	return -EINVAL;
@@ -136,15 +137,26 @@ acl_check_alg_ppc(enum rte_acl_classify_alg alg)
 {
 	if (alg == RTE_ACL_CLASSIFY_ALTIVEC) {
 #if defined(RTE_ARCH_PPC_64)
-		return 0;
-#else
-		return -ENOTSUP;
+		if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128)
+			return 0;
 #endif
+		return -ENOTSUP;
 	}
 
 	return -EINVAL;
 }
 
+#ifdef CC_AVX512_SUPPORT
+static int
+acl_check_avx512_cpu_flags(void)
+{
+	return (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
+			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) &&
+			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512CD) &&
+			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW));
+}
+#endif
+
 /*
  * Helper function for acl_check_alg.
  * Check support for x86 specific classify methods.
@@ -152,13 +164,19 @@ acl_check_alg_ppc(enum rte_acl_classify_alg alg)
 static int
 acl_check_alg_x86(enum rte_acl_classify_alg alg)
 {
-	if (alg == RTE_ACL_CLASSIFY_AVX512X16 ||
-			alg == RTE_ACL_CLASSIFY_AVX512X32) {
+	if (alg == RTE_ACL_CLASSIFY_AVX512X32) {
 #ifdef CC_AVX512_SUPPORT
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
-			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) &&
-			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512CD) &&
-			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW))
+		if (acl_check_avx512_cpu_flags() != 0 &&
+			rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_512)
+			return 0;
+#endif
+		return -ENOTSUP;
+	}
+
+	if (alg == RTE_ACL_CLASSIFY_AVX512X16) {
+#ifdef CC_AVX512_SUPPORT
+		if (acl_check_avx512_cpu_flags() != 0 &&
+			rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256)
 			return 0;
 #endif
 		return -ENOTSUP;
@@ -166,7 +184,8 @@ acl_check_alg_x86(enum rte_acl_classify_alg alg)
 
 	if (alg == RTE_ACL_CLASSIFY_AVX2) {
 #ifdef CC_AVX2_SUPPORT
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) &&
+				rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256)
 			return 0;
 #endif
 		return -ENOTSUP;
@@ -174,7 +193,8 @@ acl_check_alg_x86(enum rte_acl_classify_alg alg)
 
 	if (alg == RTE_ACL_CLASSIFY_SSE) {
 #ifdef RTE_ARCH_X86
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1) &&
+				rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128)
 			return 0;
 #endif
 		return -ENOTSUP;
@@ -226,6 +246,7 @@ acl_get_best_alg(void)
 #elif defined(RTE_ARCH_PPC_64)
 		RTE_ACL_CLASSIFY_ALTIVEC,
 #elif defined(RTE_ARCH_X86)
+		RTE_ACL_CLASSIFY_AVX512X32,
 		RTE_ACL_CLASSIFY_AVX512X16,
 		RTE_ACL_CLASSIFY_AVX2,
 		RTE_ACL_CLASSIFY_SSE,
diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
index 1bfed00743..f7f5f08701 100644
--- a/lib/librte_acl/rte_acl.h
+++ b/lib/librte_acl/rte_acl.h
@@ -329,6 +329,7 @@ rte_acl_classify_alg(const struct rte_acl_ctx *ctx,
  *   New default classify algorithm for given ACL context.
  *   It is the caller responsibility to ensure that the value refers to the
  *   existing algorithm, and that it could be run on the given CPU.
+ *   The max SIMD bitwidth value in EAL must also allow for the chosen algorithm.
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOTSUP requested algorithm is not supported by given platform.
-- 
2.23.0


  parent reply	other threads:[~2020-10-19 13:56 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-19 13:48 [dpdk-dev] [PATCH v10 00/18] add max SIMD bitwidth to EAL David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 01/18] eal: control max SIMD bitwidth David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 02/18] doc: describe how to enable AVX512 David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 03/18] net/i40e: check max SIMD bitwidth David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 04/18] net/axgbe: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 05/18] net/bnxt: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 06/18] net/enic: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 07/18] net/fm10k: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 08/18] net/iavf: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 09/18] net/ice: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 10/18] net/ixgbe: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 11/18] net/mlx5: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 12/18] net/virtio: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 13/18] distributor: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 14/18] member: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 15/18] efd: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 16/18] net: " David Marchand
2020-10-19 13:48 ` [dpdk-dev] [PATCH v10 17/18] node: choose vector path at runtime David Marchand
2020-10-19 13:48 ` David Marchand [this message]
2020-10-19 14:41 ` [dpdk-dev] [PATCH v10 00/18] add max SIMD bitwidth to EAL David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201019134858.32507-19-david.marchand@redhat.com \
    --to=david.marchand@redhat.com \
    --cc=bruce.richardson@intel.com \
    --cc=ciara.power@intel.com \
    --cc=dev@dpdk.org \
    --cc=drc@linux.vnet.ibm.com \
    --cc=jerinj@marvell.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=ruifeng.wang@arm.com \
    --cc=viktorin@rehivetech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git