From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>, "dev@dpdk.org" <dev@dpdk.org>
Date: Mon, 25 Aug 2014 16:30:05 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772582135D369@IRSMSX105.ger.corp.intel.com>
References: <1407436263-9360-1-git-send-email-konstantin.ananyev@intel.com>
 <1408652100-29217-1-git-send-email-nhorman@tuxdriver.com>
In-Reply-To: <1408652100-29217-1-git-send-email-nhorman@tuxdriver.com>
Subject: Re: [dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target
List-Id: patches and discussions about DPDK

Hi Neil,

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Thursday, August 21, 2014 9:15 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin; thomas.monjalon@6wind.com; Neil Horman
> Subject: [PATCHv3] librte_acl make it build/work for 'default' target
>
> Make the ACL library build/work on the 'default' architecture:
> - make rte_acl_classify_scalar() really scalar
>   (make sure it doesn't use sse4 intrinsics through resolve_priority()).
> - Provide two versions of the rte_acl_classify code path:
>   rte_acl_classify_sse() - can be built and used only on systems with sse4.2
>   and above; returns -ENOTSUP on a lower arch.
>   rte_acl_classify_scalar() - a slower version, but can be built and used
>   on all systems.
> - keep common code shared between these two code paths.
>
> v2 changes:
> run-time selection of the most appropriate code path for the given ISA.
> By default the highest supported one is selected.
> The user can still override that selection by manually assigning a new value
> to the global function pointer rte_acl_default_classify.
> rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify
> points to.
>

I see you decided not to wait for me and fix everything by yourself :)

> V3 Changes
> Updated classify pointer to be a function so as to better preserve ABI

As I said in my previous mail, it generates an extra jump...
Though from the numbers I got, the performance impact is negligible: < 1%.
So I suppose I don't have a good enough reason to object :)

Though I still think we'd better keep rte_acl_classify_scalar() publicly available (same as we do for rte_acl_classify_sse()):
First of all, rte_acl_classify_scalar() is already part of our public API.
Also, as I remember, one of the customers explicitly asked for the scalar version, and they planned to call it directly.
Plus, always using rte_acl_select_classify() to switch between implementations is not always handy:
- it is global, which means that we can't simultaneously use classify_scalar() and classify_sse() for 2 different ACL contexts.
- to properly support such switching, we would then need to do something like (see app/test/test_acl.c below):
  old_alg = rte_acl_get_classify();
  rte_acl_select_classify(new_alg);
  ...
  rte_acl_select_classify(old_alg);

> Removed macro definitions for match check functions to make them static inline

More comments inlined below.
Thanks
Konstantin

>
> Signed-off-by: Neil Horman
> ---
>  app/test-acl/main.c              |  13 +-
>  app/test/test_acl.c              |  12 +-
>  lib/librte_acl/Makefile          |   5 +-
>  lib/librte_acl/acl_bld.c         |   5 +-
>  lib/librte_acl/acl_match_check.h |  83 ++++
>  lib/librte_acl/acl_run.c         | 944 ---------------------------------
>  lib/librte_acl/acl_run.h         | 220 +++++++++
>  lib/librte_acl/acl_run_scalar.c  | 198 ++++++++
>  lib/librte_acl/acl_run_sse.c     | 627 ++++++++++++++++++++++++++
>  lib/librte_acl/rte_acl.c         |  46 ++
>  lib/librte_acl/rte_acl.h         |  26 +-
>  11 files changed, 1216 insertions(+), 963 deletions(-)
>  create mode 100644 lib/librte_acl/acl_match_check.h
>  delete mode 100644 lib/librte_acl/acl_run.c
>  create mode 100644 lib/librte_acl/acl_run.h
>  create mode 100644 lib/librte_acl/acl_run_scalar.c
>  create mode 100644 lib/librte_acl/acl_run_sse.c
>
> diff --git a/app/test-acl/main.c b/app/test-acl/main.c
> index d654409..a77f47d 100644
> --- a/app/test-acl/main.c
> +++ b/app/test-acl/main.c
> @@ -787,6 +787,10 @@ acx_init(void)
>  	/* perform build. */
>  	ret = rte_acl_build(config.acx, &cfg);
>
> +	/* setup default rte_acl_classify */
> +	if (config.scalar)
> +		rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> +
>  	dump_verbose(DUMP_NONE, stdout,
>  		"rte_acl_build(%u) finished with %d\n",
>  		config.bld_categories, ret);
> @@ -815,13 +819,8 @@ search_ip5tuples_once(uint32_t categories, uint32_t step, int scalar)
>  		v += config.trace_sz;
>  	}
>
> -	if (scalar != 0)
> -		ret = rte_acl_classify_scalar(config.acx, data,
> -			results, n, categories);
> -
> -	else
> -		ret = rte_acl_classify(config.acx, data,
> -			results, n, categories);
> +	ret = rte_acl_classify(config.acx, data, results,
> +		n, categories);
>
>  	if (ret != 0)
>  		rte_exit(ret, "classify for ipv%c_5tuples returns %d\n",
> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> index 869f6d3..2fcef6e 100644
> --- a/app/test/test_acl.c
> +++ b/app/test/test_acl.c
> @@ -148,7 +148,8 @@ test_classify_run(struct rte_acl_ctx *acx)
>  	}
>
>  	/* make a quick check for scalar */
> -	ret = rte_acl_classify_scalar(acx, data, results,
> +	rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> +	ret = rte_acl_classify(acx, data, results,
>  			RTE_DIM(acl_test_data), RTE_ACL_MAX_CATEGORIES);

As I said above, that doesn't seem correct: we set rte_acl_default_classify = rte_acl_classify_scalar and never restore it back to the original value.
To support it properly, we need to:
old_alg = rte_acl_get_classify();
rte_acl_select_classify(new_alg);
...
rte_acl_select_classify(old_alg);
Doing all this just to keep the UT valid seems like a big hassle to me.
So, as I said above, it's probably better to just leave it calling rte_acl_classify_scalar() directly.
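To illustrate the hassle, here is roughly what every such caller would end up writing. This is only a sketch of the save/switch/restore pattern under discussion: rte_acl_get_classify() is hypothetical (the patch only adds rte_acl_select_classify()), and the exact algorithm-id type is assumed:

    /* Sketch only: save the current global selection, switch, classify,
     * then restore. Note this is still racy for any other ACL context
     * classifying concurrently, since the selection is global. */
    static int
    classify_with(struct rte_acl_ctx *acx, int alg, const uint8_t **data,
        uint32_t *results, uint32_t num, uint32_t categories)
    {
        int old_alg, ret;

        old_alg = rte_acl_get_classify();   /* hypothetical getter */
        rte_acl_select_classify(alg);
        ret = rte_acl_classify(acx, data, results, num, categories);
        rte_acl_select_classify(old_alg);   /* restore previous selection */
        return ret;
    }
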
>  	if (ret != 0) {
>  		printf("Line %i: SSE classify failed!\n", __LINE__);
> @@ -362,7 +363,8 @@ test_invalid_layout(void)
>  	}
>
>  	/* classify tuples (scalar) */
> -	ret = rte_acl_classify_scalar(acx, data, results,
> +	rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> +	ret = rte_acl_classify(acx, data, results,
>  			RTE_DIM(results), 1);
>  	if (ret != 0) {
>  		printf("Line %i: Scalar classify failed!\n", __LINE__);
> @@ -850,7 +852,8 @@ test_invalid_parameters(void)
>  	/* scalar classify test */
>
>  	/* cover zero categories in classify (should not fail) */
> -	result = rte_acl_classify_scalar(acx, NULL, NULL, 0, 0);
> +	rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> +	result = rte_acl_classify(acx, NULL, NULL, 0, 0);
>  	if (result != 0) {
>  		printf("Line %i: Scalar classify with zero categories "
>  			"failed!\n", __LINE__);
> @@ -859,7 +862,8 @@ test_invalid_parameters(void)
>  	}
>
>  	/* cover invalid but positive categories in classify */
> -	result = rte_acl_classify_scalar(acx, NULL, NULL, 0, 3);
> +	rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> +	result = rte_acl_classify(acx, NULL, NULL, 0, 3);
>  	if (result == 0) {
>  		printf("Line %i: Scalar classify with 3 categories "
>  			"should have failed!\n", __LINE__);
> diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
> index 4fe4593..65e566d 100644
> --- a/lib/librte_acl/Makefile
> +++ b/lib/librte_acl/Makefile
> @@ -43,7 +43,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += tb_mem.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += rte_acl.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
>  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
> -SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run.c
> +SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
> +SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c
> +
> +CFLAGS_acl_run_sse.o += -msse4.1
>
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_ACL)-include := rte_acl_osdep.h
> diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c
> index 873447b..09d58ea 100644
> --- a/lib/librte_acl/acl_bld.c
> +++ b/lib/librte_acl/acl_bld.c
> @@ -31,7 +31,6 @@
>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>   */
>
> -#include
>  #include
>  #include "tb_mem.h"
>  #include "acl.h"
> @@ -1480,8 +1479,8 @@ acl_calc_wildness(struct rte_acl_build_rule *head,
>
>  			switch (rule->config->defs[n].type) {
>  			case RTE_ACL_FIELD_TYPE_BITMASK:
> -				wild = (size -
> -					_mm_popcnt_u32(fld->mask_range.u8)) /
> +				wild = (size - __builtin_popcount(
> +					fld->mask_range.u8)) /
>  					size;
>  				break;
>
> diff --git a/lib/librte_acl/acl_match_check.h b/lib/librte_acl/acl_match_check.h
> new file mode 100644
> index 0000000..4dc1982
> --- /dev/null
> +++ b/lib/librte_acl/acl_match_check.h

As a nit: we probably don't need a special header just for one function, and could place it inside acl_run.h.

> @@ -0,0 +1,83 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> +     * Neither the name of Intel Corporation nor the names of its
> +       contributors may be used to endorse or promote products derived
> +       from this software without specific prior written permission.
> +
> +   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _ACL_MATCH_CHECK_H_
> +#define _ACL_MATCH_CHECK_H_
> +
> +/*
> + * Detect matches. If a match node transition is found, then this trie
> + * traversal is complete and fill the slot with the next trie
> + * to be processed.
> + */
> +static inline uint64_t
> +acl_match_check(uint64_t transition, int slot,
> +	const struct rte_acl_ctx *ctx, struct parms *parms,
> +	struct acl_flow_data *flows, void (*resolve_priority)(
> +	uint64_t transition, int n, const struct rte_acl_ctx *ctx,
> +	struct parms *parms, const struct rte_acl_match_results *p,
> +	uint32_t categories))

Ugh, that's really hard to read.
Can we create a typedef for the resolve_priority function type:
typedef void (*resolve_priority_t)(uint64_t, int,
	const struct rte_acl_ctx *ctx, struct parms *,
	const struct rte_acl_match_results *, uint32_t);
and use it here?

> +{
> +	const struct rte_acl_match_results *p;
> +
> +	p = (const struct rte_acl_match_results *)
> +		(flows->trans + ctx->match_index);
> +
> +	if (transition & RTE_ACL_NODE_MATCH) {
> +
> +		/* Remove flags from index and decrement active traversals */
> +		transition &= RTE_ACL_NODE_INDEX;
> +		flows->started--;
> +
> +		/* Resolve priorities for this trie and running results */
> +		if (flows->categories == 1)
> +			resolve_single_priority(transition, slot, ctx,
> +				parms, p);
> +		else
> +			resolve_priority(transition, slot, ctx, parms,
> +				p, flows->categories);
> +
> +		/* Count down completed tries for this search request */
> +		parms[slot].cmplt->count--;
> +
> +		/* Fill the slot with the next trie or idle trie */
> +		transition = acl_start_next_trie(flows, parms, slot, ctx);
> +
> +	} else if (transition == ctx->idle) {
> +		/* reset indirection table for idle slots */
> +		parms[slot].data_index = idle;
> +	}
> +
> +	return transition;
> +}
> +
> +#endif
> diff --git a/lib/librte_acl/acl_run.c b/lib/librte_acl/acl_run.c
> deleted file mode 100644
> index e3d9fc1..0000000
> --- a/lib/librte_acl/acl_run.c
> +++ /dev/null
> @@ -1,944 +0,0 @@
> -/*-
> -   BSD LICENSE
> -
> -   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> -   All rights reserved.
> -
> -   Redistribution and use in source and binary forms, with or without
> -   modification, are permitted provided that the following conditions
> -   are met:
> -
> -     * Redistributions of source code must retain the above copyright
> -       notice, this list of conditions and the following disclaimer.
> -     * Redistributions in binary form must reproduce the above copyright
> -       notice, this list of conditions and the following disclaimer in
> -       the documentation and/or other materials provided with the
> -       distribution.
> -     * Neither the name of Intel Corporation nor the names of its
> -       contributors may be used to endorse or promote products derived
> -       from this software without specific prior written permission.
> -
> -   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> -   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> -   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> -   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> -   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> -   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> -   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> -   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> -   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> -   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> -   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> - */
> -
> -#include
> -#include "acl_vect.h"
> -#include "acl.h"
> -
> -#define MAX_SEARCHES_SSE8	8
> -#define MAX_SEARCHES_SSE4	4
> -#define MAX_SEARCHES_SSE2	2
> -#define MAX_SEARCHES_SCALAR	2
> -
> -#define GET_NEXT_4BYTES(prm, idx)	\
> -	(*((const int32_t *)((prm)[(idx)].data + *(prm)[idx].data_index++)))
> -
> -
> -#define RTE_ACL_NODE_INDEX	((uint32_t)~RTE_ACL_NODE_TYPE)
> -
> -#define	SCALAR_QRANGE_MULT	0x01010101
> -#define	SCALAR_QRANGE_MASK	0x7f7f7f7f
> -#define	SCALAR_QRANGE_MIN	0x80808080
> -
> -enum {
> -	SHUFFLE32_SLOT1 = 0xe5,
> -	SHUFFLE32_SLOT2 = 0xe6,
> -	SHUFFLE32_SLOT3 = 0xe7,
> -	SHUFFLE32_SWAP64 = 0x4e,
> -};
> -
> -/*
> - * Structure to manage N parallel trie traversals.
> - * The runtime trie traversal routines can process 8, 4, or 2 tries
> - * in parallel. Each packet may require multiple trie traversals (up to 4).
> - * This structure is used to fill the slots (0 to n-1) for parallel processing
> - * with the trie traversals needed for each packet.
> - */
> -struct acl_flow_data {
> -	uint32_t num_packets;
> -	/* number of packets processed */
> -	uint32_t started;
> -	/* number of trie traversals in progress */
> -	uint32_t trie;
> -	/* current trie index (0 to N-1) */
> -	uint32_t cmplt_size;
> -	uint32_t total_packets;
> -	uint32_t categories;
> -	/* number of result categories per packet. */
> -	/* maximum number of packets to process */
> -	const uint64_t *trans;
> -	const uint8_t **data;
> -	uint32_t *results;
> -	struct completion *last_cmplt;
> -	struct completion *cmplt_array;
> -};
> -
> -/*
> - * Structure to maintain running results for
> - * a single packet (up to 4 tries).
> - */
> -struct completion {
> -	uint32_t *results;                          /* running results. */
> -	int32_t priority[RTE_ACL_MAX_CATEGORIES];   /* running priorities. */
> -	uint32_t count;                             /* num of remaining tries */
> -	/* true for allocated struct */
> -} __attribute__((aligned(XMM_SIZE)));
> -
> -/*
> - * One parms structure for each slot in the search engine.
> - */
> -struct parms {
> -	const uint8_t *data;
> -	/* input data for this packet */
> -	const uint32_t *data_index;
> -	/* data indirection for this trie */
> -	struct completion *cmplt;
> -	/* completion data for this packet */
> -};
> -
> -/*
> - * Define a global idle node for unused engine slots
> - */
> -static const uint32_t idle[UINT8_MAX + 1];
> -
> -static const rte_xmm_t mm_type_quad_range = {
> -	.u32 = {
> -		RTE_ACL_NODE_QRANGE,
> -		RTE_ACL_NODE_QRANGE,
> -		RTE_ACL_NODE_QRANGE,
> -		RTE_ACL_NODE_QRANGE,
> -	},
> -};
> -
> -static const rte_xmm_t mm_type_quad_range64 = {
> -	.u32 = {
> -		RTE_ACL_NODE_QRANGE,
> -		RTE_ACL_NODE_QRANGE,
> -		0,
> -		0,
> -	},
> -};
> -
> -static const rte_xmm_t mm_shuffle_input = {
> -	.u32 = {0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c},
> -};
> -
> -static const rte_xmm_t mm_shuffle_input64 = {
> -	.u32 = {0x00000000, 0x04040404, 0x80808080, 0x80808080},
> -};
> -
> -static const rte_xmm_t mm_ones_16 = {
> -	.u16 = {1, 1, 1, 1, 1, 1, 1, 1},
> -};
> -
> -static const rte_xmm_t mm_bytes = {
> -	.u32 = {UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX},
> -};
> -
> -static const rte_xmm_t mm_bytes64 = {
> -	.u32 = {UINT8_MAX, UINT8_MAX, 0, 0},
> -};
> -
> -static const rte_xmm_t mm_match_mask = {
> -	.u32 = {
> -		RTE_ACL_NODE_MATCH,
> -		RTE_ACL_NODE_MATCH,
> -		RTE_ACL_NODE_MATCH,
> -		RTE_ACL_NODE_MATCH,
> -	},
> -};
> -
> -static const rte_xmm_t mm_match_mask64 = {
> -	.u32 = {
> -		RTE_ACL_NODE_MATCH,
> -		0,
> -		RTE_ACL_NODE_MATCH,
> -		0,
> -	},
> -};
> -
> -static const rte_xmm_t mm_index_mask = {
> -	.u32 = {
> -		RTE_ACL_NODE_INDEX,
> -		RTE_ACL_NODE_INDEX,
> -		RTE_ACL_NODE_INDEX,
> -		RTE_ACL_NODE_INDEX,
> -	},
> -};
> -
> -static const rte_xmm_t mm_index_mask64 = {
> -	.u32 = {
> -		RTE_ACL_NODE_INDEX,
> -		RTE_ACL_NODE_INDEX,
> -		0,
> -		0,
> -	},
> -};
> -
> -/*
> - * Allocate a completion structure to manage the tries for a packet.
> - */
> -static inline struct completion *
> -alloc_completion(struct completion *p, uint32_t size, uint32_t tries,
> -	uint32_t *results)
> -{
> -	uint32_t n;
> -
> -	for (n = 0; n < size; n++) {
> -
> -		if (p[n].count == 0) {
> -
> -			/* mark as allocated and set number of tries. */
> -			p[n].count = tries;
> -			p[n].results = results;
> -			return &(p[n]);
> -		}
> -	}
> -
> -	/* should never get here */
> -	return NULL;
> -}
> -
> -/*
> - * Resolve priority for a single result trie.
> - */
> -static inline void
> -resolve_single_priority(uint64_t transition, int n,
> -	const struct rte_acl_ctx *ctx, struct parms *parms,
> -	const struct rte_acl_match_results *p)
> -{
> -	if (parms[n].cmplt->count == ctx->num_tries ||
> -			parms[n].cmplt->priority[0] <=
> -			p[transition].priority[0]) {
> -
> -		parms[n].cmplt->priority[0] = p[transition].priority[0];
> -		parms[n].cmplt->results[0] = p[transition].results[0];
> -	}
> -
> -	parms[n].cmplt->count--;
> -}
> -
> -/*
> - * Resolve priority for multiple results. This consists of comparing
> - * the priority of the current traversal with the running set of
> - * results for the packet. For each result, keep a running array of
> - * the result (rule number) and its priority for each category.
> - */
> -static inline void
> -resolve_priority(uint64_t transition, int n, const struct rte_acl_ctx *ctx,
> -	struct parms *parms, const struct rte_acl_match_results *p,
> -	uint32_t categories)
> -{
> -	uint32_t x;
> -	xmm_t results, priority, results1, priority1, selector;
> -	xmm_t *saved_results, *saved_priority;
> -
> -	for (x = 0; x < categories; x += RTE_ACL_RESULTS_MULTIPLIER) {
> -
> -		saved_results = (xmm_t *)(&parms[n].cmplt->results[x]);
> -		saved_priority =
> -			(xmm_t *)(&parms[n].cmplt->priority[x]);
> -
> -		/* get results and priorities for completed trie */
> -		results = MM_LOADU((const xmm_t *)&p[transition].results[x]);
> -		priority = MM_LOADU((const xmm_t *)&p[transition].priority[x]);
> -
> -		/* if this is not the first completed trie */
> -		if (parms[n].cmplt->count != ctx->num_tries) {
> -
> -			/* get running best results and their priorities */
> -			results1 = MM_LOADU(saved_results);
> -			priority1 = MM_LOADU(saved_priority);
> -
> -			/* select results that are highest priority */
> -			selector = MM_CMPGT32(priority1, priority);
> -			results = MM_BLENDV8(results, results1, selector);
> -			priority = MM_BLENDV8(priority, priority1, selector);
> -		}
> -
> -		/* save running best results and their priorities */
> -		MM_STOREU(saved_results, results);
> -		MM_STOREU(saved_priority, priority);
> -	}
> -
> -	/* Count down completed tries for this search request */
> -	parms[n].cmplt->count--;
> -}
> -
> -/*
> - * Routine to fill a slot in the parallel trie traversal array (parms) from
> - * the list of packets (flows).
> - */
> -static inline uint64_t
> -acl_start_next_trie(struct acl_flow_data *flows, struct parms *parms, int n,
> -	const struct rte_acl_ctx *ctx)
> -{
> -	uint64_t transition;
> -
> -	/* if there are any more packets to process */
> -	if (flows->num_packets < flows->total_packets) {
> -		parms[n].data = flows->data[flows->num_packets];
> -		parms[n].data_index = ctx->trie[flows->trie].data_index;
> -
> -		/* if this is the first trie for this packet */
> -		if (flows->trie == 0) {
> -			flows->last_cmplt = alloc_completion(flows->cmplt_array,
> -				flows->cmplt_size, ctx->num_tries,
> -				flows->results +
> -				flows->num_packets * flows->categories);
> -		}
> -
> -		/* set completion parameters and starting index for this slot */
> -		parms[n].cmplt = flows->last_cmplt;
> -		transition =
> -			flows->trans[parms[n].data[*parms[n].data_index++] +
> -			ctx->trie[flows->trie].root_index];
> -
> -		/*
> -		 * if this is the last trie for this packet,
> -		 * then setup next packet.
> -		 */
> -		flows->trie++;
> -		if (flows->trie >= ctx->num_tries) {
> -			flows->trie = 0;
> -			flows->num_packets++;
> -		}
> -
> -		/* keep track of number of active trie traversals */
> -		flows->started++;
> -
> -	/* no more tries to process, set slot to an idle position */
> -	} else {
> -		transition = ctx->idle;
> -		parms[n].data = (const uint8_t *)idle;
> -		parms[n].data_index = idle;
> -	}
> -	return transition;
> -}
> -
> -/*
> - * Detect matches. If a match node transition is found, then this trie
> - * traversal is complete and fill the slot with the next trie
> - * to be processed.
> - */
> -static inline uint64_t
> -acl_match_check_transition(uint64_t transition, int slot,
> -	const struct rte_acl_ctx *ctx, struct parms *parms,
> -	struct acl_flow_data *flows)
> -{
> -	const struct rte_acl_match_results *p;
> -
> -	p = (const struct rte_acl_match_results *)
> -		(flows->trans + ctx->match_index);
> -
> -	if (transition & RTE_ACL_NODE_MATCH) {
> -
> -		/* Remove flags from index and decrement active traversals */
> -		transition &= RTE_ACL_NODE_INDEX;
> -		flows->started--;
> -
> -		/* Resolve priorities for this trie and running results */
> -		if (flows->categories == 1)
> -			resolve_single_priority(transition, slot, ctx,
> -				parms, p);
> -		else
> -			resolve_priority(transition, slot, ctx, parms, p,
> -				flows->categories);
> -
> -		/* Fill the slot with the next trie or idle trie */
> -		transition = acl_start_next_trie(flows, parms, slot, ctx);
> -
> -	} else if (transition == ctx->idle) {
> -		/* reset indirection table for idle slots */
> -		parms[slot].data_index = idle;
> -	}
> -
> -	return transition;
> -}
> -
> -/*
> - * Extract transitions from an XMM register and check for any matches
> - */
> -static void
> -acl_process_matches(xmm_t *indicies, int slot, const struct rte_acl_ctx *ctx,
> -	struct parms *parms, struct acl_flow_data *flows)
> -{
> -	uint64_t transition1, transition2;
> -
> -	/* extract transition from low 64 bits. */
> -	transition1 = MM_CVT64(*indicies);
> -
> -	/* extract transition from high 64 bits. */
> -	*indicies = MM_SHUFFLE32(*indicies, SHUFFLE32_SWAP64);
> -	transition2 = MM_CVT64(*indicies);
> -
> -	transition1 = acl_match_check_transition(transition1, slot, ctx,
> -		parms, flows);
> -	transition2 = acl_match_check_transition(transition2, slot + 1, ctx,
> -		parms, flows);
> -
> -	/* update indicies with new transitions. */
> -	*indicies = MM_SET64(transition2, transition1);
> -}
> -
> -/*
> - * Check for a match in 2 transitions (contained in SSE register)
> - */
> -static inline void
> -acl_match_check_x2(int slot, const struct rte_acl_ctx *ctx, struct parms *parms,
> -	struct acl_flow_data *flows, xmm_t *indicies, xmm_t match_mask)
> -{
> -	xmm_t temp;
> -
> -	temp = MM_AND(match_mask, *indicies);
> -	while (!MM_TESTZ(temp, temp)) {
> -		acl_process_matches(indicies, slot, ctx, parms, flows);
> -		temp = MM_AND(match_mask, *indicies);
> -	}
> -}
> -
> -/*
> - * Check for any match in 4 transitions (contained in 2 SSE registers)
> - */
> -static inline void
> -acl_match_check_x4(int slot, const struct rte_acl_ctx *ctx, struct parms *parms,
> -	struct acl_flow_data *flows, xmm_t *indicies1, xmm_t *indicies2,
> -	xmm_t match_mask)
> -{
> -	xmm_t temp;
> -
> -	/* put low 32 bits of each transition into one register */
> -	temp = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1, (__m128)*indicies2,
> -		0x88);
> -	/* test for match node */
> -	temp = MM_AND(match_mask, temp);
> -
> -	while (!MM_TESTZ(temp, temp)) {
> -		acl_process_matches(indicies1, slot, ctx, parms, flows);
> -		acl_process_matches(indicies2, slot + 2, ctx, parms, flows);
> -
> -		temp = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1,
> -					(__m128)*indicies2,
> -					0x88);
> -		temp = MM_AND(match_mask, temp);
> -	}
> -}
> -
> -/*
> - * Calculate the address of the next transition for
> - * all types of nodes. Note that only DFA nodes and range
> - * nodes actually transition to another node. Match
> - * nodes don't move.
> - */
> -static inline xmm_t
> -acl_calc_addr(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
> -	xmm_t ones_16, xmm_t bytes, xmm_t type_quad_range,
> -	xmm_t *indicies1, xmm_t *indicies2)
> -{
> -	xmm_t addr, node_types, temp;
> -
> -	/*
> -	 * Note that no transition is done for a match
> -	 * node and therefore a stream freezes when
> -	 * it reaches a match.
> -	 */
> -
> -	/* Shuffle low 32 into temp and high 32 into indicies2 */
> -	temp = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1, (__m128)*indicies2,
> -		0x88);
> -	*indicies2 = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1,
> -		(__m128)*indicies2, 0xdd);
> -
> -	/* Calc node type and node addr */
> -	node_types = MM_ANDNOT(index_mask, temp);
> -	addr = MM_AND(index_mask, temp);
> -
> -	/*
> -	 * Calc addr for DFAs - addr = dfa_index + input_byte
> -	 */
> -
> -	/* mask for DFA type (0) nodes */
> -	temp = MM_CMPEQ32(node_types, MM_XOR(node_types, node_types));
> -
> -	/* add input byte to DFA position */
> -	temp = MM_AND(temp, bytes);
> -	temp = MM_AND(temp, next_input);
> -	addr = MM_ADD32(addr, temp);
> -
> -	/*
> -	 * Calc addr for Range nodes -> range_index + range(input)
> -	 */
> -	node_types = MM_CMPEQ32(node_types, type_quad_range);
> -
> -	/*
> -	 * Calculate number of range boundaries that are less than the
> -	 * input value. Range boundaries for each node are in signed 8 bit,
> -	 * ordered from -128 to 127 in the indicies2 register.
> -	 * This is effectively a popcnt of bytes that are greater than the
> -	 * input byte.
> -	 */
> -
> -	/* shuffle input byte to all 4 positions of 32 bit value */
> -	temp = MM_SHUFFLE8(next_input, shuffle_input);
> -
> -	/* check ranges */
> -	temp = MM_CMPGT8(temp, *indicies2);
> -
> -	/* convert -1 to 1 (bytes greater than input byte) */
> -	temp = MM_SIGN8(temp, temp);
> -
> -	/* horizontal add pairs of bytes into words */
> -	temp = MM_MADD8(temp, temp);
> -
> -	/* horizontal add pairs of words into dwords */
> -	temp = MM_MADD16(temp, ones_16);
> -
> -	/* mask to range type nodes */
> -	temp = MM_AND(temp, node_types);
> -
> -	/* add index into node position */
> -	return MM_ADD32(addr, temp);
> -}
> -
> -/*
> - * Process 4 transitions (in 2 SIMD registers) in parallel
> - */
> -static inline xmm_t
> -transition4(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
> -	xmm_t ones_16, xmm_t bytes, xmm_t type_quad_range,
> -	const uint64_t *trans, xmm_t *indicies1, xmm_t *indicies2)
> -{
> -	xmm_t addr;
> -	uint64_t trans0, trans2;
> -
> -	/* Calculate the address (array index) for all 4 transitions. */
> -
> -	addr = acl_calc_addr(index_mask, next_input, shuffle_input, ones_16,
> -		bytes, type_quad_range, indicies1, indicies2);
> -
> -	/* Gather 64 bit transitions and pack back into 2 registers. */
> -
> -	trans0 = trans[MM_CVT32(addr)];
> -
> -	/* get slot 2 */
> -
> -	/* {x0, x1, x2, x3} -> {x2, x1, x2, x3} */
> -	addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT2);
> -	trans2 = trans[MM_CVT32(addr)];
> -
> -	/* get slot 1 */
> -
> -	/* {x2, x1, x2, x3} -> {x1, x1, x2, x3} */
> -	addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT1);
> -	*indicies1 = MM_SET64(trans[MM_CVT32(addr)], trans0);
> -
> -	/* get slot 3 */
> -
> -	/* {x1, x1, x2, x3} -> {x3, x1, x2, x3} */
> -	addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT3);
> -	*indicies2 = MM_SET64(trans[MM_CVT32(addr)], trans2);
> -
> -	return MM_SRL32(next_input, 8);
> -}
> -
> -static inline void
> -acl_set_flow(struct acl_flow_data *flows, struct completion *cmplt,
> -	uint32_t cmplt_size, const uint8_t **data, uint32_t *results,
> -	uint32_t data_num, uint32_t categories, const uint64_t *trans)
> -{
> -	flows->num_packets = 0;
> -	flows->started = 0;
> -	flows->trie = 0;
> -	flows->last_cmplt = NULL;
> -	flows->cmplt_array = cmplt;
> -	flows->total_packets = data_num;
> -	flows->categories = categories;
> -	flows->cmplt_size = cmplt_size;
> -	flows->data = data;
> -	flows->results = results;
> -	flows->trans = trans;
> -}
> -
> -/*
> - * Execute trie traversal with 8 traversals in parallel
> - */
> -static inline void
> -search_sse_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
> -	uint32_t *results, uint32_t total_packets, uint32_t categories)
> -{
> -	int n;
> -	struct acl_flow_data flows;
> -	uint64_t index_array[MAX_SEARCHES_SSE8];
> -	struct completion cmplt[MAX_SEARCHES_SSE8];
> -	struct parms parms[MAX_SEARCHES_SSE8];
> -	xmm_t input0, input1;
> -	xmm_t indicies1, indicies2, indicies3, indicies4;
> -
> -	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> -		total_packets, categories, ctx->trans_table);
> -
> -	for (n = 0; n < MAX_SEARCHES_SSE8; n++) {
> -		cmplt[n].count = 0;
> -		index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> -	}
> -
> -	/*
> -	 * indicies1 contains index_array[0,1]
> -	 * indicies2 contains index_array[2,3]
> -	 * indicies3 contains index_array[4,5]
> -	 * indicies4 contains index_array[6,7]
> -	 */
> -
> -	indicies1 = MM_LOADU((xmm_t *) &index_array[0]);
> -	indicies2 = MM_LOADU((xmm_t *) &index_array[2]);
> -
> -	indicies3 = MM_LOADU((xmm_t *) &index_array[4]);
> -	indicies4 = MM_LOADU((xmm_t *) &index_array[6]);
> -
> -	/* Check for any matches. */
> -	acl_match_check_x4(0, ctx, parms, &flows,
> -		&indicies1, &indicies2, mm_match_mask.m);
> -	acl_match_check_x4(4, ctx, parms, &flows,
> -		&indicies3, &indicies4, mm_match_mask.m);
> -
> -	while (flows.started > 0) {
> -
> -		/* Gather 4 bytes of input data for each stream. */
> -		input0 = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 0),
> -			0);
> -		input1 = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 4),
> -			0);
> -
> -		input0 = MM_INSERT32(input0, GET_NEXT_4BYTES(parms, 1), 1);
> -		input1 = MM_INSERT32(input1, GET_NEXT_4BYTES(parms, 5), 1);
> -
> -		input0 = MM_INSERT32(input0, GET_NEXT_4BYTES(parms, 2), 2);
> -		input1 = MM_INSERT32(input1, GET_NEXT_4BYTES(parms, 6), 2);
> -
> -		input0 = MM_INSERT32(input0, GET_NEXT_4BYTES(parms, 3), 3);
> -		input1 = MM_INSERT32(input1, GET_NEXT_4BYTES(parms, 7), 3);
> -
> -		/* Process the 4 bytes of input on each stream. */
> -
> -		input0 = transition4(mm_index_mask.m, input0,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		input1 = transition4(mm_index_mask.m, input1,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies3, &indicies4);
> -
> -		input0 = transition4(mm_index_mask.m, input0,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		input1 = transition4(mm_index_mask.m, input1,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies3, &indicies4);
> -
> -		input0 = transition4(mm_index_mask.m, input0,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		input1 = transition4(mm_index_mask.m, input1,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies3, &indicies4);
> -
> -		input0 = transition4(mm_index_mask.m, input0,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		input1 = transition4(mm_index_mask.m, input1,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies3, &indicies4);
> -
> -		/* Check for any matches. */
> -		acl_match_check_x4(0, ctx, parms, &flows,
> -			&indicies1, &indicies2, mm_match_mask.m);
> -		acl_match_check_x4(4, ctx, parms, &flows,
> -			&indicies3, &indicies4, mm_match_mask.m);
> -	}
> -}
> -
> -/*
> - * Execute trie traversal with 4 traversals in parallel
> - */
> -static inline void
> -search_sse_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
> -	uint32_t *results, int total_packets, uint32_t categories)
> -{
> -	int n;
> -	struct acl_flow_data flows;
> -	uint64_t index_array[MAX_SEARCHES_SSE4];
> -	struct completion cmplt[MAX_SEARCHES_SSE4];
> -	struct parms parms[MAX_SEARCHES_SSE4];
> -	xmm_t input, indicies1, indicies2;
> -
> -	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> -		total_packets, categories, ctx->trans_table);
> -
> -	for (n = 0; n < MAX_SEARCHES_SSE4; n++) {
> -		cmplt[n].count = 0;
> -		index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> -	}
> -
> -	indicies1 = MM_LOADU((xmm_t *) &index_array[0]);
> -	indicies2 = MM_LOADU((xmm_t *) &index_array[2]);
> -
> -	/* Check for any matches. */
> -	acl_match_check_x4(0, ctx, parms, &flows,
> -		&indicies1, &indicies2, mm_match_mask.m);
> -
> -	while (flows.started > 0) {
> -
> -		/* Gather 4 bytes of input data for each stream. */
> -		input = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 0), 0);
> -		input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 1), 1);
> -		input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 2), 2);
> -		input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 3), 3);
> -
> -		/* Process the 4 bytes of input on each stream. */
> -		input = transition4(mm_index_mask.m, input,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		input = transition4(mm_index_mask.m, input,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		input = transition4(mm_index_mask.m, input,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		input = transition4(mm_index_mask.m, input,
> -			mm_shuffle_input.m, mm_ones_16.m,
> -			mm_bytes.m, mm_type_quad_range.m,
> -			flows.trans, &indicies1, &indicies2);
> -
> -		/* Check for any matches. */
> -		acl_match_check_x4(0, ctx, parms, &flows,
> -			&indicies1, &indicies2, mm_match_mask.m);
> -	}
> -}
> -
> -static inline xmm_t
> -transition2(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
> -	xmm_t ones_16, xmm_t bytes, xmm_t type_quad_range,
> -	const uint64_t *trans, xmm_t *indicies1)
> -{
> -	uint64_t t;
> -	xmm_t addr, indicies2;
> -
> -	indicies2 = MM_XOR(ones_16, ones_16);
> -
> -	addr = acl_calc_addr(index_mask, next_input, shuffle_input, ones_16,
> -		bytes, type_quad_range, indicies1, &indicies2);
> -
> -	/* Gather 64 bit transitions and pack 2 per register. */
> -
> -	t = trans[MM_CVT32(addr)];
> -
> -	/* get slot 1 */
> -	addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT1);
> -	*indicies1 = MM_SET64(trans[MM_CVT32(addr)], t);
> -
> -	return MM_SRL32(next_input, 8);
> -}
> -
> -/*
> - * Execute trie traversal with 2 traversals in parallel.
> - */
> -static inline void
> -search_sse_2(const struct rte_acl_ctx *ctx, const uint8_t **data,
> -	uint32_t *results, uint32_t total_packets, uint32_t categories)
> -{
> -	int n;
> -	struct acl_flow_data flows;
> -	uint64_t index_array[MAX_SEARCHES_SSE2];
> -	struct completion cmplt[MAX_SEARCHES_SSE2];
> -	struct parms parms[MAX_SEARCHES_SSE2];
> -	xmm_t input, indicies;
> -
> -	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> -		total_packets, categories, ctx->trans_table);
> -
> -	for (n = 0; n < MAX_SEARCHES_SSE2; n++) {
> -		cmplt[n].count = 0;
> -		index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> -	}
> -
> -	indicies = MM_LOADU((xmm_t *) &index_array[0]);
> -
> -	/* Check for any matches. */
> -	acl_match_check_x2(0, ctx, parms, &flows, &indicies, mm_match_mask64.m);
> -
> -	while (flows.started > 0) {
> -
> -		/* Gather 4 bytes of input data for each stream. */
> -		input = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 0), 0);
> -		input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 1), 1);
> -
> -		/* Process the 4 bytes of input on each stream. */
> -
> -		input = transition2(mm_index_mask64.m, input,
> -			mm_shuffle_input64.m, mm_ones_16.m,
> -			mm_bytes64.m, mm_type_quad_range64.m,
> -			flows.trans, &indicies);
> -
> -		input = transition2(mm_index_mask64.m, input,
> -			mm_shuffle_input64.m, mm_ones_16.m,
> -			mm_bytes64.m, mm_type_quad_range64.m,
> -			flows.trans, &indicies);
> -
> -		input = transition2(mm_index_mask64.m, input,
> -			mm_shuffle_input64.m, mm_ones_16.m,
> -			mm_bytes64.m, mm_type_quad_range64.m,
> -			flows.trans, &indicies);
> -
> -		input = transition2(mm_index_mask64.m, input,
> -			mm_shuffle_input64.m, mm_ones_16.m,
> -			mm_bytes64.m, mm_type_quad_range64.m,
> -			flows.trans, &indicies);
> -
> -		/* Check for any matches. */
> -		acl_match_check_x2(0, ctx, parms, &flows, &indicies,
> -			mm_match_mask64.m);
> -	}
> -}
> -
> -/*
> - * When processing the transition, rather than using an if/else
> - * construct, the offset is calculated for DFA and QRANGE and
> - * then conditionally added to the address based on node type.
> - * This is done to avoid branch mis-predictions. Since the
> - * offset is a rather simple calculation it is more efficient
> - * to do the calculation and do a conditional move rather than
> - * a conditional branch to determine which calculation to do.
> - */
> -static inline uint32_t
> -scan_forward(uint32_t input, uint32_t max)
> -{
> -	return (input == 0) ? max : rte_bsf32(input);
> -}
> -
> -static inline uint64_t
> -scalar_transition(const uint64_t *trans_table, uint64_t transition,
> -	uint8_t input)
> -{
> -	uint32_t addr, index, ranges, x, a, b, c;
> -
> -	/* break transition into component parts */
> -	ranges = transition >> (sizeof(index) * CHAR_BIT);
> -
> -	/* calc address for a QRANGE node */
> -	c = input * SCALAR_QRANGE_MULT;
> -	a = ranges | SCALAR_QRANGE_MIN;
> -	index = transition & ~RTE_ACL_NODE_INDEX;
> -	a -= (c & SCALAR_QRANGE_MASK);
> -	b = c & SCALAR_QRANGE_MIN;
> -	addr = transition ^ index;
> -	a &= SCALAR_QRANGE_MIN;
> -	a ^= (ranges ^ b) & (a ^ b);
> -	x = scan_forward(a, 32) >> 3;
> -	addr += (index == RTE_ACL_NODE_DFA) ? input : x;
> -
> -	/* pickup next transition */
> -	transition = *(trans_table + addr);
> -	return transition;
> -}
> -
> -int
> -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> -	uint32_t *results, uint32_t num, uint32_t categories)
> -{
> -	int n;
> -	uint64_t transition0, transition1;
> -	uint32_t input0, input1;
> -	struct acl_flow_data flows;
> -	uint64_t index_array[MAX_SEARCHES_SCALAR];
> -	struct completion cmplt[MAX_SEARCHES_SCALAR];
> -	struct parms parms[MAX_SEARCHES_SCALAR];
> -
> -	if (categories != 1 &&
> -		((RTE_ACL_RESULTS_MULTIPLIER - 1) & categories) != 0)
> -		return -EINVAL;
> -
> -	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, num,
> -		categories, ctx->trans_table);
> -
> -	for (n = 0; n < MAX_SEARCHES_SCALAR; n++) {
> -		cmplt[n].count = 0;
> -		index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> -	}
> -
> -	transition0 = index_array[0];
> -	transition1 = index_array[1];
> -
> -	while (flows.started > 0) {
> -
> -		input0 = GET_NEXT_4BYTES(parms, 0);
> -		input1 = GET_NEXT_4BYTES(parms, 1);
> -
> -		for (n = 0; n < 4; n++) {
> -			if (likely((transition0 & RTE_ACL_NODE_MATCH) == 0))
> -				transition0 = scalar_transition(flows.trans,
> -					transition0, (uint8_t)input0);
> -
> -			input0 >>= CHAR_BIT;
> -
> -			if (likely((transition1 & RTE_ACL_NODE_MATCH) == 0))
> -				transition1 = scalar_transition(flows.trans,
> -					transition1, (uint8_t)input1);
> -
> -			input1 >>= CHAR_BIT;
> -
> -		}
> -		if ((transition0 | transition1) & RTE_ACL_NODE_MATCH) {
> -			transition0 = acl_match_check_transition(transition0,
> -				0, ctx, parms, &flows);
> -			transition1 = acl_match_check_transition(transition1,
> -				1, ctx, parms, &flows);
> -
> -		}
> -	}
> -	return 0;
> -}
> -
> -int
> -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> -	uint32_t *results, uint32_t num, uint32_t categories)
> -{
> -	if (categories != 1 &&
> -		((RTE_ACL_RESULTS_MULTIPLIER - 1) & categories) != 0)
> -		return -EINVAL;
> -
> -	if (likely(num >= MAX_SEARCHES_SSE8))
> -		search_sse_8(ctx, data, results, num, categories);
> -	else if (num >= MAX_SEARCHES_SSE4)
> -		search_sse_4(ctx, data, results, num, categories);
> -	else
> -		search_sse_2(ctx, data, results, num, categories);
> -
> -	return 0;
> -}
> diff --git a/lib/librte_acl/acl_run.h b/lib/librte_acl/acl_run.h
> new file mode 100644
> index 0000000..c39650e
> --- /dev/null
> +++ b/lib/librte_acl/acl_run.h
> @@ -0,0 +1,220 @@
> +/*-
> +   BSD LICENSE
> +
> +   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +   All rights reserved.
> +
> +   Redistribution and use in source and binary forms, with or without
> +   modification, are permitted provided that the following conditions
> +   are met:
> +
> +     * Redistributions of source code must retain the above copyright
> +       notice, this list of conditions and the following disclaimer.
> +     * Redistributions in binary form must reproduce the above copyright
> +       notice, this list of conditions and the following disclaimer in
> +       the documentation and/or other materials provided with the
> +       distribution.
> +     * Neither the name of Intel Corporation nor the names of its
> +       contributors may be used to endorse or promote products derived
> +       from this software without specific prior written permission.
> +
> +   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _ACL_RUN_H_
> +#define _ACL_RUN_H_
> +
> +#include
> +#include "acl_vect.h"
> +#include "acl.h"
> +
> +#define MAX_SEARCHES_SSE8	8
> +#define MAX_SEARCHES_SSE4	4
> +#define MAX_SEARCHES_SSE2	2
> +#define MAX_SEARCHES_SCALAR	2
> +
> +#define GET_NEXT_4BYTES(prm, idx)	\
> +	(*((const int32_t *)((prm)[(idx)].data + *(prm)[idx].data_index++)))
> +
> +
> +#define RTE_ACL_NODE_INDEX	((uint32_t)~RTE_ACL_NODE_TYPE)
> +
> +#define	SCALAR_QRANGE_MULT	0x01010101
> +#define	SCALAR_QRANGE_MASK	0x7f7f7f7f
> +#define	SCALAR_QRANGE_MIN	0x80808080
> +
> +/*
> + * Structure to manage N parallel trie traversals.
> + * The runtime trie traversal routines can process 8, 4, or 2 tries
> + * in parallel. Each packet may require multiple trie traversals (up to 4).
> + * This structure is used to fill the slots (0 to n-1) for parallel processing
> + * with the trie traversals needed for each packet.
> + */
> +struct acl_flow_data {
> +	uint32_t num_packets;
> +	/* number of packets processed */
> +	uint32_t started;
> +	/* number of trie traversals in progress */
> +	uint32_t trie;
> +	/* current trie index (0 to N-1) */
> +	uint32_t cmplt_size;
> +	uint32_t total_packets;
> +	uint32_t categories;
> +	/* number of result categories per packet. */
> +	/* maximum number of packets to process */
> +	const uint64_t *trans;
> +	const uint8_t **data;
> +	uint32_t *results;
> +	struct completion *last_cmplt;
> +	struct completion *cmplt_array;
> +};
> +
> +/*
> + * Structure to maintain running results for
> + * a single packet (up to 4 tries).
> + */
> +struct completion {
> +	uint32_t *results;                          /* running results. */
> +	int32_t priority[RTE_ACL_MAX_CATEGORIES];   /* running priorities. */
> +	uint32_t count;                             /* num of remaining tries */
> +	/* true for allocated struct */
> +} __attribute__((aligned(XMM_SIZE)));
> +
> +/*
> + * One parms structure for each slot in the search engine.
> + */
> +struct parms {
> +	const uint8_t *data;
> +	/* input data for this packet */
> +	const uint32_t *data_index;
> +	/* data indirection for this trie */
> +	struct completion *cmplt;
> +	/* completion data for this packet */
> +};
> +
> +/*
> + * Define a global idle node for unused engine slots
> + */
> +static const uint32_t idle[UINT8_MAX + 1];
> +
> +/*
> + * Allocate a completion structure to manage the tries for a packet.
> + */
> +static inline struct completion *
> +alloc_completion(struct completion *p, uint32_t size, uint32_t tries,
> +	uint32_t *results)
> +{
> +	uint32_t n;
> +
> +	for (n = 0; n < size; n++) {
> +
> +		if (p[n].count == 0) {
> +
> +			/* mark as allocated and set number of tries. */
> +			p[n].count = tries;
> +			p[n].results = results;
> +			return &(p[n]);
> +		}
> +	}
> +
> +	/* should never get here */
> +	return NULL;
> +}
> +
> +/*
> + * Resolve priority for a single result trie.
> + */
> +static inline void
> +resolve_single_priority(uint64_t transition, int n,
> +	const struct rte_acl_ctx *ctx, struct parms *parms,
> +	const struct rte_acl_match_results *p)
> +{
> +	if (parms[n].cmplt->count == ctx->num_tries ||
> +			parms[n].cmplt->priority[0] <=
> +			p[transition].priority[0]) {
> +
> +		parms[n].cmplt->priority[0] = p[transition].priority[0];
> +		parms[n].cmplt->results[0] = p[transition].results[0];
> +	}
> +}
> +
> +/*
> + * Routine to fill a slot in the parallel trie traversal array (parms) from
> + * the list of packets (flows).
> + */
> +static inline uint64_t
> +acl_start_next_trie(struct acl_flow_data *flows, struct parms *parms, int n,
> +	const struct rte_acl_ctx *ctx)
> +{
> +	uint64_t transition;
> +
> +	/* if there are any more packets to process */
> +	if (flows->num_packets < flows->total_packets) {
> +		parms[n].data = flows->data[flows->num_packets];
> +		parms[n].data_index = ctx->trie[flows->trie].data_index;
> +
> +		/* if this is the first trie for this packet */
> +		if (flows->trie == 0) {
> +			flows->last_cmplt = alloc_completion(flows->cmplt_array,
> +				flows->cmplt_size, ctx->num_tries,
> +				flows->results +
> +				flows->num_packets * flows->categories);
> +		}
> +
> +		/* set completion parameters and starting index for this slot */
> +		parms[n].cmplt = flows->last_cmplt;
> +		transition =
> +			flows->trans[parms[n].data[*parms[n].data_index++] +
> +			ctx->trie[flows->trie].root_index];
> +
> +		/*
> +		 * if this is the last trie for this packet,
> +		 * then setup next packet.
> +		 */
> +		flows->trie++;
> +		if (flows->trie >= ctx->num_tries) {
> +			flows->trie = 0;
> +			flows->num_packets++;
> +		}
> +
> +		/* keep track of number of active trie traversals */
> +		flows->started++;
> +
> +	/* no more tries to process, set slot to an idle position */
> +	} else {
> +		transition = ctx->idle;
> +		parms[n].data = (const uint8_t *)idle;
> +		parms[n].data_index = idle;
> +	}
> +	return transition;
> +}
> +
> +static inline void
> +acl_set_flow(struct acl_flow_data *flows, struct completion *cmplt,
> +	uint32_t cmplt_size, const uint8_t **data, uint32_t *results,
> +	uint32_t data_num, uint32_t categories, const uint64_t *trans)
> +{
> +	flows->num_packets = 0;
> +	flows->started = 0;
> +	flows->trie = 0;
> +	flows->last_cmplt = NULL;
> +	flows->cmplt_array = cmplt;
> +	flows->total_packets = data_num;
> +	flows->categories = categories;
> +	flows->cmplt_size = cmplt_size;
> +	flows->data = data;
> +	flows->results = results;
> +	flows->trans = trans;
> +}
> +
> +#endif /* _ACL_RUN_H_ */
> diff --git a/lib/librte_acl/acl_run_scalar.c b/lib/librte_acl/acl_run_scalar.c
> new file mode 100644
> index 0000000..a59ff17
> --- /dev/null
> +++ b/lib/librte_acl/acl_run_scalar.c
> @@ -0,0 +1,198 @@
> +/*-
> +   BSD LICENSE
> +
> +   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +   All rights reserved.
> +
> +   Redistribution and use in source and binary forms, with or without
> +   modification, are permitted provided that the following conditions
> +   are met:
> +
> +     * Redistributions of source code must retain the above copyright
> +       notice, this list of conditions and the following disclaimer.
> +     * Redistributions in binary form must reproduce the above copyright
> +       notice, this list of conditions and the following disclaimer in
> +       the documentation and/or other materials provided with the
> +       distribution.
> +     * Neither the name of Intel Corporation nor the names of its
> +       contributors may be used to endorse or promote products derived
> +       from this software without specific prior written permission.
> +
> +   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include "acl_run.h"
> +#include "acl_match_check.h"
> +
> +int
> +rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> +	uint32_t *results, uint32_t num, uint32_t categories);
> +
> +/*
> + * Resolve priority for multiple results (scalar version).
> + * This consists of comparing the priority of the current traversal with the
> + * running set of results for the packet.
> + * For each result, keep a running array of the result (rule number) and
> + * its priority for each category.
> + */
> +static inline void
> +resolve_priority_scalar(uint64_t transition, int n,
> +	const struct rte_acl_ctx *ctx, struct parms *parms,
> +	const struct rte_acl_match_results *p, uint32_t categories)
> +{
> +	uint32_t i;
> +	int32_t *saved_priority;
> +	uint32_t *saved_results;
> +	const int32_t *priority;
> +	const uint32_t *results;
> +
> +	saved_results = parms[n].cmplt->results;
> +	saved_priority = parms[n].cmplt->priority;
> +
> +	/* results and priorities for completed trie */
> +	results = p[transition].results;
> +	priority = p[transition].priority;
> +
> +	/* if this is not the first completed trie */
> +	if (parms[n].cmplt->count != ctx->num_tries) {
> +		for (i = 0; i < categories; i += RTE_ACL_RESULTS_MULTIPLIER) {
> +
> +			if (saved_priority[i] <= priority[i]) {
> +				saved_priority[i] = priority[i];
> +				saved_results[i] = results[i];
> +			}
> +			if (saved_priority[i + 1] <= priority[i + 1]) {
> +				saved_priority[i + 1] = priority[i + 1];
> +				saved_results[i + 1] = results[i + 1];
> +			}
> +			if (saved_priority[i + 2] <= priority[i + 2]) {
> +				saved_priority[i + 2] = priority[i + 2];
> +				saved_results[i + 2] = results[i + 2];
> +			}
> +			if (saved_priority[i + 3] <= priority[i + 3]) {
> +				saved_priority[i + 3] = priority[i + 3];
> +				saved_results[i + 3] = results[i + 3];
> +			}
> +		}
> +	} else {
> +		for (i = 0; i < categories; i += RTE_ACL_RESULTS_MULTIPLIER) {
> +			saved_priority[i] = priority[i];
> +			saved_priority[i + 1] = priority[i + 1];
> +			saved_priority[i + 2] = priority[i + 2];
> +			saved_priority[i + 3] = priority[i + 3];
> +
> +			saved_results[i] = results[i];
> +			saved_results[i + 1] = results[i + 1];
> +			saved_results[i + 2] = results[i + 2];
> +			saved_results[i + 3] = results[i + 3];
> +		}
> +	}
> +}
> +
> +/*
> + * When processing the transition, rather than using an if/else
> + * construct, the offset is calculated for DFA and QRANGE and
> + * then conditionally added to the address based on node type.
> + * This is done to avoid branch mis-predictions. Since the
> + * offset is a rather simple calculation it is more efficient
> + * to do the calculation and do a conditional move rather than
> + * a conditional branch to determine which calculation to do.
> + */
> +static inline uint32_t
> +scan_forward(uint32_t input, uint32_t max)
> +{
> +	return (input == 0) ? max : rte_bsf32(input);
> +}
> +
> +static inline uint64_t
> +scalar_transition(const uint64_t *trans_table, uint64_t transition,
> +	uint8_t input)
> +{
> +	uint32_t addr, index, ranges, x, a, b, c;
> +
> +	/* break transition into component parts */
> +	ranges = transition >> (sizeof(index) * CHAR_BIT);
> +
> +	/* calc address for a QRANGE node */
> +	c = input * SCALAR_QRANGE_MULT;
> +	a = ranges | SCALAR_QRANGE_MIN;
> +	index = transition & ~RTE_ACL_NODE_INDEX;
> +	a -= (c & SCALAR_QRANGE_MASK);
> +	b = c & SCALAR_QRANGE_MIN;
> +	addr = transition ^ index;
> +	a &= SCALAR_QRANGE_MIN;
> +	a ^= (ranges ^ b) & (a ^ b);
> +	x = scan_forward(a, 32) >> 3;
> +	addr += (index == RTE_ACL_NODE_DFA) ? input : x;
> +
> +	/* pickup next transition */
> +	transition = *(trans_table + addr);
> +	return transition;
> +}
> +
> +int
> +rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> +	uint32_t *results, uint32_t num, uint32_t categories)
> +{
> +	int n;
> +	uint64_t transition0, transition1;
> +	uint32_t input0, input1;
> +	struct acl_flow_data flows;
> +	uint64_t index_array[MAX_SEARCHES_SCALAR];
> +	struct completion cmplt[MAX_SEARCHES_SCALAR];
> +	struct parms parms[MAX_SEARCHES_SCALAR];
> +
> +	if (categories != 1 &&
> +		((RTE_ACL_RESULTS_MULTIPLIER - 1) & categories) != 0)
> +		return -EINVAL;
> +
> +	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, num,
> +		categories, ctx->trans_table);
> +
> +	for (n = 0; n < MAX_SEARCHES_SCALAR; n++) {
> +		cmplt[n].count = 0;
> +		index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> +	}
> +
> +	transition0 = index_array[0];
> +	transition1 = index_array[1];
> +
> +	while (flows.started > 0) {
> +
> +		input0 = GET_NEXT_4BYTES(parms, 0);
> +		input1 = GET_NEXT_4BYTES(parms, 1);
> +
> +		for (n = 0; n < 4; n++) {
> +			if (likely((transition0 & RTE_ACL_NODE_MATCH) == 0))
> +				transition0 = scalar_transition(flows.trans,
> +					transition0, (uint8_t)input0);
> +
> +			input0 >>= CHAR_BIT;
> +
> +			if (likely((transition1 & RTE_ACL_NODE_MATCH) == 0))
> +				transition1 = scalar_transition(flows.trans,
> +					transition1, (uint8_t)input1);
> +
> +			input1 >>= CHAR_BIT;
> +
> +		}
> +		if ((transition0 | transition1) & RTE_ACL_NODE_MATCH) {
> +			transition0 = acl_match_check(transition0,
> +				0, ctx, parms, &flows, resolve_priority_scalar);
> +			transition1 = acl_match_check(transition1,
> +				1, ctx, parms, &flows, resolve_priority_scalar);
> +
> +		}
> +	}
> +	return 0;
> +}
> diff --git a/lib/librte_acl/acl_run_sse.c b/lib/librte_acl/acl_run_sse.c
> new file mode 100644
> index 0000000..3f5c721
> --- /dev/null
> +++ b/lib/librte_acl/acl_run_sse.c
> @@ -0,0 +1,627 @@
> +/*-
> +   BSD LICENSE
> +
> +   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +   All rights reserved.
> +
> +   Redistribution and use in source and binary forms, with or without
> +   modification, are permitted provided that the following conditions
> +   are met:
> +
> +     * Redistributions of source code must retain the above copyright
> +       notice, this list of conditions and the following disclaimer.
> +     * Redistributions in binary form must reproduce the above copyright
> +       notice, this list of conditions and the following disclaimer in
> +       the documentation and/or other materials provided with the
> +       distribution.
> +     * Neither the name of Intel Corporation nor the names of its
> +       contributors may be used to endorse or promote products derived
> +       from this software without specific prior written permission.
> +
> +   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> diff --git a/lib/librte_acl/acl_run_sse.c b/lib/librte_acl/acl_run_sse.c
> new file mode 100644
> index 0000000..3f5c721
> --- /dev/null
> +++ b/lib/librte_acl/acl_run_sse.c
> @@ -0,0 +1,627 @@
> +/*-
> + * BSD LICENSE
> + *
> + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include "acl_run.h"
> +#include "acl_match_check.h"
> +
> +enum {
> + SHUFFLE32_SLOT1 = 0xe5,
> + SHUFFLE32_SLOT2 = 0xe6,
> + SHUFFLE32_SLOT3 = 0xe7,
> + SHUFFLE32_SWAP64 = 0x4e,
> +};
> +
> +static const rte_xmm_t mm_type_quad_range = {
> + .u32 = {
> + RTE_ACL_NODE_QRANGE,
> + RTE_ACL_NODE_QRANGE,
> + RTE_ACL_NODE_QRANGE,
> + RTE_ACL_NODE_QRANGE,
> + },
> +};
> +
> +static const rte_xmm_t mm_type_quad_range64 = {
> + .u32 = {
> + RTE_ACL_NODE_QRANGE,
> + RTE_ACL_NODE_QRANGE,
> + 0,
> + 0,
> + },
> +};
> +
> +static const rte_xmm_t mm_shuffle_input = {
> + .u32 = {0x00000000, 0x04040404, 0x08080808, 0x0c0c0c0c},
> +};
> +
> +static const rte_xmm_t mm_shuffle_input64 = {
> + .u32 = {0x00000000, 0x04040404, 0x80808080, 0x80808080},
> +};
> +
> +static const rte_xmm_t mm_ones_16 = {
> + .u16 = {1, 1, 1, 1, 1, 1, 1, 1},
> +};
> +
> +static const rte_xmm_t mm_bytes = {
> + .u32 = {UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX},
> +};
> +
> +static const rte_xmm_t mm_bytes64 = {
> + .u32 = {UINT8_MAX, UINT8_MAX, 0, 0},
> +};
> +
> +static const rte_xmm_t mm_match_mask = {
> + .u32 = {
> + RTE_ACL_NODE_MATCH,
> + RTE_ACL_NODE_MATCH,
> + RTE_ACL_NODE_MATCH,
> + RTE_ACL_NODE_MATCH,
> + },
> +};
> +
> +static const rte_xmm_t mm_match_mask64 = {
> + .u32 = {
> + RTE_ACL_NODE_MATCH,
> + 0,
> + RTE_ACL_NODE_MATCH,
> + 0,
> + },
> +};
> +
> +static const rte_xmm_t mm_index_mask = {
> + .u32 = {
> + RTE_ACL_NODE_INDEX,
> + RTE_ACL_NODE_INDEX,
> + RTE_ACL_NODE_INDEX,
> + RTE_ACL_NODE_INDEX,
> + },
> +};
> +
> +static const rte_xmm_t mm_index_mask64 = {
> + .u32 = {
> + RTE_ACL_NODE_INDEX,
> + RTE_ACL_NODE_INDEX,
> + 0,
> + 0,
> + },
> +};
> +
> +
> +/*
> + * Resolve priority for multiple results (sse version).
> + * This consists of comparing the priority of the current traversal with the
> + * running set of results for the packet.
> + * For each result, keep a running array of the result (rule number) and
> + * its priority for each category.
> + */
> +static inline void
> +resolve_priority_sse(uint64_t transition, int n, const struct rte_acl_ctx *ctx,
> + struct parms *parms, const struct rte_acl_match_results *p,
> + uint32_t categories)
> +{
> + uint32_t x;
> + xmm_t results, priority, results1, priority1, selector;
> + xmm_t *saved_results, *saved_priority;
> +
> + for (x = 0; x < categories; x += RTE_ACL_RESULTS_MULTIPLIER) {
> +
> + saved_results = (xmm_t *)(&parms[n].cmplt->results[x]);
> + saved_priority =
> + (xmm_t *)(&parms[n].cmplt->priority[x]);
> +
> + /* get results and priorities for completed trie */
> + results = MM_LOADU((const xmm_t *)&p[transition].results[x]);
> + priority = MM_LOADU((const xmm_t *)&p[transition].priority[x]);
> +
> + /* if this is not the first completed trie */
> + if (parms[n].cmplt->count != ctx->num_tries) {
> +
> + /* get running best results and their priorities */
> + results1 = MM_LOADU(saved_results);
> + priority1 = MM_LOADU(saved_priority);
> +
> + /* select results that are highest priority */
> + selector = MM_CMPGT32(priority1, priority);
> + results = MM_BLENDV8(results, results1, selector);
> + priority = MM_BLENDV8(priority, priority1, selector);
> + }
> +
> + /* save running best results and their priorities */
> + MM_STOREU(saved_results, results);
> + MM_STOREU(saved_priority, priority);
> + }
> +}
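
Per 32-bit lane, the CMPGT32/BLENDV8 pair above reduces to the scalar
logic below (sketch only; i indexes one category):

	if (priority1[i] > priority[i]) {	/* running best wins */
		results[i] = results1[i];
		priority[i] = priority1[i];
	}	/* else keep the freshly loaded result */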
> +
> +/*
> + * Extract transitions from an XMM register and check for any matches
> + */
> +static void
> +acl_process_matches(xmm_t *indicies, int slot, const struct rte_acl_ctx *ctx,
> + struct parms *parms, struct acl_flow_data *flows)
> +{
> + uint64_t transition1, transition2;
> +
> + /* extract transition from low 64 bits. */
> + transition1 = MM_CVT64(*indicies);
> +
> + /* extract transition from high 64 bits. */
> + *indicies = MM_SHUFFLE32(*indicies, SHUFFLE32_SWAP64);
> + transition2 = MM_CVT64(*indicies);
> +
> + transition1 = acl_match_check(transition1, slot, ctx,
> + parms, flows, resolve_priority_sse);
> + transition2 = acl_match_check(transition2, slot + 1, ctx,
> + parms, flows, resolve_priority_sse);
> +
> + /* update indicies with new transitions. */
> + *indicies = MM_SET64(transition2, transition1);
> +}
> +
> +/*
> + * Check for a match in 2 transitions (contained in SSE register)
> + */
> +static inline void
> +acl_match_check_x2(int slot, const struct rte_acl_ctx *ctx, struct parms *parms,
> + struct acl_flow_data *flows, xmm_t *indicies, xmm_t match_mask)
> +{
> + xmm_t temp;
> +
> + temp = MM_AND(match_mask, *indicies);
> + while (!MM_TESTZ(temp, temp)) {
> + acl_process_matches(indicies, slot, ctx, parms, flows);
> + temp = MM_AND(match_mask, *indicies);
> + }
> +}
> +
> +/*
> + * Check for any match in 4 transitions (contained in 2 SSE registers)
> + */
> +static inline void
> +acl_match_check_x4(int slot, const struct rte_acl_ctx *ctx, struct parms *parms,
> + struct acl_flow_data *flows, xmm_t *indicies1, xmm_t *indicies2,
> + xmm_t match_mask)
> +{
> + xmm_t temp;
> +
> + /* put low 32 bits of each transition into one register */
> + temp = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1, (__m128)*indicies2,
> + 0x88);
> + /* test for match node */
> + temp = MM_AND(match_mask, temp);
> +
> + while (!MM_TESTZ(temp, temp)) {
> + acl_process_matches(indicies1, slot, ctx, parms, flows);
> + acl_process_matches(indicies2, slot + 2, ctx, parms, flows);
> +
> + temp = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1,
> + (__m128)*indicies2,
> + 0x88);
> + temp = MM_AND(match_mask, temp);
> + }
> +}
> +
> +/*
> + * Calculate the address of the next transition for
> + * all types of nodes. Note that only DFA nodes and range
> + * nodes actually transition to another node. Match
> + * nodes don't move.
> + */
> +static inline xmm_t
> +acl_calc_addr(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
> + xmm_t ones_16, xmm_t bytes, xmm_t type_quad_range,
> + xmm_t *indicies1, xmm_t *indicies2)
> +{
> + xmm_t addr, node_types, temp;
> +
> + /*
> + * Note that no transition is done for a match
> + * node and therefore a stream freezes when
> + * it reaches a match.
> + */
> +
> + /* Shuffle low 32 into temp and high 32 into indicies2 */
> + temp = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1, (__m128)*indicies2,
> + 0x88);
> + *indicies2 = (xmm_t)MM_SHUFFLEPS((__m128)*indicies1,
> + (__m128)*indicies2, 0xdd);
> +
> + /* Calc node type and node addr */
> + node_types = MM_ANDNOT(index_mask, temp);
> + addr = MM_AND(index_mask, temp);
> +
> + /*
> + * Calc addr for DFAs - addr = dfa_index + input_byte
> + */
> +
> + /* mask for DFA type (0) nodes */
> + temp = MM_CMPEQ32(node_types, MM_XOR(node_types, node_types));
> +
> + /* add input byte to DFA position */
> + temp = MM_AND(temp, bytes);
> + temp = MM_AND(temp, next_input);
> + addr = MM_ADD32(addr, temp);
> +
> + /*
> + * Calc addr for Range nodes -> range_index + range(input)
> + */
> + node_types = MM_CMPEQ32(node_types, type_quad_range);
> +
> + /*
> + * Calculate number of range boundaries that are less than the
> + * input value. Range boundaries for each node are in signed 8 bit,
> + * ordered from -128 to 127 in the indicies2 register.
> + * This is effectively a popcnt of the boundary bytes that the
> + * input byte is greater than.
> + */
> +
> + /* shuffle input byte to all 4 positions of 32 bit value */
> + temp = MM_SHUFFLE8(next_input, shuffle_input);
> +
> + /* check ranges */
> + temp = MM_CMPGT8(temp, *indicies2);
> +
> + /* convert -1 to 1 (bytes greater than input byte) */
> + temp = MM_SIGN8(temp, temp);
> +
> + /* horizontal add pairs of bytes into words */
> + temp = MM_MADD8(temp, temp);
> +
> + /* horizontal add pairs of words into dwords */
> + temp = MM_MADD16(temp, ones_16);
> +
> + /* mask to range type nodes */
> + temp = MM_AND(temp, node_types);
> +
> + /* add index into node position */
> + return MM_ADD32(addr, temp);
> +}
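
In scalar terms, the SIGN8/MADD8/MADD16 sequence in acl_calc_addr above
computes, for each flow, how many of the node's four signed 8-bit range
limits lie below the input byte - roughly (sketch only; bounds[] is a
hypothetical name for the node's packed limits):

	uint32_t off = 0;
	for (i = 0; i < 4; i++)
		off += ((int8_t)input > bounds[i]);	/* count limits below input */
	/* off is then added to addr for QRANGE nodes only */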
> +
> +/*
> + * Process 4 transitions (in 2 SIMD registers) in parallel
> + */
> +static inline xmm_t
> +transition4(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
> + xmm_t ones_16, xmm_t bytes, xmm_t type_quad_range,
> + const uint64_t *trans, xmm_t *indicies1, xmm_t *indicies2)
> +{
> + xmm_t addr;
> + uint64_t trans0, trans2;
> +
> + /* Calculate the address (array index) for all 4 transitions. */
> +
> + addr = acl_calc_addr(index_mask, next_input, shuffle_input, ones_16,
> + bytes, type_quad_range, indicies1, indicies2);
> +
> + /* Gather 64 bit transitions and pack back into 2 registers. */
> +
> + trans0 = trans[MM_CVT32(addr)];
> +
> + /* get slot 2 */
> +
> + /* {x0, x1, x2, x3} -> {x2, x1, x2, x3} */
> + addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT2);
> + trans2 = trans[MM_CVT32(addr)];
> +
> + /* get slot 1 */
> +
> + /* {x2, x1, x2, x3} -> {x1, x1, x2, x3} */
> + addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT1);
> + *indicies1 = MM_SET64(trans[MM_CVT32(addr)], trans0);
> +
> + /* get slot 3 */
> +
> + /* {x1, x1, x2, x3} -> {x3, x1, x2, x3} */
> + addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT3);
> + *indicies2 = MM_SET64(trans[MM_CVT32(addr)], trans2);
> +
> + return MM_SRL32(next_input, 8);
> +}
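
The shuffle/MM_CVT32 dance above is just a 4-lane gather; in scalar terms
it amounts to the sketch below (extract32() is a hypothetical lane
extractor, trans[] is the real transition table):

	/* next transition for lane i, i = 0..3, packed two per register */
	next[i] = trans[extract32(addr, i)];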
> +
> +/*
> + * Execute trie traversal with 8 traversals in parallel
> + */
> +static inline int
> +search_sse_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
> + uint32_t *results, uint32_t total_packets, uint32_t categories)
> +{
> + int n;
> + struct acl_flow_data flows;
> + uint64_t index_array[MAX_SEARCHES_SSE8];
> + struct completion cmplt[MAX_SEARCHES_SSE8];
> + struct parms parms[MAX_SEARCHES_SSE8];
> + xmm_t input0, input1;
> + xmm_t indicies1, indicies2, indicies3, indicies4;
> +
> + acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> + total_packets, categories, ctx->trans_table);
> +
> + for (n = 0; n < MAX_SEARCHES_SSE8; n++) {
> + cmplt[n].count = 0;
> + index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> + }
> +
> + /*
> + * indicies1 contains index_array[0,1]
> + * indicies2 contains index_array[2,3]
> + * indicies3 contains index_array[4,5]
> + * indicies4 contains index_array[6,7]
> + */
> +
> + indicies1 = MM_LOADU((xmm_t *) &index_array[0]);
> + indicies2 = MM_LOADU((xmm_t *) &index_array[2]);
> +
> + indicies3 = MM_LOADU((xmm_t *) &index_array[4]);
> + indicies4 = MM_LOADU((xmm_t *) &index_array[6]);
> +
> + /* Check for any matches. */
> + acl_match_check_x4(0, ctx, parms, &flows,
> + &indicies1, &indicies2, mm_match_mask.m);
> + acl_match_check_x4(4, ctx, parms, &flows,
> + &indicies3, &indicies4, mm_match_mask.m);
> +
> + while (flows.started > 0) {
> +
> + /* Gather 4 bytes of input data for each stream. */
> + input0 = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 0),
> + 0);
> + input1 = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 4),
> + 0);
> +
> + input0 = MM_INSERT32(input0, GET_NEXT_4BYTES(parms, 1), 1);
> + input1 = MM_INSERT32(input1, GET_NEXT_4BYTES(parms, 5), 1);
> +
> + input0 = MM_INSERT32(input0, GET_NEXT_4BYTES(parms, 2), 2);
> + input1 = MM_INSERT32(input1, GET_NEXT_4BYTES(parms, 6), 2);
> +
> + input0 = MM_INSERT32(input0, GET_NEXT_4BYTES(parms, 3), 3);
> + input1 = MM_INSERT32(input1, GET_NEXT_4BYTES(parms, 7), 3);
> +
> + /* Process the 4 bytes of input on each stream. */
> +
> + input0 = transition4(mm_index_mask.m, input0,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + input1 = transition4(mm_index_mask.m, input1,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies3, &indicies4);
> +
> + input0 = transition4(mm_index_mask.m, input0,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + input1 = transition4(mm_index_mask.m, input1,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies3, &indicies4);
> +
> + input0 = transition4(mm_index_mask.m, input0,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + input1 = transition4(mm_index_mask.m, input1,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies3, &indicies4);
> +
> + input0 = transition4(mm_index_mask.m, input0,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + input1 = transition4(mm_index_mask.m, input1,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies3, &indicies4);
> +
> + /* Check for any matches. */
> + acl_match_check_x4(0, ctx, parms, &flows,
> + &indicies1, &indicies2, mm_match_mask.m);
> + acl_match_check_x4(4, ctx, parms, &flows,
> + &indicies3, &indicies4, mm_match_mask.m);
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Execute trie traversal with 4 traversals in parallel
> + */
> +static inline int
> +search_sse_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
> + uint32_t *results, int total_packets, uint32_t categories)
> +{
> + int n;
> + struct acl_flow_data flows;
> + uint64_t index_array[MAX_SEARCHES_SSE4];
> + struct completion cmplt[MAX_SEARCHES_SSE4];
> + struct parms parms[MAX_SEARCHES_SSE4];
> + xmm_t input, indicies1, indicies2;
> +
> + acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> + total_packets, categories, ctx->trans_table);
> +
> + for (n = 0; n < MAX_SEARCHES_SSE4; n++) {
> + cmplt[n].count = 0;
> + index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> + }
> +
> + indicies1 = MM_LOADU((xmm_t *) &index_array[0]);
> + indicies2 = MM_LOADU((xmm_t *) &index_array[2]);
> +
> + /* Check for any matches. */
> + acl_match_check_x4(0, ctx, parms, &flows,
> + &indicies1, &indicies2, mm_match_mask.m);
> +
> + while (flows.started > 0) {
> +
> + /* Gather 4 bytes of input data for each stream. */
> + input = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 0), 0);
> + input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 1), 1);
> + input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 2), 2);
> + input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 3), 3);
> +
> + /* Process the 4 bytes of input on each stream. */
> + input = transition4(mm_index_mask.m, input,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + input = transition4(mm_index_mask.m, input,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + input = transition4(mm_index_mask.m, input,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + input = transition4(mm_index_mask.m, input,
> + mm_shuffle_input.m, mm_ones_16.m,
> + mm_bytes.m, mm_type_quad_range.m,
> + flows.trans, &indicies1, &indicies2);
> +
> + /* Check for any matches. */
> + acl_match_check_x4(0, ctx, parms, &flows,
> + &indicies1, &indicies2, mm_match_mask.m);
> + }
> +
> + return 0;
> +}
> +
> +static inline xmm_t
> +transition2(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
> + xmm_t ones_16, xmm_t bytes, xmm_t type_quad_range,
> + const uint64_t *trans, xmm_t *indicies1)
> +{
> + uint64_t t;
> + xmm_t addr, indicies2;
> +
> + indicies2 = MM_XOR(ones_16, ones_16);
> +
> + addr = acl_calc_addr(index_mask, next_input, shuffle_input, ones_16,
> + bytes, type_quad_range, indicies1, &indicies2);
> +
> + /* Gather 64 bit transitions and pack 2 per register. */
> +
> + t = trans[MM_CVT32(addr)];
> +
> + /* get slot 1 */
> + addr = MM_SHUFFLE32(addr, SHUFFLE32_SLOT1);
> + *indicies1 = MM_SET64(trans[MM_CVT32(addr)], t);
> +
> + return MM_SRL32(next_input, 8);
> +}
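
Note that transition2 clears the unused upper lanes up front, so the
4-lane acl_calc_addr helper can be reused unchanged for just two flows:

	indicies2 = MM_XOR(ones_16, ones_16);	/* x ^ x == 0, i.e. a zero register */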
> +
> +/*
> + * Execute trie traversal with 2 traversals in parallel.
> + */
> +static inline int
> +search_sse_2(const struct rte_acl_ctx *ctx, const uint8_t **data,
> + uint32_t *results, uint32_t total_packets, uint32_t categories)
> +{
> + int n;
> + struct acl_flow_data flows;
> + uint64_t index_array[MAX_SEARCHES_SSE2];
> + struct completion cmplt[MAX_SEARCHES_SSE2];
> + struct parms parms[MAX_SEARCHES_SSE2];
> + xmm_t input, indicies;
> +
> + acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> + total_packets, categories, ctx->trans_table);
> +
> + for (n = 0; n < MAX_SEARCHES_SSE2; n++) {
> + cmplt[n].count = 0;
> + index_array[n] = acl_start_next_trie(&flows, parms, n, ctx);
> + }
> +
> + indicies = MM_LOADU((xmm_t *) &index_array[0]);
> +
> + /* Check for any matches. */
> + acl_match_check_x2(0, ctx, parms, &flows, &indicies, mm_match_mask64.m);
> +
> + while (flows.started > 0) {
> +
> + /* Gather 4 bytes of input data for each stream. */
> + input = MM_INSERT32(mm_ones_16.m, GET_NEXT_4BYTES(parms, 0), 0);
> + input = MM_INSERT32(input, GET_NEXT_4BYTES(parms, 1), 1);
> +
> + /* Process the 4 bytes of input on each stream. */
> +
> + input = transition2(mm_index_mask64.m, input,
> + mm_shuffle_input64.m, mm_ones_16.m,
> + mm_bytes64.m, mm_type_quad_range64.m,
> + flows.trans, &indicies);
> +
> + input = transition2(mm_index_mask64.m, input,
> + mm_shuffle_input64.m, mm_ones_16.m,
> + mm_bytes64.m, mm_type_quad_range64.m,
> + flows.trans, &indicies);
> +
> + input = transition2(mm_index_mask64.m, input,
> + mm_shuffle_input64.m, mm_ones_16.m,
> + mm_bytes64.m, mm_type_quad_range64.m,
> + flows.trans, &indicies);
> +
> + input = transition2(mm_index_mask64.m, input,
> + mm_shuffle_input64.m, mm_ones_16.m,
> + mm_bytes64.m, mm_type_quad_range64.m,
> + flows.trans, &indicies);
> +
> + /* Check for any matches. */
> + acl_match_check_x2(0, ctx, parms, &flows, &indicies,
> + mm_match_mask64.m);
> + }
> +
> + return 0;
> +}
> +
> +int
> +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data,
> + uint32_t *results, uint32_t num, uint32_t categories)
> +{
> + if (categories != 1 &&
> + ((RTE_ACL_RESULTS_MULTIPLIER - 1) & categories) != 0)
> + return -EINVAL;
> +
> + if (likely(num >= MAX_SEARCHES_SSE8))
> + return search_sse_8(ctx, data, results, num, categories);
> + else if (num >= MAX_SEARCHES_SSE4)
> + return search_sse_4(ctx, data, results, num, categories);
> + else
> + return search_sse_2(ctx, data, results, num, categories);
> +}
> diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
> index 7c288bd..b9173c1 100644
> --- a/lib/librte_acl/rte_acl.c
> +++ b/lib/librte_acl/rte_acl.c
> @@ -38,6 +38,52 @@
>
> TAILQ_HEAD(rte_acl_list, rte_tailq_entry);
>
> +typedef int (*rte_acl_classify_t)
> +(const struct rte_acl_ctx *, const uint8_t **, uint32_t *, uint32_t, uint32_t);
> +
> +extern int
> +rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> + uint32_t *results, uint32_t num, uint32_t categories);
> +
> +/* by default, use always available scalar code path. */
> +rte_acl_classify_t rte_acl_default_classify = rte_acl_classify_scalar;

Why not 'static'?
I thought you'd like to hide it from the external world.

> +
> +void rte_acl_select_classify(enum acl_classify_alg alg)
> +{
> +
> + switch(alg)
> + {
> + case ACL_CLASSIFY_DEFAULT:
> + case ACL_CLASSIFY_SCALAR:
> + rte_acl_default_classify = rte_acl_classify_scalar;
> + break;
> + case ACL_CLASSIFY_SSE:
> + rte_acl_default_classify = rte_acl_classify_sse;
> + break;
> + }
> +
> +}

As this is an init-phase function, I suppose we can add a check that alg has a
valid (supported) value, and return an error if it does not.
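To be concrete, something like the sketch below is what I have in mind
(same names as in the patch; only the return value is new):

	int
	rte_acl_select_classify(enum acl_classify_alg alg)
	{
		switch (alg) {
		case ACL_CLASSIFY_DEFAULT:
		case ACL_CLASSIFY_SCALAR:
			rte_acl_default_classify = rte_acl_classify_scalar;
			return 0;
		case ACL_CLASSIFY_SSE:
			rte_acl_default_classify = rte_acl_classify_sse;
			return 0;
		default:
			return -EINVAL;	/* unknown/unsupported alg */
		}
	}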
> +
> +static void __attribute__((constructor))
> +rte_acl_init(void)
> +{
> + enum acl_classify_alg alg = ACL_CLASSIFY_DEFAULT;
> +
> + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> + alg = ACL_CLASSIFY_SSE;
> +
> + rte_acl_select_classify(alg);
> +}
> +
> +inline int rte_acl_classify(const struct rte_acl_ctx *ctx,
> + const uint8_t **data,
> + uint32_t *results, uint32_t num,
> + uint32_t categories)
> +{
> + return rte_acl_default_classify(ctx, data, results, num, categories);
> +}
> +
> +
> struct rte_acl_ctx *
> rte_acl_find_existing(const char *name)
> {
> diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
> index afc0f69..650b306 100644
> --- a/lib/librte_acl/rte_acl.h
> +++ b/lib/librte_acl/rte_acl.h
> @@ -267,6 +267,9 @@ rte_acl_reset(struct rte_acl_ctx *ctx);
> * RTE_ACL_RESULTS_MULTIPLIER and can't be bigger than RTE_ACL_MAX_CATEGORIES.
> * If more than one rule is applicable for given input buffer and
> * given category, then rule with highest priority will be returned as a match.
> + * Note, that this function could be run only on CPUs with SSE4.1 support.
> + * It is up to the caller to make sure that this function is only invoked on
> + * a machine that supports SSE4.1 ISA.
> * Note, that it is a caller responsibility to ensure that input parameters
> * are valid and point to correct memory locations.
> *
> @@ -286,9 +289,10 @@ rte_acl_reset(struct rte_acl_ctx *ctx);
> * @return
> * zero on successful completion.
> * -EINVAL for incorrect arguments.
> + * -ENOTSUP for unsupported platforms.

Please remove the line above:
the current implementation doesn't return -ENOTSUP (I think that was left
over from v1).

> */
> int
> -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data,
> uint32_t *results, uint32_t num, uint32_t categories);
>
> /**
> @@ -323,9 +327,23 @@ rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> * zero on successful completion.
> * -EINVAL for incorrect arguments.
> */
> -int
> -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> - uint32_t *results, uint32_t num, uint32_t categories);

As I said above, we'd better keep it.

> +
> +enum acl_classify_alg {
> + ACL_CLASSIFY_DEFAULT = 0,
> + ACL_CLASSIFY_SCALAR = 1,
> + ACL_CLASSIFY_SSE = 2,
> +};

As a nit: since this enum is part of the public API, I think it is better to
add the rte_ prefix:
enum rte_acl_classify_alg

> +
> +extern inline int rte_acl_classify(const struct rte_acl_ctx *ctx,
> + const uint8_t **data,
> + uint32_t *results, uint32_t num,
> + uint32_t categories);

Again as a nit: here and everywhere, can we keep the same style throughout
DPDK - function name starting on a new line:
extern int
rte_acl_classify(...);

> +/**
> + * Analyze ISA of the current CPU and point rte_acl_default_classify
> + * to the highest applicable version of classify function.
> + */
> +extern void
> +rte_acl_select_classify(enum acl_classify_alg alg);
>
> /**
> * Dump an ACL context structure to the console.
> --
> 1.9.3