From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 15458B381 for ; Fri, 29 Aug 2014 19:55:09 +0200 (CEST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP; 29 Aug 2014 10:53:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,425,1406617200"; d="scan'208";a="595216469" Received: from irsmsx101.ger.corp.intel.com ([163.33.3.153]) by orsmga002.jf.intel.com with ESMTP; 29 Aug 2014 10:58:59 -0700 Received: from irsmsx107.ger.corp.intel.com (163.33.3.99) by IRSMSX101.ger.corp.intel.com (163.33.3.153) with Microsoft SMTP Server (TLS) id 14.3.195.1; Fri, 29 Aug 2014 18:58:59 +0100 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.158]) by IRSMSX107.ger.corp.intel.com ([169.254.10.108]) with mapi id 14.03.0195.001; Fri, 29 Aug 2014 18:58:59 +0100 From: "Ananyev, Konstantin" To: Neil Horman , "dev@dpdk.org" Thread-Topic: [PATCHv4] librte_acl make it build/work for 'default' target Thread-Index: AQHPwwAYqdFMmR/Rkkmn05mC0+rkgZvnpfvw Date: Fri, 29 Aug 2014 17:58:58 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772582135F7FF@IRSMSX105.ger.corp.intel.com> References: <1407436263-9360-1-git-send-email-konstantin.ananyev@intel.com> <1409258292-3238-1-git-send-email-nhorman@tuxdriver.com> In-Reply-To: <1409258292-3238-1-git-send-email-nhorman@tuxdriver.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCHv4] librte_acl make it build/work for 'default' target X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Aug 2014 17:55:11 -0000 > -----Original Message----- > From: Neil Horman [mailto:nhorman@tuxdriver.com] > Sent: Thursday, August 28, 2014 9:38 PM > To: dev@dpdk.org > Cc: Neil Horman; Ananyev, Konstantin; thomas.monjalon@6wind.com > Subject: [PATCHv4] librte_acl make it build/work for 'default' target >=20 > Make ACL library to build/work on 'default' architecture: > - make rte_acl_classify_scalar really scalar > (make sure it wouldn't use sse4 instrincts through resolve_priority()). > - Provide two versions of rte_acl_classify code path: > rte_acl_classify_sse() - could be build and used only on systems with s= se4.2 > and upper, return -ENOTSUP on lower arch. > rte_acl_classify_scalar() - a slower version, but could be build and us= ed > on all systems. > - keep common code shared between these two codepaths. >=20 > v2 chages: > run-time selection of most appropriate code-path for given ISA. > By default the highest supprted one is selected. > User can still override that selection by manually assigning new value t= o > the global function pointer rte_acl_default_classify. > rte_acl_classify() becomes a macro calling whatever rte_acl_default_clas= sify > points to. >=20 > V3 Changes > Updated classify pointer to be a function so as to better preserve ABI > REmoved macro definitions for match check functions to make them static = inline >=20 > V4 Changes > Rewrote classification selection mechanim to use a function table, so th= at we > can just store the preferred alg in the rte_acl_ctx struct so that multip= rocess > access works. I understand that leaves us with an extra load instruction= , but I > think thats ok, because it also allows... >=20 > Addition of a new function rte_acl_classify_alg. This function lets you > specify an enum value to override the acl contexts default algorith when = doing a > classification. This allows an application to specify a classification > algorithm without needing to pulicize each method. I know there was conc= ern > over keeping those methods public, but we don't have a static ABI at the = moment, > so this seems to me a reasonable thing to do, as it gives us less of an A= BI > surface to worry about. Good way to overcome the problem. >>From what I am seeing it adds a tiny slowdown (as expected) ...=20 Though it provides a good flexibility and I don't have any better ideas. So I'd say let stick with that approach. Below are few technical comments. Thanks Konstantin >=20 > Fixed misc missed static declarations >=20 > Removed acl_match_check.h and moved match_check function to acl_run.h >=20 > typdeffed function pointer to match check. >=20 > Signed-off-by: Neil Horman > CC: konstantin.ananyev@intel.com > CC: thomas.monjalon@6wind.com > --- > app/test-acl/main.c | 13 +- > app/test/test_acl.c | 10 +- > lib/librte_acl/Makefile | 5 +- > lib/librte_acl/acl.h | 1 + > lib/librte_acl/acl_bld.c | 5 +- > lib/librte_acl/acl_run.c | 944 ----------------------------------= ------ > lib/librte_acl/acl_run.h | 271 ++++++++++++ > lib/librte_acl/acl_run_scalar.c | 197 +++++++++ > lib/librte_acl/acl_run_sse.c | 630 +++++++++++++++++++++++++++ > lib/librte_acl/rte_acl.c | 62 +++ > lib/librte_acl/rte_acl.h | 66 ++- > 11 files changed, 1208 insertions(+), 996 deletions(-) > delete mode 100644 lib/librte_acl/acl_run.c > create mode 100644 lib/librte_acl/acl_run.h > create mode 100644 lib/librte_acl/acl_run_scalar.c > create mode 100644 lib/librte_acl/acl_run_sse.c >=20 > diff --git a/app/test/test_acl.c b/app/test/test_acl.c > index 869f6d3..2169f59 100644 > --- a/app/test/test_acl.c > +++ b/app/test/test_acl.c > @@ -859,7 +859,7 @@ test_invalid_parameters(void) > } >=20 > /* cover invalid but positive categories in classify */ > - result =3D rte_acl_classify_scalar(acx, NULL, NULL, 0, 3); > + result =3D rte_acl_classify(acx, NULL, NULL, 0, 3); Typo, should be: rte_acl_classify_alg(acx, RTE_ACL_CLASSIFY_SCALAR, NULL, NULL, 0, 3);=20 > diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h > index b9d63fd..9236b7b 100644 > --- a/lib/librte_acl/acl.h > +++ b/lib/librte_acl/acl.h > @@ -168,6 +168,7 @@ struct rte_acl_ctx { > void *mem; > size_t mem_sz; > struct rte_acl_config config; /* copy of build config. */ > + enum rte_acl_classify_alg alg; > }; Each rte_acl_build() will reset all fields of rte_acl_ctx starting from num= _categories and below. So we need to move alg somewhere above num_categories: --- a/lib/librte_acl/acl.h +++ b/lib/librte_acl/acl.h @@ -153,6 +153,7 @@ struct rte_acl_ctx { /** Name of the ACL context. */ int32_t socket_id; /** Socket ID to allocate memory from. */ + enum rte_acl_classify_alg alg; void *rules; uint32_t max_rules; uint32_t rule_sz; @@ -168,9 +169,11 @@ struct rte_acl_ctx { void *mem; size_t mem_sz; struct rte_acl_config config; /* copy of build config. */ - enum rte_acl_classify_alg alg; }; > diff --git a/lib/librte_acl/acl_run_scalar.c b/lib/librte_acl/acl_run_sca= lar.c > new file mode 100644 > index 0000000..4bf58c7 > --- /dev/null > +++ b/lib/librte_acl/acl_run_scalar.c > @@ -0,0 +1,197 @@ > + > +#include "acl_run.h" > + > +int > +rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **d= ata, > + uint32_t *results, uint32_t num, uint32_t categories); No need to put this declaration here. I think you can put both rte_acl_classify_sse(), rte_acl_classify_scalar() = into acl.h (it is internal lib header). And remove another declarations of these functions from rte_acl.c. > diff --git a/lib/librte_acl/acl_run_sse.c b/lib/librte_acl/acl_run_sse.c > new file mode 100644 > index 0000000..7ae63dd > --- /dev/null > +++ b/lib/librte_acl/acl_run_sse.c > +#include "acl_run.h" > + + +int +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data, + uint32_t *results, uint32_t num, uint32_t categories); + Move to acl.h. > diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c > index 7c288bd..741bed4 100644 > --- a/lib/librte_acl/rte_acl.c > +++ b/lib/librte_acl/rte_acl.c > @@ -33,11 +33,72 @@ >=20 > #include > #include "acl.h" > +#include "acl_run.h" acl_run.h contains defintions for a lot of functions and should be included= only by acl_run_*.c.=20 I think it is better to move typedef int (*rte_acl_classify_t) into acl.h = and don't include acl_run.h here. >=20 > #define BIT_SIZEOF(x) (sizeof(x) * CHAR_BIT) >=20 > TAILQ_HEAD(rte_acl_list, rte_tailq_entry); >=20 > +extern int > +rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **d= ata, > + uint32_t *results, uint32_t num, uint32_t categories); > + > +extern int > +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data= , > + uint32_t *results, uint32_t num, uint32_t categories); > + As above: I think it is safe to move these declarations into acl.h. > +static rte_acl_classify_t classify_fns[] =3D { > + [RTE_ACL_CLASSIFY_DEFAULT] =3D rte_acl_classify_scalar, > + [RTE_ACL_CLASSIFY_SCALAR] =3D rte_acl_classify_scalar, > + [RTE_ACL_CLASSIFY_SSE] =3D rte_acl_classify_sse, > +}; static const static rte_acl_classify_t classify_fns[] ? > + > + > +extern int > +rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **d= ata, > + uint32_t *results, uint32_t num, uint32_t categories); Duplicate. > + > +/* by default, use always avaialbe scalar code path. */ > +static enum rte_acl_classify_alg rte_acl_default_classify =3D RTE_ACL_CL= ASSIFY_SCALAR; Line is longer than 80 chars? > + > +void rte_acl_set_default_classify(enum rte_acl_classify_alg alg) > +{ > + rte_acl_default_classify =3D alg; > +} void rte_acls_set_default_classify(...) Though, I am not sure why do we need it to be public now. Users can setup ALG per context. > + > +void rte_acl_set_ctx_classify(struct rte_acl_ctx *ctx, enum rte_acl_clas= sify_alg alg) > +{ > + ctx->alg =3D alg; > +} Same as above: void rte_acl_set_ctx_classify(...) Plus probably add checking that alg is a valid argument: If ((uint32_t)alg < RTE_DIM(classify_fns)) {ctx->alg=3Dalg; return 0;} return -EINVAL;=20 > + > +static void __attribute__((constructor)) > +rte_acl_init(void) > +{ > + enum rte_acl_classify_alg alg =3D RTE_ACL_CLASSIFY_DEFAULT; > + > + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1)) > + alg =3D RTE_ACL_CLASSIFY_SSE; > + > + rte_acl_set_default_classify(alg); > +} > + > +int rte_acl_classify(const struct rte_acl_ctx *ctx, > + const uint8_t **data, > + uint32_t *results, uint32_t num, > + uint32_t categories) > +{ > + return classify_fns[ctx->alg](ctx, data, results, num, categories); > +} > + > +int rte_acl_classify_alg(const struct rte_acl_ctx *ctx, > + enum rte_acl_classify_alg alg, > + const uint8_t **data, > + uint32_t *results, uint32_t num, > + uint32_t categories) > +{ > + return classify_fns[alg](ctx, data, results, num, categories); > +} Can you move alg argument to be the last one? That would prevent copying parameters between registers. Plus same thing with the function definition style.=20 > + > struct rte_acl_ctx * > rte_acl_find_existing(const char *name) > { > @@ -165,6 +226,7 @@ rte_acl_create(const struct rte_acl_param *param) > ctx->max_rules =3D param->max_rule_num; > ctx->rule_sz =3D param->rule_size; > ctx->socket_id =3D param->socket_id; > + ctx->alg =3D rte_acl_default_classify; > snprintf(ctx->name, sizeof(ctx->name), "%s", param->name); >=20 > te->data =3D (void *) ctx; > diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h > index afc0f69..c092a49 100644 > --- a/lib/librte_acl/rte_acl.h > +++ b/lib/librte_acl/rte_acl.h > @@ -259,39 +259,6 @@ void > rte_acl_reset(struct rte_acl_ctx *ctx); >=20 > /** > - * Search for a matching ACL rule for each input data buffer. > - * Each input data buffer can have up to *categories* matches. > - * That implies that results array should be big enough to hold > - * (categories * num) elements. > - * Also categories parameter should be either one or multiple of > - * RTE_ACL_RESULTS_MULTIPLIER and can't be bigger than RTE_ACL_MAX_CATEG= ORIES. > - * If more than one rule is applicable for given input buffer and > - * given category, then rule with highest priority will be returned as a= match. > - * Note, that it is a caller responsibility to ensure that input paramet= ers > - * are valid and point to correct memory locations. > - * > - * @param ctx > - * ACL context to search with. > - * @param data > - * Array of pointers to input data buffers to perform search. > - * Note that all fields in input data buffers supposed to be in networ= k > - * byte order (MSB). > - * @param results > - * Array of search results, *categories* results per each input data b= uffer. > - * @param num > - * Number of elements in the input data buffers array. > - * @param categories > - * Number of maximum possible matches for each input buffer, one possi= ble > - * match per category. > - * @return > - * zero on successful completion. > - * -EINVAL for incorrect arguments. > - */ > -int > -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data, > - uint32_t *results, uint32_t num, uint32_t categories); > - > -/** > * Perform scalar search for a matching ACL rule for each input data buf= fer. > * Note, that while the search itself will avoid explicit use of SSE/AVX > * intrinsics, code for comparing matching results/priorities sill might= use > @@ -323,9 +290,36 @@ rte_acl_classify(const struct rte_acl_ctx *ctx, cons= t uint8_t **data, > * zero on successful completion. > * -EINVAL for incorrect arguments. > */ > -int > -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **d= ata, > - uint32_t *results, uint32_t num, uint32_t categories); > + > +enum rte_acl_classify_alg { > + RTE_ACL_CLASSIFY_DEFAULT =3D 0, > + RTE_ACL_CLASSIFY_SCALAR =3D 1, > + RTE_ACL_CLASSIFY_SSE =3D 2, > +}; > + I think you removed the wrong comment. All public API function declaration supposed to be preceded by formal doxyg= en style comment: Brief explanation, parameters and return value description, etc. Please restore the proper comment for it. BTW, two new functions above - they need a formal comments too. > +extern int > +rte_acl_classify(const struct rte_acl_ctx *ctx, > + const uint8_t **data, > + uint32_t *results, uint32_t num, > + uint32_t categories); > + > +extern int > +rte_acl_classify_alg(const struct rte_acl_ctx *ctx, > + enum rte_acl_classify_alg alg, > + const uint8_t **data, > + uint32_t *results, uint32_t num, > + uint32_t categories); > +/* > + * Set the default classify algorithm for newly allocated classify conte= xts > + */ > +extern void > +rte_acl_set_default_classify(enum rte_acl_classify_alg alg); > + > +/* > + * Override the default classifier function for a given ctx > + */ > +extern void > +rte_acl_set_ctx_classify(struct rte_acl_ctx *ctx, enum rte_acl_classify_= alg alg); >=20 > /** > * Dump an ACL context structure to the console. > -- > 1.9.3 Also, need to update examples/l3fwd-acl/ (remove rte_acl_classify_scalar() = calls). Something like: diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c index 9b2c21b..8cbf202 100644 --- a/examples/l3fwd-acl/main.c +++ b/examples/l3fwd-acl/main.c @@ -278,15 +278,6 @@ send_single_packet(struct rte_mbuf *m, uint8_t port); (in) =3D end + 1; \ } while (0) -#define CLASSIFY(context, data, res, num, cat) do { \ - if (scalar) \ - rte_acl_classify_scalar((context), (data), \ - (res), (num), (cat)); \ - else \ - rte_acl_classify((context), (data), \ - (res), (num), (cat)); \ -} while (0) - /* * ACL rules should have higher priorities than route ones to ensure ACL = rule * always be found when input packets have multi-matches in the database. @@ -1253,6 +1244,9 @@ app_acl_init(void) dump_acl_config(); + if (parm_config.scalar) + rte_acl_set_default_classify(RTE_ACL_CLASSIFY_SCALAR); + /* Load rules from the input file */ if (add_rules(parm_config.rule_ipv4_name, &route_base_ipv4, &route_num_ipv4, &acl_base_ipv4, &acl_num_ipv4, @@ -1436,10 +1430,8 @@ main_loop(__attribute__((unused)) void *dummy) int socketid; const uint64_t drain_tsc =3D (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US; - int scalar =3D parm_config.scalar; prev_tsc =3D 0; - lcore_id =3D rte_lcore_id(); qconf =3D &lcore_conf[lcore_id]; socketid =3D rte_lcore_to_socket_id(lcore_id); @@ -1503,7 +1495,8 @@ main_loop(__attribute__((unused)) void *dummy) nb_rx); if (acl_search.num_ipv4) { - CLASSIFY(acl_config.acx_ipv4[socket= id], + rte_acl_classify( + acl_config.acx_ipv4[socketi= d], acl_search.data_ipv4, acl_search.res_ipv4, acl_search.num_ipv4, @@ -1515,7 +1508,8 @@ main_loop(__attribute__((unused)) void *dummy) } if (acl_search.num_ipv6) { - CLASSIFY(acl_config.acx_ipv6[socket= id], + rte_acl_classify( + acl_config.acx_ipv6[socketi= d], acl_search.data_ipv6, acl_search.res_ipv6, acl_search.num_ipv6,