From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58]) by dpdk.org (Postfix) with ESMTP id 9F002B3BA for ; Tue, 26 Aug 2014 19:40:52 +0200 (CEST) Received: from [2001:470:8:a08:34a0:ff46:3145:ad9f] (helo=localhost) by smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63) (envelope-from ) id 1XMKnU-0003JH-Jo; Tue, 26 Aug 2014 13:44:50 -0400 Date: Tue, 26 Aug 2014 13:44:43 -0400 From: Neil Horman To: "Ananyev, Konstantin" Message-ID: <20140826174443.GB797@hmsreliant.think-freely.org> References: <1407436263-9360-1-git-send-email-konstantin.ananyev@intel.com> <1408652100-29217-1-git-send-email-nhorman@tuxdriver.com> <2601191342CEEE43887BDE71AB9772582135D369@IRSMSX105.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB9772582135D369@IRSMSX105.ger.corp.intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -2.9 (--) X-Spam-Status: No Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Aug 2014 17:40:53 -0000 On Mon, Aug 25, 2014 at 04:30:05PM +0000, Ananyev, Konstantin wrote: > Hi Neil, > > > -----Original Message----- > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > Sent: Thursday, August 21, 2014 9:15 PM > > To: dev@dpdk.org > > Cc: Ananyev, Konstantin; thomas.monjalon@6wind.com; Neil Horman > > Subject: [PATCHv3] librte_acl make it build/work for 'default' target > > > > Make ACL library to build/work on 'default' architecture: > > - make rte_acl_classify_scalar really scalar > > (make sure it wouldn't use sse4 instrincts through resolve_priority()). > > - Provide two versions of rte_acl_classify code path: > > rte_acl_classify_sse() - could be build and used only on systems with sse4.2 > > and upper, return -ENOTSUP on lower arch. > > rte_acl_classify_scalar() - a slower version, but could be build and used > > on all systems. > > - keep common code shared between these two codepaths. > > > > v2 chages: > > run-time selection of most appropriate code-path for given ISA. > > By default the highest supprted one is selected. > > User can still override that selection by manually assigning new value to > > the global function pointer rte_acl_default_classify. > > rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify > > points to. > > > > I see you decided not to wait for me and fix everything by yourself :) > Yeah, sorry, I'm getting pinged about enabling these features in Fedora, and it had been about 2 weeks, so I figured I'd just take care of it. > > V3 Changes > > Updated classify pointer to be a function so as to better preserve ABI > > As I said in my previous mail it generates extra jump... > Though from numbers I got the performance impact is negligible: < 1%. > So I suppose, I don't have a good enough reason to object :) > Yeah, I just don't see a way around it. I was hoping that the compiler would have been smart enough to see that the rte_acl_classify function was small and in-linable, but apparently it won't do that. As you note however the performance change is minor (I'm guessing within a standard deviation of your results). > Though I still think we better keep rte_acl_classify_scalar() publically available (same as we do for rte acl_classify_sse()): > First of all keep rte_acl_classify_scalar() is already part of our public API. > Also, as I remember, one of the customers explicitly asked for scalar version and they planned to call it directly. > Plus using rte_acl_select_classify() to always switch between implementations is not always handy: I'm not exactly opposed to this, though it seems odd to me that a user might want to call a particular version of the classifier directly. But I certainly can't predict everything a consumer wants to do. If we really need to keep it public then, it begs the question, is providing a generic entry point even worthwhile? Is it just as easy to expose the scalar/sse and any future versions directly so the application can just embody the intellegence to select the best path? That saves us having to maintain another API point. I can go with consensus on that. > - it is global, which means that we can't simultaneously use classify_scalar() and classify_sse() for 2 different ACL contexts. > - to properly support such switching we then will need to support something like (see app/test/test_acl.c below): > old_alg = rte_acl_get_classify(); > rte_acl_select_classify(new_alg); > ... > rte_acl_select_classify(old_alg); > We could attach the classification method to the acl context, so each rte_acl_ctx can point to whatever classifier funtion it wants to. That would remove the global issues you point out above. Or alternatively we can just not provide a generic entry point and let each user select a specific function. > > REmoved macro definitions for match check functions to make them static inline > > More comments inlined below. >snip> > > > > /* make a quick check for scalar */ > > - ret = rte_acl_classify_scalar(acx, data, results, > > + rte_acl_select_classify(ACL_CLASSIFY_SCALAR); > > + ret = rte_acl_classify(acx, data, results, > > RTE_DIM(acl_test_data), RTE_ACL_MAX_CATEGORIES); > > > As I said above, that doesn't seem correct: we set rte_acl_default_classify = rte_acl_classify_scalar and never restore it back to the original value. > To support it properly, we need to: > old_alg = rte_acl_get_classify(); > rte_acl_select_classify(new_alg); > ... > rte_acl_select_classify(old_alg); > So, for the purposes of this test application, I don't see that as being needed. Every call to rte_acl_classify is preceded by a setting of the classifier function, so you're safe. If you're concerned about other processes using the dpdk library at the same time, you're still safe, as despite being a global variable, data pages in a DSO are Copy on Write, so each process gets their own copy of the global variable. Multiple threads within the same process are problematic, I agree, and thats solvable with the per-acl-context mechanism that I described above, though that shouldn't be needed here as this seems to be a single threaded program. > Make all this just to keep UT valid seems like a big hassle to me. > So I said above - probably better just leave it to call rte_acl_classify_scalar() directly. > That works for me too, though the per-context mechanism seems kind of nice to me. Let me know what you prefer. > > > > > diff --git a/lib/librte_acl/acl_match_check.h b/lib/librte_acl/acl_match_check.h > > new file mode 100644 > > index 0000000..4dc1982 > > --- /dev/null > > +++ b/lib/librte_acl/acl_match_check.h > > As a nit: we probably don't need a special header just for one function and can place it inside acl_run.h. > Agreed, I can move that to acl_run.h. > > > + */ > > +static inline uint64_t > > +acl_match_check(uint64_t transition, int slot, > > + const struct rte_acl_ctx *ctx, struct parms *parms, > > + struct acl_flow_data *flows, void (*resolve_priority)( > > + uint64_t transition, int n, const struct rte_acl_ctx *ctx, > > + struct parms *parms, const struct rte_acl_match_results *p, > > + uint32_t categories)) > > Ugh, that's really hard to read. > Can we create a typedef for resolve_priority function type: > typedef void (*resolve_priority_t)(uint64_t, int, > const struct rte_acl_ctx *ctx, struct parms *, > const struct rte_acl_match_results *, uint32_t); > And use it here? > Sure, I'm fine with doing that. > > > + > > +/* by default, use always avaialbe scalar code path. */ > > +rte_acl_classify_t rte_acl_default_classify = rte_acl_classify_scalar; > > Why not 'static'? > I thought you'd like to hide it from external world. > Doh! I didn't do the one thing that I really meant to do. I removed it from the header file but I forgot to declare the variable static. I'll fix that. > > + > > +void rte_acl_select_classify(enum acl_classify_alg alg) > > +{ > > + > > + switch(alg) > > + { > > + case ACL_CLASSIFY_DEFAULT: > > + case ACL_CLASSIFY_SCALAR: > > + rte_acl_default_classify = rte_acl_classify_scalar; > > + break; > > + case ACL_CLASSIFY_SSE: > > + rte_acl_default_classify = rte_acl_classify_sse; > > + break; > > + } > > + > > +} > > As this is init phase function, I suppose we can add check that alg has a valid(supported) value, and return some error as return value, if not. > Not sure I follow what you're saying above, are you suggesting that we add a rte_cpu_get_flag_enabled check to rte_acl_select_classify above? > > > * > > @@ -286,9 +289,10 @@ rte_acl_reset(struct rte_acl_ctx *ctx); > > * @return > > * zero on successful completion. > > * -EINVAL for incorrect arguments. > > + * -ENOTSUP for unsupported platforms. > > Please remove the line above: current implementation doesn't return ENOTSUP > (I think that was left from v1). > Yup, probably was. I'll remove it. > > */ > > int > > -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data, > > +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data, > > uint32_t *results, uint32_t num, uint32_t categories); > > > > /** > > @@ -323,9 +327,23 @@ rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data, > > * zero on successful completion. > > * -EINVAL for incorrect arguments. > > */ > > -int > > -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data, > > - uint32_t *results, uint32_t num, uint32_t categories); > > > As I said above we'd better keep it. > Ok, can do. > > + > > +enum acl_classify_alg { > > + ACL_CLASSIFY_DEFAULT = 0, > > + ACL_CLASSIFY_SCALAR = 1, > > + ACL_CLASSIFY_SSE = 2, > > +}; > > As a nit: as this emum is part of public API, I think it is better to add rte_ prefix: enum rte_acl_classify_alg > Sure, done. > > + > > +extern inline int rte_acl_classify(const struct rte_acl_ctx *ctx, > > + const uint8_t **data, > > + uint32_t *results, uint32_t num, > > + uint32_t categories); > > Again as a nit: here and everywhere can we keep same style through the whole DPDK - function name from the new line: > extern nt > rte_acl_classify(...); > Ok I'll produce another version based on your feedback regarding the per-context-calssifier method vs. just removing the generic classifier. Regards Neil