Re: [dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target

DPDK patches and discussions
 help / color / mirror / Atom feed

From: Neil Horman <nhorman@tuxdriver.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target
Date: Tue, 26 Aug 2014 13:44:43 -0400	[thread overview]
Message-ID: <20140826174443.GB797@hmsreliant.think-freely.org> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB9772582135D369@IRSMSX105.ger.corp.intel.com>

On Mon, Aug 25, 2014 at 04:30:05PM +0000, Ananyev, Konstantin wrote:
> Hi Neil,
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Thursday, August 21, 2014 9:15 PM
> > To: dev@dpdk.org
> > Cc: Ananyev, Konstantin; thomas.monjalon@6wind.com; Neil Horman
> > Subject: [PATCHv3] librte_acl make it build/work for 'default' target
> > 
> > Make ACL library to build/work on 'default' architecture:
> > - make rte_acl_classify_scalar really scalar
> >  (make sure it wouldn't use sse4 instrincts through resolve_priority()).
> > - Provide two versions of rte_acl_classify code path:
> >   rte_acl_classify_sse() - could be build and used only on systems with sse4.2
> >   and upper, return -ENOTSUP on lower arch.
> >   rte_acl_classify_scalar() - a slower version, but could be build and used
> >   on all systems.
> > - keep common code shared between these two codepaths.
> > 
> > v2 chages:
> >  run-time selection of most appropriate code-path for given ISA.
> >  By default the highest supprted one is selected.
> >  User can still override that selection by manually assigning new value to
> >  the global function pointer rte_acl_default_classify.
> >  rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify
> >  points to.
> > 
> 
> I see you decided not to wait for me and fix everything by yourself :)
> 
Yeah, sorry, I'm getting pinged about enabling these features in Fedora, and it
had been about 2 weeks, so I figured I'd just take care of it.

> > V3 Changes
> >  Updated classify pointer to be a function so as to better preserve ABI
> 
> As I said in my previous mail it generates extra jump...
> Though from numbers I got the performance impact is negligible: < 1%.
> So I suppose, I don't have a good enough reason to object :)
> 
Yeah, I just don't see a way around it.  I was hoping that the compiler would
have been smart enough to see that the rte_acl_classify function was small and
in-linable, but apparently it won't do that.  As you note however the
performance change is minor (I'm guessing within a standard deviation of your
results).

> Though I still think we better keep  rte_acl_classify_scalar() publically available (same as we do for rte acl_classify_sse()):
> First of all keep  rte_acl_classify_scalar() is already part of our public API.
> Also, as I remember, one of the customers explicitly asked for scalar version and they planned to call it directly.
> Plus using rte_acl_select_classify() to always switch between implementations is not always handy:

I'm not exactly opposed to this, though it seems odd to me that a user might
want to call a particular version of the classifier directly.  But I certainly
can't predict everything a consumer wants to do.  If we really need to keep it
public then, it begs the question, is providing a generic entry point even
worthwhile?  Is it just as easy to expose the scalar/sse and any future versions
directly so the application can just embody the intellegence to select the best
path?  That saves us having to maintain another API point.  I can go with
consensus on that.

> -  it is global, which means that we can't simultaneously use classify_scalar() and classify_sse() for 2 different ACL contexts.  
> - to properly support such switching we then will need to support something like (see app/test/test_acl.c below):
>   old_alg = rte_acl_get_classify();
>   rte_acl_select_classify(new_alg);
>   ...
>   rte_acl_select_classify(old_alg); 
>   
We could attach the classification method to the acl context, so each
rte_acl_ctx can point to whatever classifier funtion it wants to.  That would
remove the global issues you point out above.  Or alternatively we can just not
provide a generic entry point and let each user select a specific function.


> >  REmoved macro definitions for match check functions to make them static inline
> 
> More comments inlined below.
>snip> 
> > 
> >  	/* make a quick check for scalar */
> > -	ret = rte_acl_classify_scalar(acx, data, results,
> > +	rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> > +	ret = rte_acl_classify(acx, data, results,
> >  			RTE_DIM(acl_test_data), RTE_ACL_MAX_CATEGORIES);
> 
> 
> As I said above, that doesn't seem correct: we set rte_acl_default_classify = rte_acl_classify_scalar and never restore it back to the original value.
> To support it properly, we need to:
> old_alg = rte_acl_get_classify();
>  rte_acl_select_classify(new_alg);
>  ...
>  rte_acl_select_classify(old_alg);
> 
So, for the purposes of this test application, I don't see that as being needed.
Every call to rte_acl_classify is preceded by a setting of the classifier
function, so you're safe.  If you're concerned about other processes using the
dpdk library at the same time, you're still safe, as despite being a global
variable, data pages in a DSO are Copy on Write, so each process gets their own
copy of the global variable.

Multiple threads within the same process are problematic, I agree, and thats
solvable with the per-acl-context mechanism that I described above, though that
shouldn't be needed here as this seems to be a single threaded program.

> Make all this just to keep UT valid seems like a big hassle to me.
> So I said above - probably better just leave it to call rte_acl_classify_scalar() directly.
> 
That works for me too, though the per-context mechanism seems kind of nice to
me.  Let me know what you prefer.

><snip>
> > 
> > diff --git a/lib/librte_acl/acl_match_check.h b/lib/librte_acl/acl_match_check.h
> > new file mode 100644
> > index 0000000..4dc1982
> > --- /dev/null
> > +++ b/lib/librte_acl/acl_match_check.h
> 
> As a nit: we probably don't need a special header just for one function and can place it inside acl_run.h.
> 
Agreed, I can move that to acl_run.h.

><snip>
> > + */
> > +static inline uint64_t
> > +acl_match_check(uint64_t transition, int slot,
> > +	const struct rte_acl_ctx *ctx, struct parms *parms,
> > +	struct acl_flow_data *flows, void (*resolve_priority)(
> > +	uint64_t transition, int n, const struct rte_acl_ctx *ctx,
> > +	struct parms *parms, const struct rte_acl_match_results *p,
> > +	uint32_t categories))
> 
> Ugh, that's really hard to read.
> Can we create a typedef for resolve_priority function type:
> typedef void (*resolve_priority_t)(uint64_t, int,
>         const struct rte_acl_ctx *ctx, struct parms *,
>         const struct rte_acl_match_results *, uint32_t);
> And use it here?
> 
Sure, I'm fine with doing that.

><snip>
> > +
> > +/* by default, use always avaialbe scalar code path. */
> > +rte_acl_classify_t rte_acl_default_classify = rte_acl_classify_scalar;
> 
> Why not 'static'?
> I thought you'd like to hide it  from external world.
> 
Doh!  I didn't do the one thing that I really meant to do.  I removed it from
the header file but I forgot to declare the variable static.  I'll fix that.

> > +
> > +void rte_acl_select_classify(enum acl_classify_alg alg)
> > +{
> > +
> > +	switch(alg)
> > +	{
> > +		case ACL_CLASSIFY_DEFAULT:
> > +		case ACL_CLASSIFY_SCALAR:
> > +			rte_acl_default_classify = rte_acl_classify_scalar;
> > +			break;
> > +		case ACL_CLASSIFY_SSE:
> > +			rte_acl_default_classify = rte_acl_classify_sse;
> > +			break;
> > +	}
> > +
> > +}
> 
> As this is init phase function, I suppose we can add check that alg has a valid(supported) value, and return some error as return value, if not.  
> 
Not sure I follow what you're saying above, are you suggesting that we add a
rte_cpu_get_flag_enabled check to rte_acl_select_classify above?

><snip>
> >   *
> > @@ -286,9 +289,10 @@ rte_acl_reset(struct rte_acl_ctx *ctx);
> >   * @return
> >   *   zero on successful completion.
> >   *   -EINVAL for incorrect arguments.
> > + *   -ENOTSUP for unsupported platforms.
> 
> Please remove the line above: current implementation doesn't return ENOTSUP
> (I think that was left from v1).
> 
Yup, probably was.  I'll remove it.

> >   */
> >  int
> > -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data,
> >  	uint32_t *results, uint32_t num, uint32_t categories);
> > 
> >  /**
> > @@ -323,9 +327,23 @@ rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> >   *   zero on successful completion.
> >   *   -EINVAL for incorrect arguments.
> >   */
> > -int
> > -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > -	uint32_t *results, uint32_t num, uint32_t categories);
> 
> 
> As I said above we'd better keep it.  
> 
Ok, can do.

> > +
> > +enum acl_classify_alg {
> > +	ACL_CLASSIFY_DEFAULT = 0,
> > +	ACL_CLASSIFY_SCALAR = 1,
> > +	ACL_CLASSIFY_SSE = 2,
> > +};
> 
> As a nit: as this emum is part of public API, I think it is better to add rte_ prefix: enum rte_acl_classify_alg
> 
Sure, done.

> > +
> > +extern inline int rte_acl_classify(const struct rte_acl_ctx *ctx,
> > +				   const uint8_t **data,
> > +				   uint32_t *results, uint32_t num,
> > +				   uint32_t categories);
> 
> Again as a nit: here and everywhere can we keep same style through the whole DPDK - function name from the new line:
> extern nt
> rte_acl_classify(...);
> 
Ok

I'll produce another version based on your feedback regarding the
per-context-calssifier method vs. just removing the generic classifier.

Regards
Neil

next prev parent reply	other threads:[~2014-08-26 17:40 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-07 18:31 [dpdk-dev] [PATCHv2] " Konstantin Ananyev
2014-08-07 20:11 ` Neil Horman
2014-08-07 20:58   ` Vincent JARDIN
2014-08-07 21:28     ` Chris Wright
2014-08-08  2:07       ` Neil Horman
2014-08-08 11:49   ` Ananyev, Konstantin
2014-08-08 12:25     ` Neil Horman
2014-08-08 13:09       ` Ananyev, Konstantin
2014-08-08 14:30         ` Neil Horman
2014-08-11 22:23           ` Thomas Monjalon
2014-08-21 20:15 ` [dpdk-dev] [PATCHv3] " Neil Horman
2014-08-25 16:30   ` Ananyev, Konstantin
2014-08-26 17:44     ` Neil Horman [this message]
2014-08-27 11:25       ` Ananyev, Konstantin
2014-08-27 18:56         ` Neil Horman
2014-08-27 19:18           ` Ananyev, Konstantin
2014-08-28  9:02             ` Richardson, Bruce
2014-08-28 15:55             ` Neil Horman
2014-08-28 20:38 ` [dpdk-dev] [PATCHv4] " Neil Horman
2014-08-29 17:58   ` Ananyev, Konstantin
2014-09-01 11:05     ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140826174443.GB797@hmsreliant.think-freely.org \
    --to=nhorman@tuxdriver.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).