From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id 9D30FB389 for ; Wed, 27 Aug 2014 21:14:44 +0200 (CEST) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP; 27 Aug 2014 12:18:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,413,1406617200"; d="scan'208";a="590758111" Received: from irsmsx104.ger.corp.intel.com ([163.33.3.159]) by fmsmga002.fm.intel.com with ESMTP; 27 Aug 2014 12:18:45 -0700 Received: from irsmsx155.ger.corp.intel.com (163.33.192.3) by IRSMSX104.ger.corp.intel.com (163.33.3.159) with Microsoft SMTP Server (TLS) id 14.3.195.1; Wed, 27 Aug 2014 20:18:44 +0100 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.158]) by IRSMSX155.ger.corp.intel.com ([169.254.14.85]) with mapi id 14.03.0195.001; Wed, 27 Aug 2014 20:18:44 +0100 From: "Ananyev, Konstantin" To: Neil Horman Thread-Topic: [PATCHv3] librte_acl make it build/work for 'default' target Thread-Index: AQHPvXyxpI9Dyrn+z0KyIlaoUWuXzZvhe20wgAGjrICAASwXIIAAemkAgAASnKA= Date: Wed, 27 Aug 2014 19:18:44 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772582135DD43@IRSMSX105.ger.corp.intel.com> References: <1407436263-9360-1-git-send-email-konstantin.ananyev@intel.com> <1408652100-29217-1-git-send-email-nhorman@tuxdriver.com> <2601191342CEEE43887BDE71AB9772582135D369@IRSMSX105.ger.corp.intel.com> <20140826174443.GB797@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB9772582135DC13@IRSMSX105.ger.corp.intel.com> <20140827185653.GA31916@localhost.localdomain> In-Reply-To: <20140827185653.GA31916@localhost.localdomain> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Aug 2014 19:14:45 -0000 > -----Original Message----- > From: Neil Horman [mailto:nhorman@tuxdriver.com] > Sent: Wednesday, August 27, 2014 7:57 PM > To: Ananyev, Konstantin > Cc: dev@dpdk.org; thomas.monjalon@6wind.com > Subject: Re: [PATCHv3] librte_acl make it build/work for 'default' target >=20 > On Wed, Aug 27, 2014 at 11:25:04AM +0000, Ananyev, Konstantin wrote: > > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > > Sent: Tuesday, August 26, 2014 6:45 PM > > > To: Ananyev, Konstantin > > > Cc: dev@dpdk.org; thomas.monjalon@6wind.com > > > Subject: Re: [PATCHv3] librte_acl make it build/work for 'default' ta= rget > > > > > > On Mon, Aug 25, 2014 at 04:30:05PM +0000, Ananyev, Konstantin wrote: > > > > Hi Neil, > > > > > > > > > -----Original Message----- > > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > > > > Sent: Thursday, August 21, 2014 9:15 PM > > > > > To: dev@dpdk.org > > > > > Cc: Ananyev, Konstantin; thomas.monjalon@6wind.com; Neil Horman > > > > > Subject: [PATCHv3] librte_acl make it build/work for 'default' ta= rget > > > > > > > > > > Make ACL library to build/work on 'default' architecture: > > > > > - make rte_acl_classify_scalar really scalar > > > > > (make sure it wouldn't use sse4 instrincts through resolve_prior= ity()). > > > > > - Provide two versions of rte_acl_classify code path: > > > > > rte_acl_classify_sse() - could be build and used only on system= s with sse4.2 > > > > > and upper, return -ENOTSUP on lower arch. > > > > > rte_acl_classify_scalar() - a slower version, but could be buil= d and used > > > > > on all systems. > > > > > - keep common code shared between these two codepaths. > > > > > > > > > > v2 chages: > > > > > run-time selection of most appropriate code-path for given ISA. > > > > > By default the highest supprted one is selected. > > > > > User can still override that selection by manually assigning new= value to > > > > > the global function pointer rte_acl_default_classify. > > > > > rte_acl_classify() becomes a macro calling whatever rte_acl_defa= ult_classify > > > > > points to. > > > > > > > > > > > > > I see you decided not to wait for me and fix everything by yourself= :) > > > > > > > Yeah, sorry, I'm getting pinged about enabling these features in Fedo= ra, and it > > > had been about 2 weeks, so I figured I'd just take care of it. > > > > No worries. I admit that it was a long delay from my side. > > > > > > > > > > V3 Changes > > > > > Updated classify pointer to be a function so as to better preser= ve ABI > > > > > > > > As I said in my previous mail it generates extra jump... > > > > Though from numbers I got the performance impact is negligible: < 1= %. > > > > So I suppose, I don't have a good enough reason to object :) > > > > > > > Yeah, I just don't see a way around it. I was hoping that the compil= er would > > > have been smart enough to see that the rte_acl_classify function was = small and > > > in-linable, but apparently it won't do that. As you note however the > > > performance change is minor (I'm guessing within a standard deviation= of your > > > results). > > > > > > > Though I still think we better keep rte_acl_classify_scalar() publ= ically available (same as we do for rte acl_classify_sse()): > > > > First of all keep rte_acl_classify_scalar() is already part of our= public API. > > > > Also, as I remember, one of the customers explicitly asked for scal= ar version and they planned to call it directly. > > > > Plus using rte_acl_select_classify() to always switch between imple= mentations is not always handy: > > > > > > I'm not exactly opposed to this, though it seems odd to me that a use= r might > > > want to call a particular version of the classifier directly. But I = certainly > > > can't predict everything a consumer wants to do. If we really need t= o keep it > > > public then, it begs the question, is providing a generic entry point= even > > > worthwhile? Is it just as easy to expose the scalar/sse and any futu= re versions > > > directly so the application can just embody the intellegence to selec= t the best > > > path? That saves us having to maintain another API point. I can go = with > > > consensus on that. > > > > > > > - it is global, which means that we can't simultaneously use class= ify_scalar() and classify_sse() for 2 different ACL contexts. > > > > - to properly support such switching we then will need to support s= omething like (see app/test/test_acl.c below): > > > > old_alg =3D rte_acl_get_classify(); > > > > rte_acl_select_classify(new_alg); > > > > ... > > > > rte_acl_select_classify(old_alg); > > > > > > > We could attach the classification method to the acl context, so each > > > rte_acl_ctx can point to whatever classifier funtion it wants to. Th= at would > > > remove the global issues you point out above. > > > > I thought about that approach too. > > But there is one implication with DPDK MP model: > > Same ACL context can be shared by different DPDK processes, > > while acl_classify() could be loaded to the different addresses. > > Of course we can overcome it by creating a global table of function poi= nters indexed by calssify_alg and > > store inside ACL ctx alg instead of actual function pointer. > > But that means extra overhead of at least two loads per classify() call= . > > > Hmm, how is the context shared around between processes? Is it just shar= ed as a > common cow data page resulting from a fork? If so, then we should be goo= d > because the DSO text will be at the same address (i.e. the pointer will s= till be > valid). If you do some sort of message passing, then, yes, thats a probl= em. >=20 No, it is not parent-child relationship. There could be a group of independently spawned processes. One of them should be 'primary' (starts first), other 'secondary's'. All hugepage memory pages mapped by the primary process, supposed to be map= ped to the same VAs by each secondary. =20 So all stuff that is allocated from hugepage memory is shared between all p= rocesses in the group. More detailed description: http://dpdk.org/doc/intel/dpdk-prog-guide-1.7.= 0.pdf, section 23. >=20 > > > Or alternatively we can just not > > > provide a generic entry point and let each user select a specific fun= ction. > > > > I wonder can we have sort of mixed approach: > > 1. provide a generic entry point that would be set to the best (from ou= r knowledge) available classify function. > > 2. Let each user use a specific function if he wants too. > > > > i.e: > > - keep classify_scalar/classify_sse/classify_... public. > > - keep your current implementation of rte_acl_classify() > > BTW in that way, we probably can make acl_select_classify() static. > > > Agreed, depending on your answer above, this might be the best solution. >=20 > > So most users would just use generic entry point and wouldn't need to w= rite their own code wrappers around it. > > For users who need to use a particular classify() version - they can c= all it directly. > > > It does seem reasonable. Let me know what the ctx sharing mechanism is f= rom > above, and we can settle this. >=20 > > > > > > > > > > > REmoved macro definitions for match check functions to make them= static inline > > > > > > > > More comments inlined below. > > > >snip> > > > > > > > > > > /* make a quick check for scalar */ > > > > > - ret =3D rte_acl_classify_scalar(acx, data, results, > > > > > + rte_acl_select_classify(ACL_CLASSIFY_SCALAR); > > > > > + ret =3D rte_acl_classify(acx, data, results, > > > > > RTE_DIM(acl_test_data), RTE_ACL_MAX_CATEGORIES); > > > > > > > > > > > > As I said above, that doesn't seem correct: we set rte_acl_default_= classify =3D rte_acl_classify_scalar and never restore it back to > the > > > original value. > > > > To support it properly, we need to: > > > > old_alg =3D rte_acl_get_classify(); > > > > rte_acl_select_classify(new_alg); > > > > ... > > > > rte_acl_select_classify(old_alg); > > > > > > > So, for the purposes of this test application, I don't see that as be= ing needed. > > > Every call to rte_acl_classify is preceded by a setting of the classi= fier > > > function, so you're safe. > > > > Not every, that's a problem. > > As I can see, in test/test_acl.c you replaced > > rte_acl_classify_scalar(); > > with > > rte_acl_select_classify(SCALAR); > > rte_acl_classify(); > > > > And never restore previous value of rte_acl_default_classify. > > Right now rte_acl_default_classify is global, so after first: > > rte_acl_select_classify(SCALAR); > > all subsequent rte_acl_classify() will actually use scalar version. > > > Hmm, ok, I'll take a closer look at it. >=20 > > > If you're concerned about other processes using the > > > dpdk library at the same time, you're still safe, as despite being a = global > > > variable, data pages in a DSO are Copy on Write, so each process gets= their own > > > copy of the global variable. > > > > No, my concern here is only about app/test here. > > > > > > > > Multiple threads within the same process are problematic, I agree, an= d thats > > > solvable with the per-acl-context mechanism that I described above, t= hough that > > > shouldn't be needed here as this seems to be a single threaded progra= m. > > > > > > > Make all this just to keep UT valid seems like a big hassle to me. > > > > So I said above - probably better just leave it to call rte_acl_cla= ssify_scalar() directly. > > > > > > > That works for me too, though the per-context mechanism seems kind of= nice to > > > me. Let me know what you prefer. > > > > > > > > > > > > > > > > > diff --git a/lib/librte_acl/acl_match_check.h b/lib/librte_acl/ac= l_match_check.h > > > > > new file mode 100644 > > > > > index 0000000..4dc1982 > > > > > --- /dev/null > > > > > +++ b/lib/librte_acl/acl_match_check.h > > > > > > > > As a nit: we probably don't need a special header just for one func= tion and can place it inside acl_run.h. > > > > > > > Agreed, I can move that to acl_run.h. > > > > > > > > > > > > + */ > > > > > +static inline uint64_t > > > > > +acl_match_check(uint64_t transition, int slot, > > > > > + const struct rte_acl_ctx *ctx, struct parms *parms, > > > > > + struct acl_flow_data *flows, void (*resolve_priority)( > > > > > + uint64_t transition, int n, const struct rte_acl_ctx *ctx, > > > > > + struct parms *parms, const struct rte_acl_match_results *p, > > > > > + uint32_t categories)) > > > > > > > > Ugh, that's really hard to read. > > > > Can we create a typedef for resolve_priority function type: > > > > typedef void (*resolve_priority_t)(uint64_t, int, > > > > const struct rte_acl_ctx *ctx, struct parms *, > > > > const struct rte_acl_match_results *, uint32_t); > > > > And use it here? > > > > > > > Sure, I'm fine with doing that. > > > > > > > > > > > > + > > > > > +/* by default, use always avaialbe scalar code path. */ > > > > > +rte_acl_classify_t rte_acl_default_classify =3D rte_acl_classify= _scalar; > > > > > > > > Why not 'static'? > > > > I thought you'd like to hide it from external world. > > > > > > > Doh! I didn't do the one thing that I really meant to do. I removed= it from > > > the header file but I forgot to declare the variable static. I'll fi= x that. > > > > > > > > + > > > > > +void rte_acl_select_classify(enum acl_classify_alg alg) > > > > > +{ > > > > > + > > > > > + switch(alg) > > > > > + { > > > > > + case ACL_CLASSIFY_DEFAULT: > > > > > + case ACL_CLASSIFY_SCALAR: > > > > > + rte_acl_default_classify =3D rte_acl_classify_scalar; > > > > > + break; > > > > > + case ACL_CLASSIFY_SSE: > > > > > + rte_acl_default_classify =3D rte_acl_classify_sse; > > > > > + break; > > > > > + } > > > > > + > > > > > +} > > > > > > > > As this is init phase function, I suppose we can add check that alg= has a valid(supported) value, and return some error as return > > > value, if not. > > > > > > > Not sure I follow what you're saying above, are you suggesting that w= e add a > > > rte_cpu_get_flag_enabled check to rte_acl_select_classify above? > > > > > > > > > > > > * > > > > > @@ -286,9 +289,10 @@ rte_acl_reset(struct rte_acl_ctx *ctx); > > > > > * @return > > > > > * zero on successful completion. > > > > > * -EINVAL for incorrect arguments. > > > > > + * -ENOTSUP for unsupported platforms. > > > > > > > > Please remove the line above: current implementation doesn't return= ENOTSUP > > > > (I think that was left from v1). > > > > > > > Yup, probably was. I'll remove it. > > > > > > > > */ > > > > > int > > > > > -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **= data, > > > > > +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_= t **data, > > > > > uint32_t *results, uint32_t num, uint32_t categories); > > > > > > > > > > /** > > > > > @@ -323,9 +327,23 @@ rte_acl_classify(const struct rte_acl_ctx *c= tx, const uint8_t **data, > > > > > * zero on successful completion. > > > > > * -EINVAL for incorrect arguments. > > > > > */ > > > > > -int > > > > > -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uin= t8_t **data, > > > > > - uint32_t *results, uint32_t num, uint32_t categories); > > > > > > > > > > > > As I said above we'd better keep it. > > > > > > > Ok, can do. > > > > > > > > + > > > > > +enum acl_classify_alg { > > > > > + ACL_CLASSIFY_DEFAULT =3D 0, > > > > > + ACL_CLASSIFY_SCALAR =3D 1, > > > > > + ACL_CLASSIFY_SSE =3D 2, > > > > > +}; > > > > > > > > As a nit: as this emum is part of public API, I think it is better = to add rte_ prefix: enum rte_acl_classify_alg > > > > > > > Sure, done. > > > > > > > > + > > > > > +extern inline int rte_acl_classify(const struct rte_acl_ctx *ctx= , > > > > > + const uint8_t **data, > > > > > + uint32_t *results, uint32_t num, > > > > > + uint32_t categories); > > > > > > > > Again as a nit: here and everywhere can we keep same style through = the whole DPDK - function name from the new line: > > > > extern nt > > > > rte_acl_classify(...); > > > > > > > Ok > > > > > > I'll produce another version based on your feedback regarding the > > > per-context-calssifier method vs. just removing the generic classifie= r. > > > > > > Regards > > > Neil > > > >