From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124])
 by inbox.dpdk.org (Postfix) with ESMTP id E215CA2EFC
 for ; Mon, 14 Oct 2019 01:07:39 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id 1A3B21C190;
 Mon, 14 Oct 2019 01:07:39 +0200 (CEST)
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by dpdk.org (Postfix) with ESMTP id 6305F1C18F
 for ; Mon, 14 Oct 2019 01:07:36 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga003.jf.intel.com ([10.7.209.27])
 by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 13 Oct 2019 16:07:34 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.67,293,1566889200"; d="scan'208";a="198135736"
Received: from irsmsx109.ger.corp.intel.com ([163.33.3.23])
 by orsmga003.jf.intel.com with ESMTP; 13 Oct 2019 16:07:32 -0700
Received: from irsmsx101.ger.corp.intel.com ([169.254.1.76])
 by IRSMSX109.ger.corp.intel.com ([169.254.13.52])
 with mapi id 14.03.0439.000; Mon, 14 Oct 2019 00:07:32 +0100
From: "Zhang, Roy Fan"
To: Akhil Goyal, "Ananyev, Konstantin", "'dev@dpdk.org'",
 "De Lara Guarch, Pablo", 'Thomas Monjalon', "Doherty, Declan"
CC: 'Anoob Joseph'
Thread-Topic: [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
Thread-Index: AQHVYm4YTD6+9RxOHE+pmjY8VoDMaqcbQpEAgAAy7FCAAtgfAIAASlOAgAYb5YCAAa+pgIABrxSAgAZUwgCAAP4egIABrpuAgAuzNYCAAo53AIAE6DAAgAAWqQCAAaS2gIADDN8AgAZAoACAAsecgIAAavMAgAMfPYCAA9IdAA==
Date: Sun, 13 Oct 2019 23:07:31 +0000
Message-ID: <9F7182E3F746AB4EA17801C148F3C6044C06504A@IRSMSX101.ger.corp.intel.com>
References: <20190903154046.55992-1-roy.fan.zhang@intel.com>
 <20190903154046.55992-2-roy.fan.zhang@intel.com>
 <9F7182E3F746AB4EA17801C148F3C6043369D686@IRSMSX101.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772580191926A17@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772580191962CD5@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772580191966116@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772580191966C23@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258019196A767@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258019196D53D@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258019196F386@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258019197206C@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258019197446B@irsmsx105.ger.corp.intel.com>
In-Reply-To:
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-ctpclassification: CTP_NT
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYzg3ZDUyMGUtYzAwYi00ZDVjLTk2YjItYTVjMjEzMGFhM2UxIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiXC9ycEE0SEVVbXI1UkZ1djlPRHlBUnljRkFCdkRZdzBIWk05XC9mTm1HRktwVnVmbjZFZzlRaUlTTnBWXC9XVE9ONiJ9
dlp-product: dlpe-windows
dlp-version: 11.2.0.6
dlp-reaction: no-action
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Akhil,

Thanks for the review and comments!
I know you are extremely busy, so here is my point in brief.

I think placing the CPU synchronous crypto API in rte_security makes
sense, because:

1. rte_security already contains the inline crypto and lookaside crypto
   action types, so adding a cpu_crypto action type is reasonable.
2. rte_security covers security features that may not be supported by all
   devices, such as crypto, IPsec, and PDCP. cpu_crypto falls into this
   category; again, it is crypto.
3. Placing the CPU synchronous crypto API in rte_security is natural, as
   inline mode works synchronously too. Cryptodev does not.
4. Placing the CPU synchronous crypto API in rte_security helps boost SW
   crypto performance. I have already provided a simple perf test inside
   the unit test in the patchset for users to try out: just compare its
   output against the DPDK crypto perf app output.
5. A CPU synchronous crypto API in cryptodev would never serve HW
   lookaside crypto PMDs, as making them work synchronously carries a huge
   performance penalty. The cryptodev framework's existing design, however,
   is to provide APIs that work on all crypto PMDs
   (rte_cryptodev_enqueue_burst / dequeue_burst, for example), so an API
   that only some PMDs can serve does not fit cryptodev's principle.
6. Placing the CPU synchronous crypto API in cryptodev confuses the user,
   because:
   - a session created for async mode may not work in sync mode;
   - enqueue/dequeue and cpu_crypto_process both do the same crypto
     processing, but one PMD may support only one API (set), another PMD
     only the other, and a third PMD both. We would have to provide yet
     another API to let the user query which PMD supports which;
   - there would be two completely different code paths for async and
     sync mode.
7. At the end of your email you said that placing the CPU synchronous
   crypto API in rte_security is not acceptable because it does not do any
   rte_security stuff. Isn't crypto rte_security stuff? You may call this
   a quibble, but as I see it, both PMD implementations in the patchset do
   offload the work to the CPU's special circuitry dedicated to
   accelerating crypto processing.

To me, cryptodev is the one place the CPU synchronous crypto API should
not go into; rte_security is where it belongs.

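To make the last two bullets of point 6 a bit more concrete, here is a
minimal sketch of the extra decision an application would have to make per
PMD if both modes lived in cryptodev. The flag names and toy_pick_path()
are made up for illustration only; they are not existing
RTE_CRYPTODEV_FF_* flags or DPDK functions.

```c
#include <stdint.h>

/* Hypothetical capability bits; NOT existing RTE_CRYPTODEV_FF_* flags. */
#define TOY_FF_ASYNC_ENQ_DEQ (1u << 0) /* PMD implements enqueue/dequeue */
#define TOY_FF_SYNC_CPU      (1u << 1) /* PMD implements cpu_crypto_process */

enum toy_path {
	TOY_PATH_NONE = 0,
	TOY_PATH_ASYNC,
	TOY_PATH_SYNC
};

/*
 * Pick a data path for one PMD from its (hypothetical) feature flags.
 * prefer_sync is only the application's wish; a PMD that lacks the sync
 * API forces the async path, and the other way round.
 */
enum toy_path toy_pick_path(uint32_t feature_flags, int prefer_sync)
{
	int has_async = (feature_flags & TOY_FF_ASYNC_ENQ_DEQ) != 0;
	int has_sync = (feature_flags & TOY_FF_SYNC_CPU) != 0;

	if (has_sync && (prefer_sync || !has_async))
		return TOY_PATH_SYNC;
	if (has_async)
		return TOY_PATH_ASYNC;
	return TOY_PATH_NONE;
}
```

Whatever toy_pick_path() returns, the application then still has to carry
a separate session and a separate processing loop for each of the two
paths.
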
Regards,
Fan

> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> Sent: Friday, October 11, 2019 2:24 PM
> To: Ananyev, Konstantin; 'dev@dpdk.org'; De Lara Guarch, Pablo;
> 'Thomas Monjalon'; Zhang, Roy Fan; Doherty, Declan
> Cc: 'Anoob Joseph'
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> API
>
> Hi Konstantin,
>
> >
> > Hi Akhil,
> >
> ..[snip]
>
> > > > > > > OK let us assume that you have a separate structure. But I have
> > > > > > > a few queries:
> > > > > > > 1. how can multiple drivers use a same session
> > > > > >
> > > > > > As a short answer: they can't.
> > > > > > It is pretty much the same approach as with rte_security - each
> > > > > > device needs to create/init its own session.
> > > > > > So upper layer would need to maintain its own array (or so) for
> > > > > > such case.
> > > > > > Though the question is why would you like to have same session
> > > > > > over multiple SW backed devices?
> > > > > > As it would be anyway just a synchronous function call that will
> > > > > > be executed on the same cpu.
> > > > >
> > > > > I may have single FAT tunnel which may be distributed over multiple
> > > > > Cores, and each core is affined to a different SW device.
> > > >
> > > > If it is pure SW, then we don't need multiple devices for such
> > > > scenario.
> > > > Device in that case is pure abstraction that we can skip.
> > >
> > > Yes agreed, but that liberty is given to the application whether it
> > > need multiple devices with single queue or a single device with
> > > multiple queues.
> > > I think that independence should not be broken in this new API.
> > > >
> > > > > So a single session may be accessed by multiple devices.
> > > > >
> > > > > One more example would be depending on packet sizes, I may switch
> > > > > between HW/SW PMDs with the same session.
> > > >
> > > > Sure, but then we'll have multiple sessions.
> > >
> > > No, the session will be same and it will have multiple private data
> > > for each of the PMD.
> > >
> > > > BTW, we have same thing now - these private session pointers are
> > > > just stored inside the same rte_crypto_sym_session.
> > > > And if user wants to support this model, he would also need to store
> > > > pair for each HW device anyway.
> > >
> > > Yes agreed, but how is that thing happening in your new struct, you
> > > cannot support that.
> >
> > User can store all these info in his own struct.
> > That's exactly what we have right now.
> > Let say ipsec-secgw has to store for each IPsec SA:
> > pointer to crypto-session and/or pointer to security session plus (for
> > lookaside-devices) cdev_id_qp that allows it to extract dev_id +
> > queue_id information.
> > As I understand that works for now, as each ipsec_sa uses only one
> > dev+queue. Though if someone would like to use multiple devices/queues
> > for the same SA - he would need to have an array of these pairs.
> > So even right now rte_cryptodev_sym_session is not self-consistent and
> > requires extra information to be maintained by user.
>
> Why are you increasing the complexity for the user application.
> The new APIs and struct should be such that it need to do minimum changes
> in the stack so that stack is portable on multiple vendors.
> You should try to hide as much complexity in the driver or lib to give the
> user simple APIs.
>
> Having a same session for multiple devices was added by Intel only for some
> use cases.
> And we had split that session create API into 2. Now if those are not
> useful shall we move back to the single API. I think @Doherty, Declan and
> @De Lara Guarch, Pablo can comment on this.
>
> > > > > > >
> > > > > > > 2. Can somebody use the scheduler pmd for scheduling the
> > > > > > > different type of payloads for the same session?
> > > > > >
> > > > > > In theory yes.
> > > > > > Though for that scheduler pmd should have inside it's
> > > > > > rte_crypto_cpu_sym_session an array of pointers to the underlying
> > > > > > devices sessions.
> > > > > >
> > > > > > >
> > > > > > > With your proposal the APIs would be very specific to your use
> > > > > > > case only.
> > > > > >
> > > > > > Yes in some way.
> > > > > > I consider that API specific for SW backed crypto PMDs.
> > > > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto)
> > > > > > will benefit from it.
> > > > > > Current crypto-op API is very much HW oriented.
> > > > > > Which is ok, that's for it was intended for, but I think we also
> > > > > > need one that would be designed for SW backed implementation in
> > > > > > mind.
> > > > >
> > > > > We may re-use your API for HW PMDs as well which do not have
> > > > > requirement of Crypto-op/mbuf etc.
> > > > > The return type of your new process API may have a status which say
> > > > > 'processed' Or can be say 'enqueued'. So if it is 'enqueued', we
> > > > > may have a new API for raw Bufs dequeue as well.
> > > > >
> > > > > This requirement can be for any hardware PMDs like QAT as well.
> > > >
> > > > I don't think it is a good idea to extend this API for async
> > > > (lookaside) devices.
> > > > You'll need to:
> > > >  - provide dev_id and queue_id for each process(enqueue) and
> > > >    dequeuer operation.
> > > >  - provide IOVA for all buffers passing to that function (data
> > > >    buffers, digest, IV, aad).
> > > >  - On dequeue provide some way to associate dequed data and digest
> > > >    buffers with crypto-session that was used (and probably with
> > > >    mbuf).
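A rough sketch of what the three requirements listed above add up to, in
case it helps the discussion. Everything here (the toy_ names, the ring,
the field layout) is invented for illustration and is not DPDK code; the
point is only that a raw-buffer async op ends up carrying the same kind of
bookkeeping that the existing crypto-op already carries.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEVS  2
#define TOY_MAX_QPS   2
#define TOY_RING_SIZE 8 /* must be a power of two */

typedef uint64_t toy_iova_t; /* stand-in for an IO virtual address */

/* Everything a raw-buffer async op would have to carry (hypothetical). */
struct toy_async_op {
	void *sess;             /* session used, needed again on dequeue */
	void *user_cookie;      /* e.g. the mbuf the buffers came from */
	toy_iova_t data_iova;
	toy_iova_t digest_iova;
	toy_iova_t iv_iova;
	toy_iova_t aad_iova;
	uint32_t data_len;
	int status;             /* filled in on completion */
};

struct toy_qp {
	struct toy_async_op *ring[TOY_RING_SIZE];
	uint32_t head; /* next slot to fill */
	uint32_t tail; /* next slot to drain */
};

static struct toy_qp toy_qps[TOY_MAX_DEVS][TOY_MAX_QPS];

static struct toy_qp *toy_qp_get(uint8_t dev_id, uint16_t qp_id)
{
	if (dev_id >= TOY_MAX_DEVS || qp_id >= TOY_MAX_QPS)
		return NULL;
	return &toy_qps[dev_id][qp_id];
}

/* Returns 0 on success, -1 on bad ids or a full ring. */
int toy_enqueue(uint8_t dev_id, uint16_t qp_id, struct toy_async_op *op)
{
	struct toy_qp *qp = toy_qp_get(dev_id, qp_id);

	if (qp == NULL || qp->head - qp->tail == TOY_RING_SIZE)
		return -1;
	qp->ring[qp->head++ & (TOY_RING_SIZE - 1)] = op;
	return 0;
}

/* Returns 0 and stores the completed op in *op, or -1 if none is pending. */
int toy_dequeue(uint8_t dev_id, uint16_t qp_id, struct toy_async_op **op)
{
	struct toy_qp *qp = toy_qp_get(dev_id, qp_id);

	if (qp == NULL || qp->head == qp->tail)
		return -1;
	*op = qp->ring[qp->tail++ & (TOY_RING_SIZE - 1)];
	(*op)->status = 0; /* pretend the device finished successfully */
	return 0;
}
```

Note that the caller of toy_dequeue() can only make sense of the result
through the sess and user_cookie fields it stored at enqueue time.
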
> > > > So most likely we'll end up with another just version of our
> > > > current crypto-op structure.
> > > > If you'd like to get rid of mbufs dependency within current
> > > > crypto-op API that understandable, but I don't think we should have
> > > > same API for both sync (CPU) and async (lookaside) cases.
> > > > It doesn't seem feasible at all and voids whole purpose of that
> > > > patch.
> > >
> > > At this moment we are not much concerned about the dequeue API and
> > > about the HW PMD support. It is just that the new API should be
> > > generic enough to be used in some future scenarios as well. I am just
> > > highlighting the possible usecases which can be there in future.
> >
> > Sorry, but I strongly disagree with such approach.
> > We should stop adding/modifying API 'just in case' and because 'it
> > might be useful for some future HW'.
> > Inside DPDK we already do have too many dev level APIs without any
> > implementations.
> > That's quite bad practice and very dis-orienting for end-users.
> > I think to justify API additions/changes we need at least one proper
> > implementation for it, or at least some strong evidence that people
> > are really committed to support it in nearest future.
> > BTW, that what TB agreed on, nearly a year ago.
> >
> > This new API (if we'll go ahead with it of course) would stay
> > experimental for some time anyway to make sure we don't miss anything
> > needed (I think for about a year time-frame).
> > So if you guys *really* want to extend it support _async_ devices too
> > - I am open for modifications/additions here.
> > Though personally I think such addition would over-complicate things
> > and we'll end up with another reincarnation of current crypto-op.
> > We actually discussed it internally, and decided to drop that idea
> > because of that.
> > Again, my opinion - for lookaside devices it might be better to try to
> > optimize current crypto-op path (remove mbuf requirement, probably add
> > ability to group by session on enqueue/dequeue, etc.).
>
> I agree that the new API is experimental and can be modified later. So no
> issues in that, but we can keep some things in mind while defining APIs.
> These were some comments from my side, if those are impacting the current
> scenario, you can drop those. We will take care of those later.
>
> > >
> > > What is the issue that you face in making a dev-op for this new API.
> > > Do you see any performance impact with that?
> >
> > There are two main things:
> > 1. user would need to maintain and provide for each process() call
> > dev_id+queue_id.
> > That's means extra (and totally unnecessary for SW) overhead.
>
> You are using a crypto device for performing the processing, you must use
> dev_id to identify which SW device it is. This is how the DPDK Framework
> works.
>
> > 2. yes I would expect some perf overhead too - it would be extra call
> > or branch.
> > Again as it would be data-dependency - most likely cpu wouldn't be
> > able to pipeline it efficiently:
> >
> > rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
> >                        rte_crypto_sym_session *sess, ...)
> > {
> >         struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> >         return (*dev->process)(sess->data[dev->driver_id], ...);
> > }
> >
> > driver_specific_process(driver_specific_sym_session *sess)
> > {
> >         return sess->process(sess, ...);
> > }
> >
> > I didn't make any exact measurements but sure it would be slower than
> > just:
> > session_udata->process(session->udata->sess, ...);
> > Again it would be much more noticeable on low end cpus.
> > Let say here:
> > http://mails.dpdk.org/archives/dev/2019-September/144350.html
> > Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev
> > contents - I suppose we would have something similar here.
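Since both sides are arguing about this overhead without numbers, here is a
small self-contained model of the two call paths that anyone can drop into
a micro-benchmark. It is only a sketch: all toy_ types and functions are
invented for illustration, the "cipher" is a plain XOR, there is no bounds
checking, and none of this is the real rte_cryptodev layout.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEVS    4
#define TOY_MAX_DRIVERS 4

struct toy_session;
typedef int (*toy_process_t)(struct toy_session *sess, uint8_t *buf, size_t len);

/* Driver-private session: key material plus the function that uses it. */
struct toy_session {
	uint8_t key;
	toy_process_t process;
};

/* Generic session with one private slot per driver, as in the dev-op model. */
struct toy_generic_session {
	struct toy_session *data[TOY_MAX_DRIVERS];
};

struct toy_dev {
	uint8_t driver_id;
	toy_process_t process; /* driver-level dev-op */
};

static struct toy_dev toy_devs[TOY_MAX_DEVS];

/* Stand-in for real crypto: XOR every byte with the session key. */
static int toy_xor_process(struct toy_session *sess, uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] ^= sess->key;
	return (int)len;
}

/* Driver-level dev-op: one more indirect call to reach the session function. */
static int toy_driver_process(struct toy_session *sess, uint8_t *buf, size_t len)
{
	return sess->process(sess, buf, len);
}

/* Wire up one device, one driver-private session and the generic session. */
void toy_setup(uint8_t dev_id, uint8_t driver_id, struct toy_generic_session *gs,
	       struct toy_session *sess, uint8_t key)
{
	sess->key = key;
	sess->process = toy_xor_process;
	gs->data[driver_id] = sess;
	toy_devs[dev_id].driver_id = driver_id;
	toy_devs[dev_id].process = toy_driver_process;
}

/* Path A (dev-op): dev_id -> device -> driver slot -> driver op -> session op. */
int process_via_dev(uint8_t dev_id, struct toy_generic_session *gs,
		    uint8_t *buf, size_t len)
{
	struct toy_dev *dev = &toy_devs[dev_id];

	return dev->process(gs->data[dev->driver_id], buf, len);
}

/* Path B (direct): the caller already holds the session and its function. */
int process_direct(struct toy_session *sess, uint8_t *buf, size_t len)
{
	return sess->process(sess, buf, len);
}
```

Both paths produce the same output; path A simply performs two dependent
loads and one extra indirect call before reaching the same session
function. Whether that is measurable next to a real cipher is exactly what
profiling would have to show.
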
> > I do realize that in majority of cases crypto is more expensive then
> > RX/TX, but still.
> >
> > If it would be a really unavoidable tradeoff (support already existing
> > API, or so) I wouldn't mind, but I don't see any real need for it right
> > now.
>
> Calling session_udata->process(session->udata->sess, ...); from the
> application and Application need to maintain for each PMD the process() API
> in its memory will make the application not portable to other vendors.
>
> What we are doing here is defining another way to create sessions for the
> same stuff that is already done. This make applications non-portable and
> confusing for the application writer.
>
> I would say you should do some profiling first. As you also mentioned crypto
> workload is more Cycle consuming, it will not impact this case.
>
> > > > > > >
> > > > > > > That is why a dev-ops would be a better option.
> > > > > > >
> > > > > > > When you would add more functionality to this sync API/struct,
> > > > > > > it will end up being the same API/struct.
> > > > > > >
> > > > > > > Let us see how close/ far we are from the existing APIs when
> > > > > > > the actual implementation is done.
> > > > > > >
> > > > > > > > > I am not sure if that would be needed.
> > > > > > > > > It would be internal to the driver that if synchronous
> > > > > > > > > processing is supported(from feature flag) and
> > > > > > > > > Have relevant fields in xform(the newly added ones which
> > > > > > > > > are packed as per your suggestions) set,
> > > > > > > > > It will create that type of session.
> > > > > > > > >
> > > > > > > > > > + * Main points:
> > > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > > + *   affecting existing one.
> > > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > > >
> > > > > > > > > Which process function should be chosen is internal to PMD,
> > > > > > > > > how would that info be visible to the application or the
> > > > > > > > > library. These will get stored in the session private data.
> > > > > > > > > It would be upto the PMD writer, to store the per session
> > > > > > > > > process function in the session private data.
> > > > > > > > >
> > > > > > > > > Process function would be a dev ops just like enc/deq
> > > > > > > > > operations and it should call The respective process API
> > > > > > > > > stored in the session private data.
> > > > > > > >
> > > > > > > > That model (via devops) is possible, but has several
> > > > > > > > drawbacks from my perspective:
> > > > > > > >
> > > > > > > > 1. It means we'll need to pass dev_id as a parameter to
> > > > > > > > process() function.
> > > > > > > > Though in fact dev_id is not a relevant information for us
> > > > > > > > here (all we need is pointer to the session and pointer to
> > > > > > > > the fuction to call) and I tried to avoid using it in
> > > > > > > > data-path functions for that API.
> > > > > > >
> > > > > > > You have a single vdev, but someone may have multiple vdevs
> > > > > > > for each thread, or may Have same dev with multiple queues for
> > > > > > > each core.
> > > > > >
> > > > > > That's fine. As I said above it is a SW backed implementation.
> > > > > > Each session has to be a separate entity that contains all
> > > > > > necessary information (keys, alg/mode info, etc.) to process
> > > > > > input buffers.
> > > > > > Plus we need the actual function pointer to call.
> > > > > > I just don't see what for we need a dev_id in that situation.
> > > > >
> > > > > To iterate the session private data in the session.
> > > > >
> > > > > > Again, here we don't need care about queues and their pinning to
> > > > > > cores.
> > > > > > If let say someone would like to process buffers from the same
> > > > > > IPsec SA on 2 different cores in parallel, he can just create 2
> > > > > > sessions for the same xform, give one to thread #1 and second to
> > > > > > thread #2.
> > > > > > After that both threads are free to call
> > > > > > process(this_thread_ses, ...) at will.
> > > > >
> > > > > Say you have a 16core device to handle 100G of traffic on a single
> > > > > tunnel.
> > > > > Will we make 16 sessions with same parameters?
> > > >
> > > > Absolutely same question we can ask for current crypto-op API.
> > > > You have lookaside crypto-dev with 16 HW queues, each queue is
> > > > serviced by different CPU.
> > > > For the same SA, do you need a separate session per queue, or is it
> > > > ok to reuse current one?
> > > > AFAIK, right now this is a grey area not clearly defined.
> > > > For crypto-devs I am aware - user can reuse the same session (as
> > > > PMD uses it read-only).
> > > > But again, right now I think it is not clearly defined and is
> > > > implementation specific.
> > >
> > > User can use the same session, that is what I am also insisting, but
> > > it may have separate Session private data. Cryptodev session create
> > > API provide that functionality and we can Leverage that.
> >
> > rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which
> > means we can't use the same rte_cryptodev_sym_session to hold sessions
> > for both sync and async mode for the same device. Off course we can
> > add a hard requirement that any driver that wants to support process()
> > has to create sessions that can handle both process and
> > enqueue/dequeue, but then again what for to create such overhead?
> >
> > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > construct for multiple device_ids:
> > __extension__ struct {
> >         void *data;
> >         uint16_t refcnt;
> > } sess_data[0];
> > /**< Driver specific session material, variable size */
> >
> Yes I also feel the same. I was also not in favor of this when it was
> introduced.
> Please go ahead and remove this. I have no issues with that.
>
> > as an advantage.
> > It looks too error prone for me:
> > 1. Simultaneous session initialization/de-initialization for devices
> > with the same driver_id is not possible.
> > 2. It assumes that all device driver will be loaded before we start to
> > create session pools.
> >
> > Right now it seems ok, as no-one requires such functionality, but I
> > don't know how it will be in future.
> > For me rte_security session model, where for each security context
> > user have to create new session looks much more robust.
> Agreed
>
> > >
> > > BTW, I can see a v2 to this RFC which is still based on security
> > > library.
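On the sess_data[] point quoted above, a toy model may make the limitation
easier to see. This is a sketch under my own assumptions, not the cryptodev
implementation: the toy_ names are invented, a C99 flexible array member
stands in for the sess_data[0] construct, and refusing a second attach is
simply how this toy chooses to surface the conflict.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One slot of driver-private session material. */
struct toy_sess_slot {
	void *data;
	uint16_t refcnt;
};

/* Generic session: exactly one slot per driver_id, nothing per mode. */
struct toy_sym_session {
	uint16_t nb_drivers;
	struct toy_sess_slot sess_data[]; /* indexed by driver_id */
};

/* Allocate a zeroed session with room for nb_drivers slots, or NULL. */
struct toy_sym_session *toy_session_create(uint16_t nb_drivers)
{
	struct toy_sym_session *s;

	s = calloc(1, sizeof(*s) + (size_t)nb_drivers * sizeof(s->sess_data[0]));
	if (s != NULL)
		s->nb_drivers = nb_drivers;
	return s;
}

/*
 * Attach driver-private data to the slot of driver_id.
 * Returns 0 on success, -1 if driver_id is out of range or the slot is
 * already occupied (toy behaviour chosen for this sketch).
 */
int toy_session_attach(struct toy_sym_session *s, uint16_t driver_id, void *priv)
{
	if (driver_id >= s->nb_drivers || s->sess_data[driver_id].data != NULL)
		return -1;
	s->sess_data[driver_id].data = priv;
	s->sess_data[driver_id].refcnt = 1;
	return 0;
}
```

With a single slot per driver_id, a driver that wants both an async and a
sync flavour of the same session has nowhere to put the second one, unless
its private data is designed to serve both modes.
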
> >
> > Yes, v2 was concentrated on fixing found issues, some code
> > restructuring, i.e. - changes that would be needed anyway whatever API
> > aproach we'll choose.
> >
> > > When do you plan To submit the patches for crypto based APIs. We have
> > > RC1 merge deadline for this patchset on 21st Oct.
> >
> > We'd like to start working on it ASAP, but it seems we still have a
> > major disagreement about how this crypto-dev API should look like.
> > Which makes me think - should we return to our original proposal via
> > rte_security?
> > It still looks to me like clean and straightforward way to enable this
> > new API, and probably wouldn't cause that much controversy.
> > What do you think?
>
> I cannot spend more time discussing on this until RC1 date. I have some
> other stuff pending.
> You can send the patches early next week with the approach that I
> mentioned or else we can discuss this post RC1(which would mean deferring
> to 20.02).
>
> But moving back to security is not acceptable to me. The code should be put
> where it is intended and not where it is easy to put. You are not doing any
> rte_security stuff.
>
>
> Regards,
> Akhil