From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id B06E1A3160 for ; Wed, 9 Oct 2019 15:43:14 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 3B0381E569; Wed, 9 Oct 2019 15:43:13 +0200 (CEST) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id F23CA1D452 for ; Wed, 9 Oct 2019 15:43:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Oct 2019 06:43:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,276,1566889200"; d="scan'208";a="218629358" Received: from irsmsx107.ger.corp.intel.com ([163.33.3.99]) by fmsmga004.fm.intel.com with ESMTP; 09 Oct 2019 06:43:06 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.164]) by IRSMSX107.ger.corp.intel.com ([169.254.10.7]) with mapi id 14.03.0439.000; Wed, 9 Oct 2019 14:43:07 +0100 From: "Ananyev, Konstantin" To: Akhil Goyal , "'dev@dpdk.org'" , "De Lara Guarch, Pablo" , 'Thomas Monjalon' , "Zhang, Roy Fan" CC: "Doherty, Declan" , 'Anoob Joseph' Thread-Topic: [RFC PATCH 1/9] security: introduce CPU Crypto action type and API Thread-Index: AQHVYm4Y+AedzaNgY0qWMAmu5GwNVqcbQpEAgAArCICAAuADAIAARA4AgAYiKoCAAbQS0IABqquAgAZiqNCAAPA4gIABtviQgAu3KZCAAoImAIAExwxQgAA3zQCAAZ1E0IADFFEAgAZGHRCAAsIfgIAAUayA Date: Wed, 9 Oct 2019 13:43:06 +0000 Message-ID: <2601191342CEEE43887BDE71AB977258019197446B@irsmsx105.ger.corp.intel.com> References: <20190903154046.55992-1-roy.fan.zhang@intel.com> <20190903154046.55992-2-roy.fan.zhang@intel.com> <9F7182E3F746AB4EA17801C148F3C6043369D686@IRSMSX101.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772580191926A17@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772580191962CD5@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772580191966116@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772580191966C23@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB977258019196A767@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB977258019196D53D@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB977258019196F386@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB977258019197206C@irsmsx105.ger.corp.intel.com> In-Reply-To: Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNjc1NjI0MTgtNjYxZi00OTc5LTk0YTctZDMxYWI1Y2M0YjVhIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoicW1JeERScWc0Z2tiZTVJNHdzc0Y2c1JFQkxnMnFWODJ1NXZKR2N4YUU0QWtvR3paM0g1d2lySWFHYnVSRlBaTyJ9 x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.2.0.6 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Akhil, > > > > > > > > > > > > > > > > > This action type allows the burst of symm= etric crypto > > > > > > workload > > > > > > > > using > > > > > > > > > > > the > > > > > > > > > > > > > > > same > > > > > > > > > > > > > > > > > algorithm, key, and direction being proce= ssed by CPU > > > > cycles > > > > > > > > > > > > > synchronously. > > > > > > > > > > > > > > > > > This flexible action type does not requir= e external > > > > hardware > > > > > > > > > > > involvement, > > > > > > > > > > > > > > > > > having the crypto workload processed sync= hronously, > > > > and is > > > > > > > > more > > > > > > > > > > > > > > > performant > > > > > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cy= cles on > > > > removed > > > > > > > > "async > > > > > > > > > > > > > mode > > > > > > > > > > > > > > > > > simulation" as well as 3 cacheline access= of the > > crypto > > > > ops. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Does that mean application will not call th= e > > > > > > > > cryptodev_enqueue_burst > > > > > > > > > > > and > > > > > > > > > > > > > > > corresponding dequeue burst. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, instead it just call > > > > rte_security_process_cpu_crypto_bulk(...) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It would be a new API something like proces= s_packets > > and > > > > it > > > > > > will > > > > > > > > have > > > > > > > > > > > the > > > > > > > > > > > > > > > crypto processed packets while returning from= the API? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, though the plan is that API will operate= on raw data > > > > buffers, > > > > > > > > not > > > > > > > > > > > mbufs. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I still do not understand why we cannot do = with the > > > > > > conventional > > > > > > > > > > > crypto lib > > > > > > > > > > > > > > > only. > > > > > > > > > > > > > > > > As far as I can understand, you are not doi= ng any > > protocol > > > > > > > > processing > > > > > > > > > > > or > > > > > > > > > > > > > any > > > > > > > > > > > > > > > value add > > > > > > > > > > > > > > > > To the crypto processing. IMO, you just nee= d a > > > > synchronous > > > > > > > > crypto > > > > > > > > > > > > > processing > > > > > > > > > > > > > > > API which > > > > > > > > > > > > > > > > Can be defined in cryptodev, you don't need= to re- > > create a > > > > > > crypto > > > > > > > > > > > session > > > > > > > > > > > > > in > > > > > > > > > > > > > > > the name of > > > > > > > > > > > > > > > > Security session in the driver just to do a= synchronous > > > > > > processing. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I suppose your question is why not to have > > > > > > > > > > > > > > > rte_crypot_process_cpu_crypto_bulk(...) inste= ad? > > > > > > > > > > > > > > > The main reason is that would require disrupt= ive changes > > in > > > > > > existing > > > > > > > > > > > > > cryptodev > > > > > > > > > > > > > > > API > > > > > > > > > > > > > > > (would cause ABI/API breakage). > > > > > > > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRY= PTO > > > > need > > > > > > > > some > > > > > > > > > > > extra > > > > > > > > > > > > > > > information > > > > > > > > > > > > > > > that normal crypto_sym_xform doesn't contain > > > > > > > > > > > > > > > (cipher offset from the start of the buffer, = might be > > > > something > > > > > > extra > > > > > > > > in > > > > > > > > > > > > > future). > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cipher offset will be part of rte_crypto_op. > > > > > > > > > > > > > > > > > > > > > > > > > > fill/read (+ alloc/free) is one of the main thing= s that > > slowdown > > > > > > current > > > > > > > > > > > crypto-op > > > > > > > > > > > > > approach. > > > > > > > > > > > > > That's why the general idea - have all data that = wouldn't > > change > > > > > > from > > > > > > > > packet > > > > > > > > > > > to > > > > > > > > > > > > > packet > > > > > > > > > > > > > included into the session and setup it once at se= ssion_init(). > > > > > > > > > > > > > > > > > > > > > > > > I agree that you cannot use crypto-op. > > > > > > > > > > > > You can have the new API in crypto. > > > > > > > > > > > > As per the current patch, you only need cipher_offs= et which > > you > > > > can > > > > > > have > > > > > > > > it as > > > > > > > > > > > a parameter until > > > > > > > > > > > > You get it approved in the crypto xform. I believe = it will be > > > > beneficial > > > > > > in > > > > > > > > case of > > > > > > > > > > > other crypto cases as well. > > > > > > > > > > > > We can have cipher offset at both places(crypto-op = and > > > > > > cipher_xform). It > > > > > > > > will > > > > > > > > > > > give flexibility to the user to > > > > > > > > > > > > override it. > > > > > > > > > > > > > > > > > > > > > > After having another thought on your proposal: > > > > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_ty= pes > > for > > > > CPU > > > > > > > > related > > > > > > > > > > > stuff here? > > > > > > > > > > > > > > > > > > > > I also thought of adding new xforms, but that wont serv= e the > > purpose > > > > for > > > > > > > > may be all the cases. > > > > > > > > > > You would be needing all information currently availabl= e in the > > > > current > > > > > > > > xforms. > > > > > > > > > > So if you are adding new fields in the new xform, the s= ize will be > > more > > > > > > than > > > > > > > > that of the union of xforms. > > > > > > > > > > ABI breakage would still be there. > > > > > > > > > > > > > > > > > > > > If you think a valid compression of the AEAD xform can = be done, > > then > > > > > > that > > > > > > > > can be done for each of the > > > > > > > > > > Xforms and we can have a solution to this issue. > > > > > > > > > > > > > > > > > > I think that we can re-use iv.offset for our purposes (fo= r crypto > > offset). > > > > > > > > > So for now we can make that path work without any ABI bre= akage. > > > > > > > > > Fan, please feel free to correct me here, if I missed som= ething. > > > > > > > > > If in future we would need to add some extra information = it might > > > > > > > > > require ABI breakage, though by now I don't envision anyt= hing > > > > particular to > > > > > > > > add. > > > > > > > > > Anyway, if there is no objection to go that way, we can t= ry to make > > > > > > > > > these changes for v2. > > > > > > > > > > > > > > > > > > > > > > > > > Actually, after looking at it more deeply it appears not th= at easy as I > > > > thought > > > > > > it > > > > > > > > would be :) > > > > > > > > Below is a very draft version of proposed API additions. > > > > > > > > I think it avoids ABI breakages right now and provides enou= gh > > flexibility > > > > for > > > > > > > > future extensions (if any). > > > > > > > > For now, it doesn't address your comments about naming > > conventions > > > > > > (_CPU_ > > > > > > > > vs _SYNC_) , etc. > > > > > > > > but I suppose is comprehensive enough to provide a main ide= a > > beyond it. > > > > > > > > Akhil and other interested parties, please try to review an= d provide > > > > feedback > > > > > > > > ASAP, > > > > > > > > as related changes would take some time and we still like t= o hit 19.11 > > > > > > deadline. > > > > > > > > Konstantin > > > > > > > > > > > > > > > > diff --git a/lib/librte_cryptodev/rte_crypto_sym.h > > > > > > > > b/lib/librte_cryptodev/rte_crypto_sym.h > > > > > > > > index bc8da2466..c03069e23 100644 > > > > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h > > > > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h > > > > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[]; > > > > > > > > * > > > > > > > > * This structure contains data relating to Cipher (Encryp= tion and > > > > Decryption) > > > > > > > > * use to create a session. > > > > > > > > + * Actually I was wrong saying that we don't have free spa= ce inside > > > > xforms. > > > > > > > > + * Making key struct packed (see below) allow us to regain= 6B that > > could > > > > be > > > > > > > > + * used for future extensions. > > > > > > > > */ > > > > > > > > struct rte_crypto_cipher_xform { > > > > > > > > enum rte_crypto_cipher_operation op; > > > > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform { > > > > > > > > struct { > > > > > > > > const uint8_t *data; /**< pointer to key= data */ > > > > > > > > uint16_t length; /**< key length in = bytes */ > > > > > > > > - } key; > > > > > > > > + } __attribute__((__packed__)) key; > > > > > > > > + > > > > > > > > + /** > > > > > > > > + * offset for cipher to start within user provided= data buffer. > > > > > > > > + * Fan suggested another (and less space consuming = way) - > > > > > > > > + * reuse iv.offset space below, by changing: > > > > > > > > + * struct {uint16_t offset, length;} iv; > > > > > > > > + * to uunamed union: > > > > > > > > + * union { > > > > > > > > + * struct {uint16_t offset, length;} iv; > > > > > > > > + * struct {uint16_t iv_len, crypto_offset} cpu= _crypto_param; > > > > > > > > + * }; > > > > > > > > + * Both approaches seems ok to me in general. > > > > > > > > > > > > > > No strong opinions here. OK with this one. > > > > > > > > > > > > > > > + * Comments/suggestions are welcome. > > > > > > > > + */ > > > > > > > > + uint16_t offset; > > > > > > > > > > > > After another thought - it is probably a bit better to have off= set as a > > separate > > > > > > field. > > > > > > In that case we can use the same xforms to create both type of = sessions. > > > > > ok > > > > > > > > > > > > > > + > > > > > > > > + uint8_t reserved1[4]; > > > > > > > > + > > > > > > > > /**< Cipher key > > > > > > > > * > > > > > > > > * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operati= on, > > > > key.data > > > > > > will > > > > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform { > > > > > > > > struct { > > > > > > > > const uint8_t *data; /**< pointer to key= data */ > > > > > > > > uint16_t length; /**< key length in = bytes */ > > > > > > > > - } key; > > > > > > > > + } __attribute__((__packed__)) key; > > > > > > > > /**< Authentication key data. > > > > > > > > * The authentication key length MUST be less than = or equal to > > the > > > > > > > > * block size of the algorithm. It is the callers r= esponsibility to > > > > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform { > > > > > > > > * (for example RFC 2104, FIPS 198a). > > > > > > > > */ > > > > > > > > > > > > > > > > + uint8_t reserved1[6]; > > > > > > > > + > > > > > > > > struct { > > > > > > > > uint16_t offset; > > > > > > > > /**< Starting point for Initialisation Vect= or or Counter, > > > > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform { > > > > > > > > struct { > > > > > > > > const uint8_t *data; /**< pointer to key= data */ > > > > > > > > uint16_t length; /**< key length in = bytes */ > > > > > > > > - } key; > > > > > > > > + } __attribute__((__packed__)) key; > > > > > > > > + > > > > > > > > + /** offset for cipher to start within data buffer *= / > > > > > > > > + uint16_t cipher_offset; > > > > > > > > + > > > > > > > > + uint8_t reserved1[4]; > > > > > > > > > > > > > > > > struct { > > > > > > > > uint16_t offset; > > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h > > > > > > > > b/lib/librte_cryptodev/rte_cryptodev.h > > > > > > > > index e175b838c..c0c7bfed7 100644 > > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h > > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h > > > > > > > > @@ -1272,6 +1272,101 @@ void * > > > > > > > > rte_cryptodev_sym_session_get_user_data( > > > > > > > > struct rte_cryptode= v_sym_session *sess); > > > > > > > > > > > > > > > > +/* > > > > > > > > + * After several thoughts decided not to try to squeeze CP= U_CRYPTO > > > > > > > > + * into existing rte_crypto_sym_session structure/API, but= instead > > > > > > > > + * introduce an extentsion to it via new fully opaque > > > > > > > > + * struct rte_crypto_cpu_sym_session and additional relate= d API. > > > > > > > > > > > > > > > > > > > > > What all things do we need to squeeze? > > > > > > > In this proposal I do not see the new struct cpu_sym_session = defined > > here. > > > > > > > > > > > > The plan is to have it totally opaque to the user, i.e. just: > > > > > > struct rte_crypto_cpu_sym_session; > > > > > > in public header files. > > > > > > > > > > > > > I believe you will have same lib API/struct for cpu_sym_sessi= on and > > > > > > sym_session. > > > > > > > > > > > > I thought about such way, but there are few things that looks c= lumsy to > > me: > > > > > > 1. Right now there is no 'type' (or so) field inside > > rte_cryptodev_sym_session, > > > > > > so it is not possible to easy distinguish what session do you h= ave: > > lksd_sym or > > > > > > cpu_sym. > > > > > > In theory, there is a hole of 4B inside rte_cryptodev_sym_sessi= on, so we > > can > > > > add > > > > > > some extra field > > > > > > here, but in that case we wouldn't be able to use the same xfo= rm for > > both > > > > > > lksd_sym or cpu_sym > > > > > > (which seems really plausible thing for me). > > > > > > 2. Majority of rte_cryptodev_sym_session fields I think are un= necessary > > for > > > > > > rte_crypto_cpu_sym_session: > > > > > > sess_data[], opaque_data, user_data, nb_drivers. > > > > > > All that consumes space, that could be used somewhere else inst= ead. > > > > > > 3. I am a bit reluctant to touch existing rte_cryptodev API - t= o avoid any > > > > > > breakages I can't foresee right now. > > > > > > From other side - if we'll add new functions/structs for cpu_sy= m_session > > we > > > > can > > > > > > mark it > > > > > > and keep it for some time as experimental, so further changes (= if needed) > > > > would > > > > > > still be possible. > > > > > > > > > > > > > > > > OK let us assume that you have a separate structure. But I have a= few > > queries: > > > > > 1. how can multiple drivers use a same session > > > > > > > > As a short answer: they can't. > > > > It is pretty much the same approach as with rte_security - each dev= ice needs > > to > > > > create/init its own session. > > > > So upper layer would need to maintain its own array (or so) for suc= h case. > > > > Though the question is why would you like to have same session over > > multiple > > > > SW backed devices? > > > > As it would be anyway just a synchronous function call that will be= executed > > on > > > > the same cpu. > > > > > > I may have single FAT tunnel which may be distributed over multiple > > > Cores, and each core is affined to a different SW device. > > > > If it is pure SW, then we don't need multiple devices for such scenario= . > > Device in that case is pure abstraction that we can skip. >=20 > Yes agreed, but that liberty is given to the application whether it need = multiple > devices with single queue or a single device with multiple queues. > I think that independence should not be broken in this new API. > > > > > So a single session may be accessed by multiple devices. > > > > > > One more example would be depending on packet sizes, I may switch bet= ween > > > HW/SW PMDs with the same session. > > > > Sure, but then we'll have multiple sessions. >=20 > No, the session will be same and it will have multiple private data for e= ach of the PMD. >=20 > > BTW, we have same thing now - these private session pointers are just s= tored > > inside the same rte_crypto_sym_session. > > And if user wants to support this model, he would also need to store > queue_id> > > pair for each HW device anyway. >=20 > Yes agreed, but how is that thing happening in your new struct, you canno= t support that. User can store all these info in his own struct. That's exactly what we have right now. Let say ipsec-secgw has to store for each IPsec SA: pointer to crypto-session and/or pointer to security session plus (for lookaside-devices) cdev_id_qp that allows it to extract dev_id + queue_id information. As I understand that works for now, as each ipsec_sa uses only one dev+queue. Though if someone would like to use multiple devices/queues for the same SA - he would need to have an array of these pairs= . So even right now rte_cryptodev_sym_session is not self-consistent and requires extra information to be maintained by user.=20 >=20 > > > > > > > > > > > > > > 2. Can somebody use the scheduler pmd for scheduling the differen= t type > > of > > > > payloads for the same session? > > > > > > > > In theory yes. > > > > Though for that scheduler pmd should have inside it's > > > > rte_crypto_cpu_sym_session an array of pointers to > > > > the underlying devices sessions. > > > > > > > > > > > > > > With your proposal the APIs would be very specific to your use ca= se only. > > > > > > > > Yes in some way. > > > > I consider that API specific for SW backed crypto PMDs. > > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) wil= l benefit > > > > from it. > > > > Current crypto-op API is very much HW oriented. > > > > Which is ok, that's for it was intended for, but I think we also ne= ed one that > > > > would be designed > > > > for SW backed implementation in mind. > > > > > > We may re-use your API for HW PMDs as well which do not have requirem= ent > > of > > > Crypto-op/mbuf etc. > > > The return type of your new process API may have a status which say > > 'processed' > > > Or can be say 'enqueued'. So if it is 'enqueued', we may have a new = API for > > raw > > > Bufs dequeue as well. > > > > > > This requirement can be for any hardware PMDs like QAT as well. > > > > I don't think it is a good idea to extend this API for async (lookaside= ) devices. > > You'll need to: > > - provide dev_id and queue_id for each process(enqueue) and dequeuer > > operation. > > - provide IOVA for all buffers passing to that function (data buffers,= digest, IV, > > aad). > > - On dequeue provide some way to associate dequed data and digest buff= ers > > with > > crypto-session that was used (and probably with mbuf). > > So most likely we'll end up with another just version of our current c= rypto-op > > structure. > > If you'd like to get rid of mbufs dependency within current crypto-op A= PI that > > understandable, > > but I don't think we should have same API for both sync (CPU) and async > > (lookaside) cases. > > It doesn't seem feasible at all and voids whole purpose of that patch. >=20 > At this moment we are not much concerned about the dequeue API and about = the > HW PMD support. It is just that the new API should be generic enough to b= e used in > some future scenarios as well. I am just highlighting the possible usecas= es which can > be there in future. Sorry, but I strongly disagree with such approach. We should stop adding/modifying API 'just in case' and because 'it might be= useful for some future HW'. Inside DPDK we already do have too many dev level APIs without any implemen= tations. That's quite bad practice and very dis-orienting for end-users. I think to justify API additions/changes we need at least one proper implem= entation for it, or at least some strong evidence that people are really committed to suppor= t it in nearest future. BTW, that what TB agreed on, nearly a year ago. =20 This new API (if we'll go ahead with it of course) would stay experimental = for some time anyway to make sure we don't miss anything needed (I think for about a year time-f= rame). So if you guys *really* want to extend it support _async_ devices too - I am open for modifications/additions here. Though personally I think such addition would over-complicate things and we= 'll end up with another reincarnation of current crypto-op. We actually discussed it internally, and decided to drop that idea because = of that. =20 Again, my opinion - for lookaside devices it might be better to try to opti= mize current crypto-op path (remove mbuf requirement, probably add ability to group by session on enqueue/dequeue, etc.).=20 >=20 > What is the issue that you face in making a dev-op for this new API. Do y= ou see any > performance impact with that? There are two main things: 1. user would need to maintain and provide for each process() call dev_id+q= ueue_id. That's means extra (and totally unnecessary for SW) overhead.=20 2. yes I would expect some perf overhead too - it would be extra call or br= anch. Again as it would be data-dependency - most likely cpu wouldn't be able to = pipeline it efficiently: rte_crypto_sym_process(uint8_t dev_id, uint16 qp_id, rte_crypto_sym_session= *ses, ...) { struct rte_cryptodev *dev =3D &rte_cryptodevs[dev_id]; return (*dev->process)(sess->data[dev->driver_id, ...); } driver_specific_process(driver_specific_sym_session *sess) { return sess->process(sess, ...) ; } I didn't make any exact measurements but sure it would be slower than just: session_udata->process(session->udata->sess, ...); Again it would be much more noticeable on low end cpus. Let say here: http://mails.dpdk.org/archives/dev/2019-September/144350.html Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev cont= ents - I suppose we would have something similar here. I do realize that in majority of cases crypto is more expensive then RX/TX,= but still.=20 If it would be a really unavoidable tradeoff (support already existing API,= or so) I wouldn't mind, but I don't see any real need for it right now. >=20 > > > > > That is why a dev-ops would be a better option. > > > > > > > > > > > > When you would add more functionality to this sync API/struct, it= will end > > up > > > > being the same API/struct. > > > > > > > > > > Let us see how close/ far we are from the existing APIs when the= actual > > > > implementation is done. > > > > > > > > > > > > I am not sure if that would be needed. > > > > > > > It would be internal to the driver that if synchronous proces= sing is > > > > > > supported(from feature flag) and > > > > > > > Have relevant fields in xform(the newly added ones which are = packed > > as > > > > per > > > > > > your suggestions) set, > > > > > > > It will create that type of session. > > > > > > > > > > > > > > > > > > > > > > + * Main points: > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is= desirable > > > > > > > > + * to keep it unchanged (API/ABI stability). From other = side, this > > > > > > > > + * new sync API is new one and probably would require ex= tra > > changes. > > > > > > > > + * Having it as a new one allows to mark it as experimen= tal, without > > > > > > > > + * affecting existing one. > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more fle= xibility > > > > > > > > + * to the PMD writers and again allows to avoid ABI brea= kages in > > future. > > > > > > > > + * - process() function per set of xforms > > > > > > > > + * allows to expose different process() functions for di= fferent > > > > > > > > + * xform combinations. PMD writer can decide, does he wa= nts to > > > > > > > > + * push all supported algorithms into one process() func= tion, > > > > > > > > + * or spread it across several ones. > > > > > > > > + * I.E. More flexibility for PMD writer. > > > > > > > > > > > > > > Which process function should be chosen is internal to PMD, h= ow > > would > > > > that > > > > > > info > > > > > > > be visible to the application or the library. These will get = stored in the > > > > session > > > > > > private > > > > > > > data. It would be upto the PMD writer, to store the per sessi= on process > > > > > > function in > > > > > > > the session private data. > > > > > > > > > > > > > > Process function would be a dev ops just like enc/deq operati= ons and it > > > > should > > > > > > call > > > > > > > The respective process API stored in the session private data= . > > > > > > > > > > > > That model (via devops) is possible, but has several drawbacks = from my > > > > > > perspective: > > > > > > > > > > > > 1. It means we'll need to pass dev_id as a parameter to process= () function. > > > > > > Though in fact dev_id is not a relevant information for us here > > > > > > (all we need is pointer to the session and pointer to the fucti= on to call) > > > > > > and I tried to avoid using it in data-path functions for that A= PI. > > > > > > > > > > You have a single vdev, but someone may have multiple vdevs for e= ach > > thread, > > > > or may > > > > > Have same dev with multiple queues for each core. > > > > > > > > That's fine. As I said above it is a SW backed implementation. > > > > Each session has to be a separate entity that contains all necessar= y > > information > > > > (keys, alg/mode info, etc.) to process input buffers. > > > > Plus we need the actual function pointer to call. > > > > I just don't see what for we need a dev_id in that situation. > > > > > > To iterate the session private data in the session. > > > > > > > Again, here we don't need care about queues and their pinning to co= res. > > > > If let say someone would like to process buffers from the same IPse= c SA on 2 > > > > different cores in parallel, he can just create 2 sessions for the = same xform, > > > > give one to thread #1 and second to thread #2. > > > > After that both threads are free to call process(this_thread_ses, .= ..) at will. > > > > > > Say you have a 16core device to handle 100G of traffic on a single tu= nnel. > > > Will we make 16 sessions with same parameters? > > > > Absolutely same question we can ask for current crypto-op API. > > You have lookaside crypto-dev with 16 HW queues, each queue is serviced= by > > different CPU. > > For the same SA, do you need a separate session per queue, or is it ok = to reuse > > current one? > > AFAIK, right now this is a grey area not clearly defined. > > For crypto-devs I am aware - user can reuse the same session (as PMD us= es it > > read-only). > > But again, right now I think it is not clearly defined and is implement= ation > > specific. >=20 > User can use the same session, that is what I am also insisting, but it m= ay have separate > Session private data. Cryptodev session create API provide that functiona= lity and we can > Leverage that. rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means= we can't use the same rte_cryptodev_sym_session to hold sessions for both sync and async= mode for the same device. Off course we can add a hard requirement that any driv= er that wants to support process() has to create sessions that can handle both process and = enqueue/dequeue, but then again what for to create such overhead? BTW, to be honest, I don't consider current rte_cryptodev_sym_session const= ruct for multiple device_ids: __extension__ struct { void *data; uint16_t refcnt; } sess_data[0]; /**< Driver specific session material, variable size */ as an advantage. It looks too error prone for me: 1. Simultaneous session initialization/de-initialization for devices with t= he same driver_id is not possible. 2. It assumes that all device driver will be loaded before we start to crea= te session pools. Right now it seems ok, as no-one requires such functionality, but I don't k= now how it will be in future. For me rte_security session model, where for each security context user hav= e to create new session looks much more robust. =20 >=20 > BTW, I can see a v2 to this RFC which is still based on security library. Yes, v2 was concentrated on fixing found issues, some code restructuring,=20 i.e. - changes that would be needed anyway whatever API aproach we'll choos= e. > When do you plan > To submit the patches for crypto based APIs. We have RC1 merge deadline f= or this > patchset on 21st Oct. We'd like to start working on it ASAP, but it seems we still have a major d= isagreement about how this crypto-dev API should look like. =20 Which makes me think - should we return to our original proposal via rte_se= curity? It still looks to me like clean and straightforward way to enable this new = API, and probably wouldn't cause that much controversy. What do you think?=20 >=20 > As per my understanding you only need a new dev-op for sync support. Sess= ion APIs > Will remain the same and you will have some extra fields packed in xform = structs. >=20 > The PMD will need to maintain a pointer to the per session process functi= on while creating > Session and will be used by the dev-op API at runtime without any extra c= heck at runtime. >=20 > > > > > > > > > > > > > > > > > > > > 2. As you pointed in that case it will be just one process() fu= nction per > > device. > > > > > > So if PMD would like to have several process() functions for di= fferent type > > of > > > > > > sessions > > > > > > (let say one per alg) first thing it has to do inside it's proc= ess() - read > > session > > > > data > > > > > > and > > > > > > based on that, do a jump/call to particular internal sub-routin= e. > > > > > > Something like: > > > > > > driver_id =3D get_pmd_driver_id(); > > > > > > priv_ses =3D ses->sess_data[driver_id]; > > > > > > Then either: > > > > > > switch(priv_sess->alg) {case XXX: process_XXX(priv_sess, ...);b= reak;...} > > > > > > OR > > > > > > priv_ses->process(priv_sess, ...); > > > > > > > > > > > > to select and call the proper function. > > > > > > Looks like totally unnecessary overhead to me. > > > > > > Though if we'll have ability to query/extract some sort session= _ops based > > on > > > > the > > > > > > xform - > > > > > > we can avoid this extra de-refererence+jump/call thing. > > > > > > > > > > What is the issue in the priv_ses->process(); approach? > > > > > > > > Nothing at all. > > > > What I am saying that schema with dev_ops > > > > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...) > > > > | > > > > |-> priv_ses->process(...) > > > > > > > > Has bigger overhead then just: > > > > process(ses,...); > > > > > > > > So what for to introduce extra-level of indirection here? > > > > > > Explained above. > > > > > > > > > > > > I don't understand what are you saving by not doing this. > > > > > In any case you would need to identify which session correspond t= o which > > > > process(). > > > > > > > > Yes, sure, but I think we can make user to store information that r= elationship, > > > > in a way he likes: store process() pointer for each session, or gro= up sessions > > > > that share the same process() somehow, or... > > > > > > So whatever relationship that user will make and store will make its = life > > complicated. > > > If we can hide that information in the driver, then what is the issue= in that and > > user > > > Will not need to worry. He would just call the process() and driver w= ill choose > > which > > > Process need to be called. > > > > Driver can do that at config/init time. > > Then at run-time we can avoid that choice at all and call already chose= n function. > > > > > > > > I think we should have a POC around this and see the difference in th= e cycle > > count. > > > IMO it would be negligible and we would end up making a generic API s= et > > which > > > can be used by others as well. > > > > > > > > > > > > For that you would be doing it somewhere in your data path. > > > > > > > > Why at data-path? > > > > Only once at session creation/initialization time. > > > > Or might be even once per group of sessions. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I am not sure if you would need a new session init API for th= is as > > nothing > > > > would > > > > > > be visible to > > > > > > > the app or lib. > > > > > > > > > > > > > > > + * - Not storing process() pointer inside the session - > > > > > > > > + * Allows user to choose does he want to store a process= () pointer > > > > > > > > + * per session, or per group of sessions for that device= that share > > > > > > > > + * the same input xforms. I.E. extra flexibility for the= user, > > > > > > > > + * plus allows us to keep cpu_sym_session totally opaque= , see > > above. > > > > > > > > > > > > > > If multiple sessions need to be processed via the same proces= s function, > > > > > > > PMD would save the same process in all the sessions, I don't = think there > > > > would > > > > > > > be any perf overhead with that. > > > > > > > > > > > > I think it would, see above. > > > > > > > > > > > > > > > > > > > > > + * Sketched usage model: > > > > > > > > + * .... > > > > > > > > + * /* control path, alloc/init session */ > > > > > > > > + * int32_t sz =3D rte_crypto_cpu_sym_session_size(dev_id, = &xform); > > > > > > > > + * struct rte_crypto_cpu_sym_session *ses =3D user_alloc(.= .., sz); > > > > > > > > + * rte_crypto_cpu_sym_process_t process =3D > > > > > > > > + * rte_crypto_cpu_sym_session_func(dev_id, &xform); > > > > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform); > > > > > > > > + * ... > > > > > > > > + * /* data-path*/ > > > > > > > > + * process(ses, ....); > > > > > > > > + * .... > > > > > > > > + * /* control path, termiante/free session */ > > > > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses); > > > > > > > > + */ > > > > > > > > + > > > > > > > > +/** > > > > > > > > + * vector structure, contains pointer to vector array and = the length > > > > > > > > + * of the array > > > > > > > > + */ > > > > > > > > +struct rte_crypto_vec { > > > > > > > > + struct iovec *vec; > > > > > > > > + uint32_t num; > > > > > > > > +}; > > > > > > > > + > > > > > > > > +/* > > > > > > > > + * Data-path bulk process crypto function. > > > > > > > > + */ > > > > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)( > > > > > > > > + struct rte_crypto_cpu_sym_session *sess, > > > > > > > > + struct rte_crypto_vec buf[], void *iv[], vo= id *aad[], > > > > > > > > + void *digest[], int status[], uint32_t num)= ; > > > > > > > > +/* > > > > > > > > + * for given device return process function specific to in= put xforms > > > > > > > > + * on error - return NULL and set rte_errno value. > > > > > > > > + * Note that for same input xfroms for the same device sho= uld > > return > > > > > > > > + * the same process function. > > > > > > > > + */ > > > > > > > > +__rte_experimental > > > > > > > > +rte_crypto_cpu_sym_process_t > > > > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id, > > > > > > > > + const struct rte_crypto_sym_xform *= xforms); > > > > > > > > + > > > > > > > > +/* > > > > > > > > + * Return required session size in bytes for given set of = xforms. > > > > > > > > + * if xforms =3D=3D NULL, then return the max possible ses= sion size, > > > > > > > > + * that would fit session for any supported by the device = algorithm. > > > > > > > > + * if CPU mode is not supported at all, or requeted in xfo= rm > > > > > > > > + * algorithm is not supported, then return -ENOTSUP. > > > > > > > > + */ > > > > > > > > +__rte_experimental > > > > > > > > +int > > > > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id, > > > > > > > > + const struct rte_crypto_sym_xform *= xforms); > > > > > > > > + > > > > > > > > +/* > > > > > > > > + * Initialize session. > > > > > > > > + * It is caller responsibility to allocate enough space fo= r it. > > > > > > > > + * See rte_crypto_cpu_sym_session_size above. > > > > > > > > + */ > > > > > > > > +__rte_experimental > > > > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id, > > > > > > > > + struct rte_crypto_cpu_sym_session *= sess, > > > > > > > > + const struct rte_crypto_sym_xform *= xforms); > > > > > > > > + > > > > > > > > +__rte_experimental > > > > > > > > +void > > > > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id, > > > > > > > > + struct rte_crypto_cpu_sym_session *= sess); > > > > > > > > + > > > > > > > > + > > > > > > > > #ifdef __cplusplus > > > > > > > > } > > > > > > > > #endif > > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h > > > > > > > > b/lib/librte_cryptodev/rte_cryptodev_pmd.h > > > > > > > > index defe05ea0..ed7e63fab 100644 > > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h > > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h > > > > > > > > @@ -310,6 +310,20 @@ typedef void > > > > > > (*cryptodev_sym_free_session_t)(struct > > > > > > > > rte_cryptodev *dev, > > > > > > > > typedef void (*cryptodev_asym_free_session_t)(struct rte_c= ryptodev > > > > *dev, > > > > > > > > struct rte_cryptodev_asym_session *sess); > > > > > > > > > > > > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct > > rte_cryptodev > > > > > > *dev, > > > > > > > > + const struct rte_crypto_sym_xform *= xforms); > > > > > > > > + > > > > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct > > rte_cryptodev > > > > > > *dev, > > > > > > > > + struct rte_crypto_cpu_sym_session *= sess, > > > > > > > > + const struct rte_crypto_sym_xform *= xforms); > > > > > > > > + > > > > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct > > rte_cryptodev > > > > > > *dev, > > > > > > > > + struct rte_crypto_cpu_sym_session *= sess); > > > > > > > > + > > > > > > > > +typedef rte_crypto_cpu_sym_process_t > > > > > > (*cryptodev_cpu_sym_session_func_t) > > > > > > > > ( > > > > > > > > + struct rte_cryptodev *dev, > > > > > > > > + const struct rte_crypto_sym_xform *= xforms); > > > > > > > > + > > > > > > > > /** Crypto device operations function pointer table */ > > > > > > > > struct rte_cryptodev_ops { > > > > > > > > cryptodev_configure_t dev_configure; /**< Config= ure device. > > */ > > > > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops { > > > > > > > > /**< Clear a Crypto sessions private data. */ > > > > > > > > cryptodev_asym_free_session_t asym_session_clear; > > > > > > > > /**< Clear a Crypto sessions private data. */ > > > > > > > > + > > > > > > > > + cryptodev_cpu_sym_session_size_t sym_cpu_session_ge= t_size; > > > > > > > > + cryptodev_cpu_sym_session_func_t sym_cpu_session_ge= t_func; > > > > > > > > + cryptodev_cpu_sym_session_init_t sym_cpu_session_in= it; > > > > > > > > + cryptodev_cpu_sym_session_fini_t sym_cpu_session_fi= ni; > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > > >