DPDK patches and discussions
 help / color / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Akhil Goyal <akhil.goyal@nxp.com>,
	"'dev@dpdk.org'" <dev@dpdk.org>,
	"De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>,
	'Thomas Monjalon' <thomas@monjalon.net>,
	"Zhang, Roy Fan" <roy.fan.zhang@intel.com>
Cc: "Doherty, Declan" <declan.doherty@intel.com>,
	'Anoob Joseph' <anoobj@marvell.com>
Subject: Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
Date: Wed, 9 Oct 2019 13:43:06 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258019197446B@irsmsx105.ger.corp.intel.com> (raw)
In-Reply-To: <VE1PR04MB663968951288FB4C977A610FE6950@VE1PR04MB6639.eurprd04.prod.outlook.com>


Hi Akhil,

> > > > > > > > > > > > > > > > > This action type allows the burst of symmetric crypto
> > > > > > workload
> > > > > > > > using
> > > > > > > > > > > the
> > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > algorithm, key, and direction being processed by CPU
> > > > cycles
> > > > > > > > > > > > > synchronously.
> > > > > > > > > > > > > > > > > This flexible action type does not require external
> > > > hardware
> > > > > > > > > > > involvement,
> > > > > > > > > > > > > > > > > having the crypto workload processed synchronously,
> > > > and is
> > > > > > > > more
> > > > > > > > > > > > > > > performant
> > > > > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on
> > > > removed
> > > > > > > > "async
> > > > > > > > > > > > > mode
> > > > > > > > > > > > > > > > > simulation" as well as 3 cacheline access of the
> > crypto
> > > > ops.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Does that mean application will not call the
> > > > > > > > cryptodev_enqueue_burst
> > > > > > > > > > > and
> > > > > > > > > > > > > > > corresponding dequeue burst.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, instead it just call
> > > > rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It would be a new API something like process_packets
> > and
> > > > it
> > > > > > will
> > > > > > > > have
> > > > > > > > > > > the
> > > > > > > > > > > > > > > crypto processed packets while returning from the API?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, though the plan is that API will operate on raw data
> > > > buffers,
> > > > > > > > not
> > > > > > > > > > > mbufs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I still do not understand why we cannot do with the
> > > > > > conventional
> > > > > > > > > > > crypto lib
> > > > > > > > > > > > > > > only.
> > > > > > > > > > > > > > > > As far as I can understand, you are not doing any
> > protocol
> > > > > > > > processing
> > > > > > > > > > > or
> > > > > > > > > > > > > any
> > > > > > > > > > > > > > > value add
> > > > > > > > > > > > > > > > To the crypto processing. IMO, you just need a
> > > > synchronous
> > > > > > > > crypto
> > > > > > > > > > > > > processing
> > > > > > > > > > > > > > > API which
> > > > > > > > > > > > > > > > Can be defined in cryptodev, you don't need to re-
> > create a
> > > > > > crypto
> > > > > > > > > > > session
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > the name of
> > > > > > > > > > > > > > > > Security session in the driver just to do a synchronous
> > > > > > processing.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > > > > rte_crypot_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > > > > The main reason is that would require disruptive changes
> > in
> > > > > > existing
> > > > > > > > > > > > > cryptodev
> > > > > > > > > > > > > > > API
> > > > > > > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > > > > > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > > > need
> > > > > > > > some
> > > > > > > > > > > extra
> > > > > > > > > > > > > > > information
> > > > > > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > > > > (cipher offset from the start of the buffer, might be
> > > > something
> > > > > > extra
> > > > > > > > in
> > > > > > > > > > > > > future).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > > > > >
> > > > > > > > > > > > > fill/read (+ alloc/free) is one of the main things that
> > slowdown
> > > > > > current
> > > > > > > > > > > crypto-op
> > > > > > > > > > > > > approach.
> > > > > > > > > > > > > That's why the general idea - have all data that wouldn't
> > change
> > > > > > from
> > > > > > > > packet
> > > > > > > > > > > to
> > > > > > > > > > > > > packet
> > > > > > > > > > > > > included into the session and setup it once at session_init().
> > > > > > > > > > > >
> > > > > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > > > > You can have the new API in crypto.
> > > > > > > > > > > > As per the current patch, you only need cipher_offset which
> > you
> > > > can
> > > > > > have
> > > > > > > > it as
> > > > > > > > > > > a parameter until
> > > > > > > > > > > > You get it approved in the crypto xform. I believe it will be
> > > > beneficial
> > > > > > in
> > > > > > > > case of
> > > > > > > > > > > other crypto cases as well.
> > > > > > > > > > > > We can have cipher offset at both places(crypto-op and
> > > > > > cipher_xform). It
> > > > > > > > will
> > > > > > > > > > > give flexibility to the user to
> > > > > > > > > > > > override it.
> > > > > > > > > > >
> > > > > > > > > > > After having another thought on your proposal:
> > > > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types
> > for
> > > > CPU
> > > > > > > > related
> > > > > > > > > > > stuff here?
> > > > > > > > > >
> > > > > > > > > > I also thought of adding new xforms, but that wont serve the
> > purpose
> > > > for
> > > > > > > > may be all the cases.
> > > > > > > > > > You would be needing all information currently available in the
> > > > current
> > > > > > > > xforms.
> > > > > > > > > > So if you are adding new fields in the new xform, the size will be
> > more
> > > > > > than
> > > > > > > > that of the union of xforms.
> > > > > > > > > > ABI breakage would still be there.
> > > > > > > > > >
> > > > > > > > > > If you think a valid compression of the AEAD xform can be done,
> > then
> > > > > > that
> > > > > > > > can be done for each of the
> > > > > > > > > > Xforms and we can have a solution to this issue.
> > > > > > > > >
> > > > > > > > > I think that we can re-use iv.offset for our purposes (for crypto
> > offset).
> > > > > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > > > > If in future we would need to add some extra information it might
> > > > > > > > > require ABI breakage, though by now I don't envision anything
> > > > particular to
> > > > > > > > add.
> > > > > > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > > > > > these changes for v2.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Actually, after looking at it more deeply it appears not that easy as I
> > > > thought
> > > > > > it
> > > > > > > > would be :)
> > > > > > > > Below is a very draft version of proposed API additions.
> > > > > > > > I think it avoids ABI breakages right now and provides enough
> > flexibility
> > > > for
> > > > > > > > future extensions (if any).
> > > > > > > > For now, it doesn't address your comments about naming
> > conventions
> > > > > > (_CPU_
> > > > > > > > vs _SYNC_) , etc.
> > > > > > > > but I suppose is comprehensive enough to provide a main idea
> > beyond it.
> > > > > > > > Akhil and other interested parties, please try to review and provide
> > > > feedback
> > > > > > > > ASAP,
> > > > > > > > as related changes would take some time and we still like to hit 19.11
> > > > > > deadline.
> > > > > > > > Konstantin
> > > > > > > >
> > > > > > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > index bc8da2466..c03069e23 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > > > > >   *
> > > > > > > >   * This structure contains data relating to Cipher (Encryption and
> > > > Decryption)
> > > > > > > >   *  use to create a session.
> > > > > > > > + * Actually I was wrong saying that we don't have free space inside
> > > > xforms.
> > > > > > > > + * Making key struct packed (see below) allow us to regain 6B that
> > could
> > > > be
> > > > > > > > + * used for future extensions.
> > > > > > > >   */
> > > > > > > >  struct rte_crypto_cipher_xform {
> > > > > > > >         enum rte_crypto_cipher_operation op;
> > > > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +         * offset for cipher to start within user provided data buffer.
> > > > > > > > +        * Fan suggested another (and less space consuming way) -
> > > > > > > > +         * reuse iv.offset space below, by changing:
> > > > > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > > > > +        * to uunamed union:
> > > > > > > > +        * union {
> > > > > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > > > > +        *      struct {uint16_t iv_len, crypto_offset} cpu_crypto_param;
> > > > > > > > +        * };
> > > > > > > > +        * Both approaches seems ok to me in general.
> > > > > > >
> > > > > > > No strong opinions here. OK with this one.
> > > > > > >
> > > > > > > > +        * Comments/suggestions are welcome.
> > > > > > > > +         */
> > > > > > > > +       uint16_t offset;
> > > > > >
> > > > > > After another thought - it is probably a bit better to have offset as a
> > separate
> > > > > > field.
> > > > > > In that case we can use the same xforms to create both type of sessions.
> > > > > ok
> > > > > >
> > > > > > > > +
> > > > > > > > +       uint8_t reserved1[4];
> > > > > > > > +
> > > > > > > >         /**< Cipher key
> > > > > > > >          *
> > > > > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation,
> > > > key.data
> > > > > > will
> > > > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > >         /**< Authentication key data.
> > > > > > > >          * The authentication key length MUST be less than or equal to
> > the
> > > > > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > > > > >          * (for example RFC 2104, FIPS 198a).
> > > > > > > >          */
> > > > > > > >
> > > > > > > > +       uint8_t reserved1[6];
> > > > > > > > +
> > > > > > > >         struct {
> > > > > > > >                 uint16_t offset;
> > > > > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > > +
> > > > > > > > +       /** offset for cipher to start within data buffer */
> > > > > > > > +       uint16_t cipher_offset;
> > > > > > > > +
> > > > > > > > +       uint8_t reserved1[4];
> > > > > > > >
> > > > > > > >         struct {
> > > > > > > >                 uint16_t offset;
> > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > index e175b838c..c0c7bfed7 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > @@ -1272,6 +1272,101 @@ void *
> > > > > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > > > > >
> > > > > > > > +/*
> > > > > > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > > > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > > > > > + * introduce an extentsion to it via new fully opaque
> > > > > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > > > > >
> > > > > > >
> > > > > > > What all things do we need to squeeze?
> > > > > > > In this proposal I do not see the new struct cpu_sym_session  defined
> > here.
> > > > > >
> > > > > > The plan is to have it totally opaque to the user, i.e. just:
> > > > > > struct rte_crypto_cpu_sym_session;
> > > > > > in public header files.
> > > > > >
> > > > > > > I believe you will have same lib API/struct for cpu_sym_session  and
> > > > > > sym_session.
> > > > > >
> > > > > > I thought about such way, but there are few things that looks clumsy to
> > me:
> > > > > > 1. Right now there is no 'type' (or so) field inside
> > rte_cryptodev_sym_session,
> > > > > > so it is not possible to easy distinguish what session do you have:
> > lksd_sym or
> > > > > > cpu_sym.
> > > > > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we
> > can
> > > > add
> > > > > > some extra field
> > > > > > here, but in that case  we wouldn't be able to use the same xform for
> > both
> > > > > > lksd_sym or cpu_sym
> > > > > > (which seems really plausible thing for me).
> > > > > > 2.  Majority of rte_cryptodev_sym_session fields I think are unnecessary
> > for
> > > > > > rte_crypto_cpu_sym_session:
> > > > > > sess_data[], opaque_data, user_data, nb_drivers.
> > > > > > All that consumes space, that could be used somewhere else instead.
> > > > > > 3. I am a bit reluctant to touch existing rte_cryptodev API - to avoid any
> > > > > > breakages I can't foresee right now.
> > > > > > From other side - if we'll add new functions/structs for cpu_sym_session
> > we
> > > > can
> > > > > > mark it
> > > > > > and keep it for some time as experimental, so further changes (if needed)
> > > > would
> > > > > > still be possible.
> > > > > >
> > > > >
> > > > > OK let us assume that you have a separate structure. But I have a few
> > queries:
> > > > > 1. how can multiple drivers use a same session
> > > >
> > > > As a short answer: they can't.
> > > > It is pretty much the same approach as with rte_security - each device needs
> > to
> > > > create/init its own session.
> > > > So upper layer would need to maintain its own array (or so) for such case.
> > > > Though the question is why would you like to have same session over
> > multiple
> > > > SW backed devices?
> > > > As it would be anyway just a synchronous function call that will be executed
> > on
> > > > the same cpu.
> > >
> > > I may have single FAT tunnel which may be distributed over multiple
> > > Cores, and each core is affined to a different SW device.
> >
> > If it is pure SW, then we don't need multiple devices for such scenario.
> > Device in that case is pure abstraction that we can skip.
> 
> Yes agreed, but that liberty is given to the application whether it need multiple
> devices with single queue or a single device with multiple queues.
> I think that independence should not be broken in this new API.
> >
> > > So a single session may be accessed by multiple devices.
> > >
> > > One more example would be depending on packet sizes, I may switch between
> > > HW/SW PMDs with the same session.
> >
> > Sure, but then we'll have multiple sessions.
> 
> No, the session will be same and it will have multiple private data for each of the PMD.
> 
> > BTW, we have same thing now - these private session pointers are just stored
> > inside the same rte_crypto_sym_session.
> > And if user wants to support this model, he would also need to store <dev_id,
> > queue_id>
> > pair for each HW device anyway.
> 
> Yes agreed, but how is that thing happening in your new struct, you cannot support that.

User can store all these info in his own struct.
That's exactly what we have right now.
Let say ipsec-secgw has to store for each IPsec SA:
pointer to crypto-session and/or pointer to security session
plus (for lookaside-devices) cdev_id_qp that allows it to extract
dev_id + queue_id information.
As I understand that works for now, as each ipsec_sa uses only one
dev+queue. Though if someone would like to use multiple devices/queues
for the same SA - he would need to have an array of these <dev+queue> pairs.
So even right now rte_cryptodev_sym_session is not self-consistent and
requires extra information to be maintained by user. 

> 
> >
> > >
> > > >
> > > > > 2. Can somebody use the scheduler pmd for scheduling the different type
> > of
> > > > payloads for the same session?
> > > >
> > > > In theory yes.
> > > > Though for that scheduler pmd should have inside it's
> > > > rte_crypto_cpu_sym_session an array of pointers to
> > > > the underlying devices sessions.
> > > >
> > > > >
> > > > > With your proposal the APIs would be very specific to your use case only.
> > > >
> > > > Yes in some way.
> > > > I consider that API specific for SW backed crypto PMDs.
> > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> > > > from it.
> > > > Current crypto-op API is very much HW oriented.
> > > > Which is ok, that's for it was intended for, but I think we also need one that
> > > > would be designed
> > > > for SW backed implementation in mind.
> > >
> > > We may re-use your API for HW PMDs as well which do not have requirement
> > of
> > > Crypto-op/mbuf etc.
> > > The return type of your new process API may have a status which say
> > 'processed'
> > > Or can be say 'enqueued'. So if it is  'enqueued', we may have a new API for
> > raw
> > > Bufs dequeue as well.
> > >
> > > This requirement can be for any hardware PMDs like QAT as well.
> >
> > I don't think it is a good idea to extend this API for async (lookaside) devices.
> > You'll need to:
> >  - provide dev_id and queue_id for each process(enqueue) and dequeuer
> > operation.
> >  - provide IOVA for all buffers passing to that function (data buffers, digest, IV,
> > aad).
> >  - On dequeue provide some way to associate dequed data and digest buffers
> > with
> >    crypto-session that was used  (and probably with mbuf).
> >  So most likely we'll end up with another just version of our current crypto-op
> > structure.
> > If you'd like to get rid of mbufs dependency within current crypto-op API that
> > understandable,
> > but I don't think we should have same API for both sync (CPU) and async
> > (lookaside) cases.
> > It doesn't seem feasible at all and voids whole purpose of that patch.
> 
> At this moment we are not much concerned about the dequeue API and about the
> HW PMD support. It is just that the new API should be generic enough to be used in
> some future scenarios as well. I am just highlighting the possible usecases which can
> be there in future.

Sorry, but I strongly disagree with such approach.
We should stop adding/modifying API 'just in case' and because 'it might be useful for some future HW'.
Inside DPDK we already do have too many dev level APIs without any implementations.
That's quite bad practice and very dis-orienting for end-users.
I think to justify API additions/changes we need at least one proper implementation for it,
or at least some strong evidence that people are really committed to support it in nearest future.
BTW, that what TB agreed on, nearly a year ago.  

This new API (if we'll go ahead with it of course) would stay experimental for some time anyway
to make sure we don't miss anything needed (I think for about a year time-frame).
So if you guys *really* want to extend it support _async_ devices too -
I am open for modifications/additions here.
Though personally I think such addition would over-complicate things and we'll end up with
another reincarnation of current crypto-op.
We actually discussed it internally, and decided to drop that idea because of that.  
Again, my opinion - for lookaside devices it might be better to try to optimize
current crypto-op path (remove mbuf requirement, probably add  ability to
group by session on enqueue/dequeue, etc.). 

> 
> What is the issue that you face in making a dev-op for this new API. Do you see any
> performance impact with that?

There are two main things:
1. user would need to maintain and provide for each process() call dev_id+queue_id.
That's means extra (and totally unnecessary for SW) overhead. 
2. yes I would expect some perf overhead too - it would be extra call or branch.
Again as it would be data-dependency - most likely cpu wouldn't be able to  pipeline
it efficiently:

rte_crypto_sym_process(uint8_t dev_id, uint16 qp_id, rte_crypto_sym_session *ses, ...)
{
     struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
     return (*dev->process)(sess->data[dev->driver_id, ...);
}

driver_specific_process(driver_specific_sym_session *sess)
{
   return sess->process(sess, ...) ;
}

I didn't make any exact measurements but sure it would be slower than just:
session_udata->process(session->udata->sess, ...);
Again it would be much more noticeable on low end cpus.
Let say here: http://mails.dpdk.org/archives/dev/2019-September/144350.html
Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev contents -
I suppose we would have something similar here.
I do realize that in majority of cases crypto is more expensive then RX/TX, but still. 

If it would be a really unavoidable tradeoff (support already existing API, or so)
I wouldn't mind, but I don't see any real need for it right now.

> 
> >
> > > That is why a dev-ops would be a better option.
> > >
> > > >
> > > > > When you would add more functionality to this sync API/struct, it will end
> > up
> > > > being the same API/struct.
> > > > >
> > > > > Let us  see how close/ far we are from the existing APIs when the actual
> > > > implementation is done.
> > > > >
> > > > > > > I am not sure if that would be needed.
> > > > > > > It would be internal to the driver that if synchronous processing is
> > > > > > supported(from feature flag) and
> > > > > > > Have relevant fields in xform(the newly added ones which are packed
> > as
> > > > per
> > > > > > your suggestions) set,
> > > > > > > It will create that type of session.
> > > > > > >
> > > > > > >
> > > > > > > > + * Main points:
> > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > + *   new sync API is new one and probably would require extra
> > changes.
> > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > + *   affecting existing one.
> > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in
> > future.
> > > > > > > > + * - process() function per set of xforms
> > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > + *   or spread it across several ones.
> > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > >
> > > > > > > Which process function should be chosen is internal to PMD, how
> > would
> > > > that
> > > > > > info
> > > > > > > be visible to the application or the library. These will get stored in the
> > > > session
> > > > > > private
> > > > > > > data. It would be upto the PMD writer, to store the per session process
> > > > > > function in
> > > > > > > the session private data.
> > > > > > >
> > > > > > > Process function would be a dev ops just like enc/deq operations and it
> > > > should
> > > > > > call
> > > > > > > The respective process API stored in the session private data.
> > > > > >
> > > > > > That model (via devops) is possible, but has several drawbacks from my
> > > > > > perspective:
> > > > > >
> > > > > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > > > > Though in fact dev_id is not a relevant information for us here
> > > > > > (all we need is pointer to the session and pointer to the fuction to call)
> > > > > > and I tried to avoid using it in data-path functions for that API.
> > > > >
> > > > > You have a single vdev, but someone may have multiple vdevs for each
> > thread,
> > > > or may
> > > > > Have same dev with multiple queues for each core.
> > > >
> > > > That's fine. As I said above it is a SW backed implementation.
> > > > Each session has to be a separate entity that contains all necessary
> > information
> > > > (keys, alg/mode info,  etc.)  to process input buffers.
> > > > Plus we need the actual function pointer to call.
> > > > I just don't see what for we need a dev_id in that situation.
> > >
> > > To iterate the session private data in the session.
> > >
> > > > Again, here we don't need care about queues and their pinning to cores.
> > > > If let say someone would like to process buffers from the same IPsec SA on 2
> > > > different cores in parallel, he can just create 2 sessions for the same xform,
> > > > give one to thread #1  and second to thread #2.
> > > > After that both threads are free to call process(this_thread_ses, ...) at will.
> > >
> > > Say you have a 16core device to handle 100G of traffic on a single tunnel.
> > > Will we make 16 sessions with same parameters?
> >
> > Absolutely same question we can ask for current crypto-op API.
> > You have lookaside crypto-dev with 16 HW queues, each queue is serviced by
> > different CPU.
> > For the same SA, do you need a separate session per queue, or is it ok to reuse
> > current one?
> > AFAIK, right now this is a grey area not clearly defined.
> > For crypto-devs I am aware - user can reuse the same session (as PMD uses it
> > read-only).
> > But again, right now I think it is not clearly defined and is implementation
> > specific.
> 
> User can use the same session, that is what I am also insisting, but it may have separate
> Session private data. Cryptodev session create API provide that functionality and we can
> Leverage that.

rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means we can't use
the same rte_cryptodev_sym_session to hold sessions for both sync and async mode
for the same device. Off course we can add a hard requirement that any driver that wants to
support process() has to create sessions that can handle both  process and enqueue/dequeue,
but then again  what for to create such overhead?

BTW, to be honest, I don't consider current rte_cryptodev_sym_session construct for multiple device_ids:
__extension__ struct {
                void *data;
                uint16_t refcnt;
        } sess_data[0];
        /**< Driver specific session material, variable size */

as an advantage.
It looks too error prone for me:
1. Simultaneous session initialization/de-initialization for devices with the same driver_id is not possible.
2. It assumes that all device driver will be loaded before we start to create session pools.

Right now it seems ok, as no-one requires such functionality, but I don't know how it will be in future.
For me rte_security session model, where for each security context user have to create new session
looks much more robust.
 
> 
> BTW, I can see a v2 to this RFC which is still based on security library.

Yes, v2 was concentrated on fixing found issues, some code restructuring, 
i.e. - changes that would be needed anyway whatever API aproach we'll choose.

> When do you plan
> To submit the patches for crypto based APIs. We have RC1 merge deadline for this
> patchset on 21st Oct.

We'd like to start working on it ASAP, but it seems we still have a major disagreement
about how this crypto-dev API should look like.  
Which makes me think - should we return to our original proposal via rte_security?
It still looks to me like clean and straightforward way to enable this new API,
and probably wouldn't cause that much controversy.
What do you think? 

> 
> As per my understanding you only need a new dev-op for sync support. Session APIs
> Will remain the same and you will have some extra fields packed in xform structs.
> 
> The PMD will need to maintain a pointer to the per session process function while creating
> Session and will be used by the dev-op API at runtime without any extra check at runtime.
> 
> >
> > >
> > > >
> > > > >
> > > > > > 2. As you pointed in that case it will be just one process() function per
> > device.
> > > > > > So if PMD would like to have several process() functions for different type
> > of
> > > > > > sessions
> > > > > > (let say one per alg) first thing it has to do inside it's process() - read
> > session
> > > > data
> > > > > > and
> > > > > > based on that, do a jump/call to particular internal sub-routine.
> > > > > > Something like:
> > > > > > driver_id = get_pmd_driver_id();
> > > > > > priv_ses = ses->sess_data[driver_id];
> > > > > > Then either:
> > > > > > switch(priv_sess->alg) {case XXX: process_XXX(priv_sess, ...);break;...}
> > > > > > OR
> > > > > > priv_ses->process(priv_sess, ...);
> > > > > >
> > > > > > to select and call the proper function.
> > > > > > Looks like totally unnecessary overhead to me.
> > > > > > Though if we'll have ability to query/extract some sort session_ops based
> > on
> > > > the
> > > > > > xform -
> > > > > > we can avoid  this extra de-refererence+jump/call thing.
> > > > >
> > > > > What is the issue in the priv_ses->process(); approach?
> > > >
> > > > Nothing at all.
> > > > What I am saying that schema with dev_ops
> > > > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
> > > >    |
> > > >    |-> priv_ses->process(...)
> > > >
> > > > Has bigger overhead then just:
> > > > process(ses,...);
> > > >
> > > > So what for to introduce extra-level of indirection here?
> > >
> > > Explained above.
> > >
> > > >
> > > > > I don't understand what are you saving by not doing this.
> > > > > In any case you would need to identify which session correspond to which
> > > > process().
> > > >
> > > > Yes, sure, but I think we can make user to store information that relationship,
> > > > in a way he likes: store process() pointer for each session, or group sessions
> > > > that share the same process() somehow, or...
> > >
> > > So whatever relationship that user will make and store will make its life
> > complicated.
> > > If we can hide that information in the driver, then what is the issue in that and
> > user
> > > Will not need to worry. He would just call the process() and driver will choose
> > which
> > > Process need to be called.
> >
> > Driver can do that at config/init time.
> > Then at run-time we can avoid that choice at all and call already chosen function.
> >
> > >
> > > I think we should have a POC around this and see the difference in the cycle
> > count.
> > > IMO it would be negligible and we would end up making a generic API set
> > which
> > > can be used by others as well.
> > >
> > > >
> > > > > For that you would be doing it somewhere in your data path.
> > > >
> > > > Why at data-path?
> > > > Only once at session creation/initialization time.
> > > > Or might be even once per group of sessions.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > I am not sure if you would need a new session init API for this as
> > nothing
> > > > would
> > > > > > be visible to
> > > > > > > the app or lib.
> > > > > > >
> > > > > > > > + * - Not storing process() pointer inside the session -
> > > > > > > > + *   Allows user to choose does he want to store a process() pointer
> > > > > > > > + *   per session, or per group of sessions for that device that share
> > > > > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see
> > above.
> > > > > > >
> > > > > > > If multiple sessions need to be processed via the same process function,
> > > > > > > PMD would save the same process in all the sessions, I don't think there
> > > > would
> > > > > > > be any perf overhead with that.
> > > > > >
> > > > > > I think it would, see above.
> > > > > >
> > > > > > >
> > > > > > > > + * Sketched usage model:
> > > > > > > > + * ....
> > > > > > > > + * /* control path, alloc/init session */
> > > > > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > > > > + * ...
> > > > > > > > + * /* data-path*/
> > > > > > > > + * process(ses, ....);
> > > > > > > > + * ....
> > > > > > > > + * /* control path, termiante/free session */
> > > > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * vector structure, contains pointer to vector array and the length
> > > > > > > > + * of the array
> > > > > > > > + */
> > > > > > > > +struct rte_crypto_vec {
> > > > > > > > +       struct iovec *vec;
> > > > > > > > +       uint32_t num;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Data-path bulk process crypto function.
> > > > > > > > + */
> > > > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > > > > +               void *digest[], int status[], uint32_t num);
> > > > > > > > +/*
> > > > > > > > + * for given device return process function specific to input xforms
> > > > > > > > + * on error - return NULL and set rte_errno value.
> > > > > > > > + * Note that for same input xfroms for the same device should
> > return
> > > > > > > > + * the same process function.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +rte_crypto_cpu_sym_process_t
> > > > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Return required session size in bytes for given set of xforms.
> > > > > > > > + * if xforms == NULL, then return the max possible session size,
> > > > > > > > + * that would fit session for any supported by the device algorithm.
> > > > > > > > + * if CPU mode is not supported at all, or requeted in xform
> > > > > > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +int
> > > > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Initialize session.
> > > > > > > > + * It is caller responsibility to allocate enough space for it.
> > > > > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +__rte_experimental
> > > > > > > > +void
> > > > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > > +
> > > > > > > > +
> > > > > > > >  #ifdef __cplusplus
> > > > > > > >  }
> > > > > > > >  #endif
> > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > index defe05ea0..ed7e63fab 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > @@ -310,6 +310,20 @@ typedef void
> > > > > > (*cryptodev_sym_free_session_t)(struct
> > > > > > > > rte_cryptodev *dev,
> > > > > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev
> > > > *dev,
> > > > > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > > > > >
> > > > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct
> > rte_cryptodev
> > > > > > *dev,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct
> > rte_cryptodev
> > > > > > *dev,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct
> > rte_cryptodev
> > > > > > *dev,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > > +
> > > > > > > > +typedef rte_crypto_cpu_sym_process_t
> > > > > > (*cryptodev_cpu_sym_session_func_t)
> > > > > > > > (
> > > > > > > > +                       struct rte_cryptodev *dev,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > >  /** Crypto device operations function pointer table */
> > > > > > > >  struct rte_cryptodev_ops {
> > > > > > > >         cryptodev_configure_t dev_configure;    /**< Configure device.
> > */
> > > > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > > +
> > > > > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > > > > >  };
> > > > > > > >
> > > > > > > >
> > > > > > > >


  reply index

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API Fan Zhang
2019-09-04 10:32   ` Akhil Goyal
2019-09-04 13:06     ` Zhang, Roy Fan
2019-09-06  9:01       ` Akhil Goyal
2019-09-06 13:12         ` Zhang, Roy Fan
2019-09-10 11:25           ` Akhil Goyal
2019-09-11 13:01             ` Ananyev, Konstantin
2019-09-06 13:27         ` Ananyev, Konstantin
2019-09-10 10:44           ` Akhil Goyal
2019-09-11 12:29             ` Ananyev, Konstantin
2019-09-12 14:12               ` Akhil Goyal
2019-09-16 14:53                 ` Ananyev, Konstantin
2019-09-16 15:08                   ` Ananyev, Konstantin
2019-09-17  6:02                   ` Akhil Goyal
2019-09-18  7:44                     ` Ananyev, Konstantin
2019-09-25 18:24                       ` Ananyev, Konstantin
2019-09-27  9:26                         ` Akhil Goyal
2019-09-30 12:22                           ` Ananyev, Konstantin
2019-09-30 13:43                             ` Akhil Goyal
2019-10-01 14:49                               ` Ananyev, Konstantin
2019-10-03 13:24                                 ` Akhil Goyal
2019-10-07 12:53                                   ` Ananyev, Konstantin
2019-10-09  7:20                                     ` Akhil Goyal
2019-10-09 13:43                                       ` Ananyev, Konstantin [this message]
2019-10-11 13:23                                         ` Akhil Goyal
2019-10-13 23:07                                           ` Zhang, Roy Fan
2019-10-14 11:10                                             ` Ananyev, Konstantin
2019-10-15 15:02                                               ` Akhil Goyal
2019-10-16 13:04                                                 ` Ananyev, Konstantin
2019-10-15 15:00                                             ` Akhil Goyal
2019-10-16 22:07                                           ` Ananyev, Konstantin
2019-10-17 12:49                                             ` Ananyev, Konstantin
2019-10-18 13:17                                             ` Akhil Goyal
2019-10-21 13:47                                               ` Ananyev, Konstantin
2019-10-22 13:31                                                 ` Akhil Goyal
2019-10-22 17:44                                                   ` Ananyev, Konstantin
2019-10-22 22:21                                                     ` Ananyev, Konstantin
2019-10-23 10:05                                                     ` Akhil Goyal
2019-10-30 14:23                                                       ` Ananyev, Konstantin
2019-11-01 13:53                                                         ` Akhil Goyal
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 2/9] crypto/aesni_gcm: add rte_security handler Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 3/9] app/test: add security cpu crypto autotest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 4/9] app/test: add security cpu crypto perftest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 5/9] crypto/aesni_mb: add rte_security handler Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 6/9] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 7/9] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 8/9] ipsec: add rte_security cpu_crypto action support Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 9/9] examples/ipsec-secgw: add security " Fan Zhang
2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API Fan Zhang
2019-09-18 12:45     ` Ananyev, Konstantin
2019-09-29  6:00     ` Hemant Agrawal
2019-09-29 16:59       ` Ananyev, Konstantin
2019-09-30  9:43         ` Hemant Agrawal
2019-10-01 15:27           ` Ananyev, Konstantin
2019-10-02  2:47             ` Hemant Agrawal
2019-09-06 13:13   ` [dpdk-dev] [PATCH 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
2019-09-18 10:24     ` Ananyev, Konstantin
2019-09-06 13:13   ` [dpdk-dev] [PATCH 03/10] app/test: add security cpu crypto autotest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 04/10] app/test: add security cpu crypto perftest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
2019-09-18 15:20     ` Ananyev, Konstantin
2019-09-06 13:13   ` [dpdk-dev] [PATCH 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
2019-09-26 23:20     ` Ananyev, Konstantin
2019-09-27 10:38     ` Ananyev, Konstantin
2019-09-06 13:13   ` [dpdk-dev] [PATCH 09/10] examples/ipsec-secgw: add security " Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 10/10] doc: update security cpu process description Fan Zhang
2019-09-09 12:43   ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Aaron Conole
2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API Fan Zhang
2019-10-08 13:42       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
2019-10-08 13:44       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 03/10] app/test: add security cpu crypto autotest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 04/10] app/test: add security cpu crypto perftest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
2019-10-08 16:23       ` Ananyev, Konstantin
2019-10-09  8:29       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
2019-10-08 23:28       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 09/10] examples/ipsec-secgw: add security " Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 10/10] doc: update security cpu process description Fan Zhang

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB977258019197446B@irsmsx105.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=akhil.goyal@nxp.com \
    --cc=anoobj@marvell.com \
    --cc=declan.doherty@intel.com \
    --cc=dev@dpdk.org \
    --cc=pablo.de.lara.guarch@intel.com \
    --cc=roy.fan.zhang@intel.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox