DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Gujjar, Abhinandan S" <abhinandan.gujjar@intel.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Doherty, Declan" <declan.doherty@intel.com>
Cc: "jerinj@marvell.com" <jerinj@marvell.com>,
	"Akhil.goyal@nxp.com" <akhil.goyal@nxp.com>,
	"Vangati, Narender" <narender.vangati@intel.com>, nd <nd@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [v2 1/2] cryptodev: support enqueue callback functions
Date: Fri, 9 Oct 2020 14:40:29 +0000	[thread overview]
Message-ID: <BYAPR11MB3301CB4401FF52F3E8737C229A080@BYAPR11MB3301.namprd11.prod.outlook.com> (raw)
In-Reply-To: <MWHPR11MB18389204DD0D50F8D17D888AE80B0@MWHPR11MB1838.namprd11.prod.outlook.com>

Hi Abhinandan,

> 
> Hi Konstantin & Honnappa,
> 
> Thanks for all the inputs and feedback.
> 
> @Ananyev, Konstantin,
> I have measured the perf with and without callback on xeon. Here are the numbers:
> 
> ./app/dpdk-test-crypto-perf -l 6-7 --vdev="crypto_openssl0,socket_id=0,max_nb_sessions=128" -- --ptest throughput --devtype
> crypto_openssl --optype cipher-then-auth --cipher-algo aes-cbc --cipher-op encrypt --cipher-key-sz 16 --auth-algo sha1-hmac --auth-op
> generate --auth-key-sz 64 --digest-sz 12 --total-ops 10000000 --burst-sz 32 --buffer-sz 64
> 
> With callback(+ RCU - totally opaque to data-plane threads)
>     lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
>            7          64          32                10000000    10000000           0           0                       0.8129      0.4162     2694.09
>            7          64          32                10000000    10000000           0           0                       0.8152      0.4174     2686.31
>            7          64          32                10000000    10000000           0           0                       0.8198      0.4197     2671.48
> 
> Without callback:
>     lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
> 
>            7          64          32               10000000    10000000           0           0                        0.8234      0.4216     2659.81
>            7          64          32               10000000    10000000           0           0                        0.8247      0.4222     2655.63
>            7          64          32               10000000    10000000           0           0                        0.8123      0.4159     2695.90


Just to cofirm:
You implemented crypto enqueue callbacks using RCU QSBR online /offline
(as suggested below) and numbers above are for:
1) callback code in place and some dummy callback installed
2) callback code in place but no callbacks installed 
  
Is my understanding correct here?
Thanks
Konstantin

 
> 
> Regards
> Abhinandan
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Sent: Tuesday, September 29, 2020 2:33 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gujjar,
> > Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org; Doherty, Declan
> > <declan.doherty@intel.com>
> > Cc: jerinj@marvell.com; Akhil.goyal@nxp.com; Vangati, Narender
> > <narender.vangati@intel.com>; nd <nd@arm.com>; nd <nd@arm.com>
> > Subject: RE: [dpdk-dev] [v2 1/2] cryptodev: support enqueue callback functions
> >
> >
> > >
> > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > +#ifdef RTE_CRYPTODEV_CALLBACKS int
> > > > > > > > > > > > > +rte_cryptodev_rcu_qsbr_add(uint8_t dev_id, struct
> > > > > > > > > > > > > +rte_rcu_qsbr
> > > > > > > > > > > > > +*qsbr) {
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +	struct rte_cryptodev *dev;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +	if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
> > > > > > > > > > > > > +		CDEV_LOG_ERR("Invalid dev_id=%" PRIu8,
> > > > dev_id);
> > > > > > > > > > > > > +		return -EINVAL;
> > > > > > > > > > > > > +	}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +	dev = &rte_crypto_devices[dev_id];
> > > > > > > > > > > > > +	dev->qsbr = qsbr;
> > > > > > > > > > > > > +	return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > >
> > > > > > > > > > > > So if I understand your patch correctly you propose
> > > > > > > > > > > > a new working model for
> > > > > > > > > > > > crypto-devs:
> > > > > > > > > > > > 1. Control-plane has to allocate/setup rcu_qsbr and
> > > > > > > > > > > > do rte_cryptodev_rcu_qsbr_add().
> > > > > > > > > > > > 2. Data-plane has somehow to obtain pointer to that
> > > > > > > > > > > > rcu_qsbr and wrap
> > > > > > > > > > > > cryptodev_enqueue()
> > > > > > > > > > > >    with rcu_qsbr_quiescent()  or
> > > > > > rcu_qsbr_online()/rcu_qsbr_offline().
> > > > > > > > > > > Yes. I think, it is not a new model. It is same as RCU
> > > > > > > > > > > integration with
> > > > > > > > LPM.
> > > > > > > > > > > Please refer: https://patches.dpdk.org/cover/73673/
> > > > > > > > > >
> > > > > > > > > > I am talking about new working model for crypto-dev
> > > > > > enqueue/dequeue.
> > > > > > > > > > As I said above now it becomes data-plane thread responsibility
> > to:
> > > > > > > > > >  -somehow to obtain pointer to that rcu_qsbr for each
> > > > > > > > > > cryptodev it is
> > > > > > > > using.
> > > > > > > > > >  -call rcu sync functions (quiescent/online/offline) on
> > > > > > > > > > a regular
> > > > basis.
> > > > > > > > > It is not on regular basis. When data plane comes up, they
> > > > > > > > > report
> > > > online.
> > > > > > > > > They report quiescent when they are done with critical
> > > > > > > > > section or shared
> > > > > > > > structure.
> > > > > > > >
> > > > > > > > I understand that, but it means all existing apps have to be
> > > > > > > > changed that
> > > > > > way.
> > > > > > > >
> > > > > > > > > All though, there is some dataplane changes involved here,
> > > > > > > > > I don't think, it
> > > > > > > > is major.
> > > > > > > >
> > > > > > > > I still think our goal here should be to make no visible
> > > > > > > > changes to the dataplane.
> > > > > > > > I.E. all necessary data-plane changes need to be hidden
> > > > > > > > inside CB invocation part.
> > > > > > > Please note that this is being implemented using the memory
> > > > > > > reclamation framework documented at
> > > > > > > https://doc.dpdk.org/guides/prog_guide/rcu_lib.html#resource-r
> > > > > > > ecla
> > > > > > > mati
> > > > > > > on-framework-for-dpdk
> > > > > > >
> > > > > > > While using RCU there are couple of trade-offs that
> > > > > > > applications have to
> > > > > > consider:
> > > > > > > 1) Performance - reporting the quiescent state too often
> > > > > > > results in performance impact on data plane
> > > > > > > 2) Amount of outstanding memory to reclaim - reporting less
> > > > > > > often results in more outstanding memory to reclaim
> > > > > > >
> > > > > > > Hence, the quiescent state reporting is left to the application.
> > > > > > > The application decides how often it reports the quiescent
> > > > > > > state and has control
> > > > > > over the data plane performance and the outstanding memory to
> > reclaim.
> > > > > > >
> > > > > > > When you say "new working model for crypto-dev
> > > > > > > enqueue/dequeue",
> > > > > > >
> > > > > > > 1) are you comparing these with existing crypto-dev
> > > > > > > enqueue/dequeue
> > > > > > APIs? If yes, these are new APIs, it is not breaking anything.
> > > > > > > 2) are you comparing these with existing call back functions
> > > > > > > in ethdev enqueue/dequeue APIs? If yes, agree that this is a
> > > > > > > new model. But, it is
> > > > > > possible to support what ethdev supports along with the RCU
> > > > > > method used in this patch.
> > > > > >
> > > > > > What I am talking about:
> > > > > >
> > > > > > Existing cryptodev enqueue/dequeue model doesn't require for the
> > > > > > user to manage any RCU QSBR state manually.
> > > > > > I believe that addition of ability to add/remove enqueue/dequeue
> > > > > > callbacks shouldn't change existing working model.
> > > > > > I think that adding/removing such callbacks has to be opaque to
> > > > > > the user DP code and shouldn't require user to change it. Same
> > > > > > as we have now for ethdev callback implementation.
> > > > > The ethdev callback implementation conveniently leaves the problem
> > > > > of
> > > > freeing memory to the user to resolve, it does not handle the issue.
> > > > > Hence, it "looks" to be opaque to the DP code. However, if the
> > > > > application has to implement a safe way to free the call back
> > > > > memory, its
> > > > DP is affected based on call backs are being used or not.
> > > >
> > > > Yes, I think that's big drawback in initial ethdev callback
> > > > implementation - it simply ignores DP/CP sync problem completely.
> > > > Though I think it is possible to have both here:
> > > >  keep callback "opaque" to DP code and provide some sync mechanism
> > > > between DP/CP.
> > > > Hopefully one day we can fix ethdev callbacks too.
> > > The solution we develop can be used in ethdev too.
> > >
> > > >
> > > > > > I think that forcing DP code to be aware that callbacks are
> > > > > > present or not and to modify its behaviour depending on that
> > > > > > nearly voids the purpose of having callbacks at all.
> > > > > > In that case DP can just invoke callback function directly from
> > > > > > it's
> > > > codepath .
> > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > Note that now data-plane thread would have to do that
> > > > > > > > > > always
> > > > > > > > > > - even if there are now callbacks installed for that
> > > > > > > > > > cryptodev queue
> > > > > > right now.
> > > > > > > > > > All that changes behaviour of existing apps and I
> > > > > > > > > > presume would reduce adoption of  that fature.
> > > > > > > If I understand this correct, you are talking about a case
> > > > > > > where in the application might be registering/unregistering
> > > > > > > multiple times during its lifetime. In this case, yes, the
> > > > > > > application might be reporting the
> > > > > > quiescent state even when it has not registered the call backs.
> > > > > > But, it has the flexibility to not report it if it implements additional logic.
> > > > > > > Note that we are assuming that the application has to report
> > > > > > > quiescent state only for using callback functions. Most
> > > > > > > probably the application has
> > > > > > other requirements to use RCU.
> > > > > > > Why not support what is done for ethdev call back functions
> > > > > > > along with
> > > > > > providing RCU method?
> > > > > > >
> > > > > > > > > There is always trade off involved!
> > > > > > > > > In the previous patch, you suggested that some lazy app
> > > > > > > > > may not free up the memory allocated by add cb. For such
> > > > > > > > > apps, this patch has sync mechanism with some additional
> > > > > > > > > cost of CP & DP
> > > > changes.
> > > > > > > >
> > > > > > > > Sigh, it is not about laziness of the app.
> > > > > > > > The problem with current ethedev cb mechanism and yours V1
> > > > > > > > (which was just a clone of it) - CP doesn't know when it is
> > > > > > > > safe after CB removal to free related memory.
> > > > > > > >
> > > > > > > > > > I still think all this callback mechanism should be
> > > > > > > > > > totally opaque to data-plane threads - user shouldn't
> > > > > > > > > > change his app code depending on would some
> > > > > > > > > > enqueue/dequeue callbacks be
> > > > > > installed or not.
> > > > > > > > > I am not sure, how that can be implemented with existing
> > > > > > > > > RCU
> > > > design.
> > > > > > > >
> > > > > > > > As I said below the simplest way - with calling rcu
> > > > > > > > onine/offline inside CB invocation block.
> > > > > > > > That's why I asked you - did you try that approach and what
> > > > > > > > is the perf numbers?
> > > > > > > > I presume with no callbacks installed the perf change should
> > > > > > > > be nearly
> > > > > > zero.
> > > > > > > >
> > > > > > > > > @Honnappa Nagarahalli, Do you have any suggestions?
> > > > > > > Reporting quiescent state in the call back functions has
> > > > > > > several
> > > > > > disadvantages:
> > > > > > > 1) it will have performance impacts and the impacts will
> > > > > > > increase as the
> > > > > > number of data plane threads increase.
> > > > > > > 2) It will require additional configuration parameters to
> > > > > > > control how often the quiescent state is reported to control
> > > > > > > the performance
> > > > impact.
> > > > > > > 3) Does not take advantage of the fact that most probably the
> > > > > > > application is using RCU already
> > > > > > > 4) There are few difficulties as well, please see below.
> > > > > >
> > > > > > I suggested Abhinandan to use RCU library because it is already
> > > > > > there, and I thought it would be good not to re-implement the wheel.
> > > > > > Though if you feel librte_rcu doesn't match that task - fine,
> > > > > > let's do it without librte_rcu.
> > > > > > After all, what we need here - just an atomic ref count per
> > > > > > queue that we are going to increment at entering and leaving
> > > > > > list of callbacks inside enqueue/dequeue.
> > > > > Ok, looks like I missed the point that a queue is used by a single
> > > > > data plane
> > > > thread.
> > > > > Along with ref count increment you need the memory orderings to
> > > > > avoid
> > > > race conditions. These will be the same ones used in RCU.
> > > > > On the control plane, you need to read this counter and poll for
> > > > > the
> > > > counter updates. All this is same cost as in RCU.
> > > >
> > > > Agree.
> > > >
> > > > > To control the cost, you
> > > > > will have to control the rate of quiescent state reporting and
> > > > > might have to
> > > > expose this as a configuration parameter.
> > > > >
> > > > > The other important information you have to consider is if the
> > > > > thread is making any blocking calls, which may be in some other
> > > > > library. The thread is supposed to call rcu_qsbr_thread_offline
> > > > > API before calling a
> > > > blocking call. This allows the RCU to know that this particular
> > > > thread will not report quiescent state. The cryptodev library might not have
> > that information.
> > > > >
> > > > > If you want to go ahead with this design, you can still use RCU
> > > > > with single thread configuration (like you have mentioned below)
> > > > > and hide the
> > > > details from the application.
> > > >
> > > > Yes,  same thought here -  use rcu_qsbr online/offline for DP part
> > > > and hide actual sync details inside callback code.
> > > We can give it a try. If we can have the performance numbers, we can
> > > decide about how to control how often these APIs are called on the data
> > plane.
> >
> > To avoid misunderstanding: I am talking about calling online/offline with every
> > cryptodev_enqueue() traversal over CB list.
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > That seems quite a big change and I don't think it
> > > > > > > > > > > > is acceptable for most users.
> > > > > > > > > > > > From my perspective adding/installing call-backs to
> > > > > > > > > > > > the dev has to be opaque to the data-plane code.
> > > > > > > > > > > > Also note that different callbacks can be installed
> > > > > > > > > > > > by different entities (libs) and might have no idea
> > > > > > > > > > > > about each
> > > > other.
> > > > > > > > > > > > That's why I thought it would be better to make all
> > > > > > > > > > > > this RCU stuff internal inside cryptodev:
> > > > > > > > > > > >     hide all this rcu_qsbr allocation/setup inside
> > > > > > > > > > > > cryptod somehow to
> > > > > > > > > > obtain pointer to that rcu_qsbr ev init/queue setup
> > > > > > > > > > > >     invoke rcu_qsbr_online()/rcu_qsbr_offline()
> > > > > > > > > > > > inside
> > > > > > > > > > cryptodev_enqueue().
> > > > > > > This will bring in the application related information such as
> > > > > > > the thread ID
> > > > > > into the library.
> > > > > >
> > > > > > I don't think it would.
> > > > > > Cryptodev enqueue/dequeue functions are not supposed to be
> > > > > > thread safe (same as rx/tx burst).
> > > > > > So we can always use RCU with just one thread(thread_id = 0).
> > > > > Agree, the memory that needs to be freed is accessed by a single
> > > > > thread
> > > > on the data plane. RCU with one thread would suffice.
> > > > >
> > > > > > But as I said above - if you feel RCU lib is an overhead here,
> > > > > > that's fine - I think it would be easy enough to do without librte_rcu.
> > > > > >
> > > > > > > If the same API calls are being made from multiple data plane
> > > > > > > threads, you need a way to configure that information to the
> > > > > > > library. So, it is better to leave those details for the application to
> > handle.
> > > > > > >
> > > > > > > > > > > I have already tried exploring above stuffs. There are
> > > > > > > > > > > too many
> > > > > > > > constraints.
> > > > > > > > > > > The changes don't fit in, as per RCU design.
> > > > > > > > > >
> > > > > > > > > > Hmm could you be more specific here - what constraints
> > > > > > > > > > are you referring to?
> > > > > > > > > >
> > > > > > > > > > > Moreover, having rcu api under enqueue_burst() will
> > > > > > > > > > > affect the
> > > > > > > > > > performance.
> > > > > > > > > >
> > > > > > > > > > It most likely will. Though my expectation it will
> > > > > > > > > > affect performance only when some callbacks are
> > > > > > > > > > installed. My thought
> > > > > > here:
> > > > > > > > > > callback function by itself will affect cryptdev_enqueue
> > > > > > > > > > performance anyway,
> > > > > > > > > With existing callback design, I have measured the
> > > > > > > > > performance(with
> > > > > > > > crypto perf test) on xeon.
> > > > > > > > > It was almost negligible and same was shared with Declan.
> > > > > > > >
> > > > > > > > I am asking about different thing: did you try alternate
> > > > > > > > approach I described, that wouldn't require changes in the
> > > > > > > > user data-
> > > > plane code.
> > > > > > > >
> > > > > > > > > That is one of the reasons, I didn't want to add to many
> > > > > > > > > stuffs in to the
> > > > > > > > callback.
> > > > > > > > > The best part of existing design is crypto lib is not much modified.
> > > > > > > > > The changes are either pushed to CP or DP.
> > > > > > > > >
> > > > > > > > > so adding extra overhead for sync is probably ok here.
> > > > > > > >
> > > > > > > > I think that extra overhead when callbacks are present is
> > > > > > > > expected and probably acceptable.
> > > > > > > > Changes in the upper-layer data-plane code - probably not.
> > > > > > > >
> > > > > > > > > > Though for situation when no callbacks are installed -
> > > > > > > > > > perfomance should be left unaffected (or impact should
> > > > > > > > > > be as small
> > > > > > as possible).
> > > > > > > > > >
> > > > > > > > > > > The changes are more on control plane side, which is one
> > time.
> > > > > > > > > > > The data plane changes are minimal.
> > > > > > > > > >
> > > > > > > > > > I still think upper layer data-plane code should stay
> > > > > > > > > > unaffected (zero changes).
> > > > > > > > > >
> > > > > > > > > > > >
> > > > >
> > > > > <snip>

  reply	other threads:[~2020-10-09 14:40 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-08  7:10 Abhinandan Gujjar
2020-09-09 16:21 ` Gujjar, Abhinandan S
2020-09-09 22:48   ` Honnappa Nagarahalli
2020-09-11  8:33     ` Gujjar, Abhinandan S
2020-09-16 13:39 ` Ananyev, Konstantin
2020-09-16 15:17   ` Gujjar, Abhinandan S
2020-09-17 13:24     ` Ananyev, Konstantin
2020-09-21 10:59       ` Gujjar, Abhinandan S
2020-09-21 11:45         ` Ananyev, Konstantin
2020-09-23  3:25           ` Honnappa Nagarahalli
2020-09-23 11:58             ` Ananyev, Konstantin
2020-09-23 22:41               ` Honnappa Nagarahalli
2020-09-24 17:06                 ` Ananyev, Konstantin
2020-09-29  5:14                   ` Honnappa Nagarahalli
2020-09-29  9:03                     ` Ananyev, Konstantin
2020-10-08 17:36                       ` Gujjar, Abhinandan S
2020-10-09 14:40                         ` Ananyev, Konstantin [this message]
2020-10-09 16:51                           ` Gujjar, Abhinandan S
2020-10-11 23:02                             ` Ananyev, Konstantin
2020-10-12  6:47                               ` Gujjar, Abhinandan S
2020-10-12 13:02                                 ` Ananyev, Konstantin
2020-10-19 22:04                                   ` Honnappa Nagarahalli
2020-10-20  9:25                                     ` Gujjar, Abhinandan S
2020-09-24  5:17 ` Honnappa Nagarahalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR11MB3301CB4401FF52F3E8737C229A080@BYAPR11MB3301.namprd11.prod.outlook.com \
    --to=konstantin.ananyev@intel.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=abhinandan.gujjar@intel.com \
    --cc=akhil.goyal@nxp.com \
    --cc=declan.doherty@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=narender.vangati@intel.com \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).