DPDK patches and discussions
 help / color / mirror / Atom feed
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Gujjar, Abhinandan S" <abhinandan.gujjar@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Doherty,  Declan" <declan.doherty@intel.com>
Cc: "jerinj@marvell.com" <jerinj@marvell.com>,
	"Akhil.goyal@nxp.com" <akhil.goyal@nxp.com>,
	"Vangati, Narender" <narender.vangati@intel.com>, nd <nd@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [v2 1/2] cryptodev: support enqueue callback functions
Date: Wed, 23 Sep 2020 22:41:06 +0000	[thread overview]
Message-ID: <DBAPR08MB581492EEC7314057462689C298380@DBAPR08MB5814.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <DM6PR11MB330896B2863AEA0F2D24B4B39A380@DM6PR11MB3308.namprd11.prod.outlook.com>

<snip>

> > > > > > > >
> > > > > > > > +#ifdef RTE_CRYPTODEV_CALLBACKS int
> > > > > > > > +rte_cryptodev_rcu_qsbr_add(uint8_t dev_id, struct
> > > > > > > > +rte_rcu_qsbr
> > > > > > > > +*qsbr) {
> > > > > > > > +
> > > > > > > > +	struct rte_cryptodev *dev;
> > > > > > > > +
> > > > > > > > +	if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
> > > > > > > > +		CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
> > > > > > > > +		return -EINVAL;
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > > +	dev = &rte_crypto_devices[dev_id];
> > > > > > > > +	dev->qsbr = qsbr;
> > > > > > > > +	return 0;
> > > > > > > > +}
> > > > > > >
> > > > > > > So if I understand your patch correctly you propose a new
> > > > > > > working model for
> > > > > > > crypto-devs:
> > > > > > > 1. Control-plane has to allocate/setup rcu_qsbr and do
> > > > > > > rte_cryptodev_rcu_qsbr_add().
> > > > > > > 2. Data-plane has somehow to obtain pointer to that rcu_qsbr
> > > > > > > and wrap
> > > > > > > cryptodev_enqueue()
> > > > > > >    with rcu_qsbr_quiescent()  or
> rcu_qsbr_online()/rcu_qsbr_offline().
> > > > > > Yes. I think, it is not a new model. It is same as RCU
> > > > > > integration with
> > > LPM.
> > > > > > Please refer: https://patches.dpdk.org/cover/73673/
> > > > >
> > > > > I am talking about new working model for crypto-dev
> enqueue/dequeue.
> > > > > As I said above now it becomes data-plane thread responsibility to:
> > > > >  -somehow to obtain pointer to that rcu_qsbr for each cryptodev
> > > > > it is
> > > using.
> > > > >  -call rcu sync functions (quiescent/online/offline) on a regular basis.
> > > > It is not on regular basis. When data plane comes up, they report online.
> > > > They report quiescent when they are done with critical section or
> > > > shared
> > > structure.
> > >
> > > I understand that, but it means all existing apps have to be changed that
> way.
> > >
> > > > All though, there is some dataplane changes involved here, I don't
> > > > think, it
> > > is major.
> > >
> > > I still think our goal here should be to make no visible changes to
> > > the dataplane.
> > > I.E. all necessary data-plane changes need to be hidden inside CB
> > > invocation part.
> > Please note that this is being implemented using the memory
> > reclamation framework documented at
> > https://doc.dpdk.org/guides/prog_guide/rcu_lib.html#resource-reclamati
> > on-framework-for-dpdk
> >
> > While using RCU there are couple of trade-offs that applications have to
> consider:
> > 1) Performance - reporting the quiescent state too often results in
> > performance impact on data plane
> > 2) Amount of outstanding memory to reclaim - reporting less often
> > results in more outstanding memory to reclaim
> >
> > Hence, the quiescent state reporting is left to the application. The
> > application decides how often it reports the quiescent state and has control
> over the data plane performance and the outstanding memory to reclaim.
> >
> > When you say "new working model for crypto-dev enqueue/dequeue",
> >
> > 1) are you comparing these with existing crypto-dev enqueue/dequeue
> APIs? If yes, these are new APIs, it is not breaking anything.
> > 2) are you comparing these with existing call back functions in ethdev
> > enqueue/dequeue APIs? If yes, agree that this is a new model. But, it is
> possible to support what ethdev supports along with the RCU method used
> in this patch.
> 
> What I am talking about:
> 
> Existing cryptodev enqueue/dequeue model doesn't require for the user to
> manage any RCU QSBR state manually.
> I believe that addition of ability to add/remove enqueue/dequeue callbacks
> shouldn't change existing working model.
> I think that adding/removing such callbacks has to be opaque to the user DP
> code and shouldn't require user to change it. Same as we have now for
> ethdev callback implementation.
The ethdev callback implementation conveniently leaves the problem of freeing memory to the user to resolve, it does not handle the issue. Hence, it "looks" to be opaque to the DP code. However, if the application has to implement a safe way to free the call back memory, its DP is affected based on call backs are being used or not.

> I think that forcing DP code to be aware that callbacks are present or not and
> to modify its behaviour depending on that nearly voids the purpose of having
> callbacks at all.
> In that case DP can just invoke callback function directly from it's codepath .
> 
> > >
> > > >
> > > > > Note that now data-plane thread would have to do that always -
> > > > > even if there are now callbacks installed for that cryptodev queue
> right now.
> > > > > All that changes behaviour of existing apps and I presume would
> > > > > reduce adoption of  that fature.
> > If I understand this correct, you are talking about a case where in
> > the application might be registering/unregistering multiple times
> > during its lifetime. In this case, yes, the application might be reporting the
> quiescent state even when it has not registered the call backs. But, it has the
> flexibility to not report it if it implements additional logic.
> > Note that we are assuming that the application has to report quiescent
> > state only for using callback functions. Most probably the application has
> other requirements to use RCU.
> > Why not support what is done for ethdev call back functions along with
> providing RCU method?
> >
> > > > There is always trade off involved!
> > > > In the previous patch, you suggested that some lazy app may not
> > > > free up the memory allocated by add cb. For such apps, this patch
> > > > has sync mechanism with some additional cost of CP & DP changes.
> > >
> > > Sigh, it is not about laziness of the app.
> > > The problem with current ethedev cb mechanism and yours V1 (which
> > > was just a clone of it) - CP doesn't know when it is safe after CB
> > > removal to free related memory.
> > >
> > > > > I still think all this callback mechanism should be totally
> > > > > opaque to data-plane threads - user shouldn't change his app
> > > > > code depending on would some enqueue/dequeue callbacks be
> installed or not.
> > > > I am not sure, how that can be implemented with existing RCU design.
> > >
> > > As I said below the simplest way - with calling rcu onine/offline
> > > inside CB invocation block.
> > > That's why I asked you - did you try that approach and what is the
> > > perf numbers?
> > > I presume with no callbacks installed the perf change should be nearly
> zero.
> > >
> > > > @Honnappa Nagarahalli, Do you have any suggestions?
> > Reporting quiescent state in the call back functions has several
> disadvantages:
> > 1) it will have performance impacts and the impacts will increase as the
> number of data plane threads increase.
> > 2) It will require additional configuration parameters to control how
> > often the quiescent state is reported to control the performance impact.
> > 3) Does not take advantage of the fact that most probably the
> > application is using RCU already
> > 4) There are few difficulties as well, please see below.
> 
> I suggested Abhinandan to use RCU library because it is already there, and I
> thought it would be good not to re-implement the wheel.
> Though if you feel librte_rcu doesn't match that task - fine, let's do it without
> librte_rcu.
> After all, what we need here - just an atomic ref count per queue that we are
> going to increment at entering and leaving list of callbacks inside
> enqueue/dequeue.
Ok, looks like I missed the point that a queue is used by a single data plane thread.
Along with ref count increment you need the memory orderings to avoid race conditions. These will be the same ones used in RCU.
On the control plane, you need to read this counter and poll for the counter updates. All this is same cost as in RCU. To control the cost, you will have to control the rate of quiescent state reporting and might have to expose this as a configuration parameter.

The other important information you have to consider is if the thread is making any blocking calls, which may be in some other library. The thread is supposed to call rcu_qsbr_thread_offline API before calling a blocking call. This allows the RCU to know that this particular thread will not report quiescent state. The cryptodev library might not have that information.

If you want to go ahead with this design, you can still use RCU with single thread configuration (like you have mentioned below) and hide the details from the application.

> 
> >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > That seems quite a big change and I don't think it is
> > > > > > > acceptable for most users.
> > > > > > > From my perspective adding/installing call-backs to the dev
> > > > > > > has to be opaque to the data-plane code.
> > > > > > > Also note that different callbacks can be installed by
> > > > > > > different entities (libs) and might have no idea about each other.
> > > > > > > That's why I thought it would be better to make all this RCU
> > > > > > > stuff internal inside cryptodev:
> > > > > > >     hide all this rcu_qsbr allocation/setup inside cryptod
> > > > > > > somehow to
> > > > > obtain pointer to that rcu_qsbr ev init/queue setup
> > > > > > >     invoke rcu_qsbr_online()/rcu_qsbr_offline() inside
> > > > > cryptodev_enqueue().
> > This will bring in the application related information such as the thread ID
> into the library.
> 
> I don't think it would.
> Cryptodev enqueue/dequeue functions are not supposed to be thread safe
> (same as rx/tx burst).
> So we can always use RCU with just one thread(thread_id = 0).
Agree, the memory that needs to be freed is accessed by a single thread on the data plane. RCU with one thread would suffice.

> But as I said above - if you feel RCU lib is an overhead here, that's fine - I
> think it would be easy enough to do without librte_rcu.
> 
> > If the same API calls are being made from multiple data plane threads,
> > you need a way to configure that information to the library. So, it is
> > better to leave those details for the application to handle.
> >
> > > > > > I have already tried exploring above stuffs. There are too
> > > > > > many
> > > constraints.
> > > > > > The changes don't fit in, as per RCU design.
> > > > >
> > > > > Hmm could you be more specific here - what constraints are you
> > > > > referring to?
> > > > >
> > > > > > Moreover, having rcu api under enqueue_burst() will affect the
> > > > > performance.
> > > > >
> > > > > It most likely will. Though my expectation it will affect
> > > > > performance only when some callbacks are installed. My thought
> here:
> > > > > callback function by itself will affect cryptdev_enqueue
> > > > > performance anyway,
> > > > With existing callback design, I have measured the
> > > > performance(with
> > > crypto perf test) on xeon.
> > > > It was almost negligible and same was shared with Declan.
> > >
> > > I am asking about different thing: did you try alternate approach I
> > > described, that wouldn't require changes in the user data-plane code.
> > >
> > > > That is one of the reasons, I didn't want to add to many stuffs in
> > > > to the
> > > callback.
> > > > The best part of existing design is crypto lib is not much modified.
> > > > The changes are either pushed to CP or DP.
> > > >
> > > > so adding extra overhead for sync is probably ok here.
> > >
> > > I think that extra overhead when callbacks are present is expected
> > > and probably acceptable.
> > > Changes in the upper-layer data-plane code - probably not.
> > >
> > > > > Though for situation when no callbacks are installed -
> > > > > perfomance should be left unaffected (or impact should be as small
> as possible).
> > > > >
> > > > > > The changes are more on control plane side, which is one time.
> > > > > > The data plane changes are minimal.
> > > > >
> > > > > I still think upper layer data-plane code should stay unaffected
> > > > > (zero changes).
> > > > >
> > > > > > >

<snip>

  reply	other threads:[~2020-09-23 22:41 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-08  7:10 Abhinandan Gujjar
2020-09-09 16:21 ` Gujjar, Abhinandan S
2020-09-09 22:48   ` Honnappa Nagarahalli
2020-09-11  8:33     ` Gujjar, Abhinandan S
2020-09-16 13:39 ` Ananyev, Konstantin
2020-09-16 15:17   ` Gujjar, Abhinandan S
2020-09-17 13:24     ` Ananyev, Konstantin
2020-09-21 10:59       ` Gujjar, Abhinandan S
2020-09-21 11:45         ` Ananyev, Konstantin
2020-09-23  3:25           ` Honnappa Nagarahalli
2020-09-23 11:58             ` Ananyev, Konstantin
2020-09-23 22:41               ` Honnappa Nagarahalli [this message]
2020-09-24 17:06                 ` Ananyev, Konstantin
2020-09-29  5:14                   ` Honnappa Nagarahalli
2020-09-29  9:03                     ` Ananyev, Konstantin
2020-10-08 17:36                       ` Gujjar, Abhinandan S
2020-10-09 14:40                         ` Ananyev, Konstantin
2020-10-09 16:51                           ` Gujjar, Abhinandan S
2020-10-11 23:02                             ` Ananyev, Konstantin
2020-10-12  6:47                               ` Gujjar, Abhinandan S
2020-10-12 13:02                                 ` Ananyev, Konstantin
2020-10-19 22:04                                   ` Honnappa Nagarahalli
2020-10-20  9:25                                     ` Gujjar, Abhinandan S
2020-09-24  5:17 ` Honnappa Nagarahalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DBAPR08MB581492EEC7314057462689C298380@DBAPR08MB5814.eurprd08.prod.outlook.com \
    --to=honnappa.nagarahalli@arm.com \
    --cc=abhinandan.gujjar@intel.com \
    --cc=akhil.goyal@nxp.com \
    --cc=declan.doherty@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=narender.vangati@intel.com \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).