Re: [dpdk-dev] [PATCH v2] doc/compress: clarify error handling on data-plane

DPDK patches and discussions
 help / color / mirror / Atom feed

From: Shally Verma <shallyv@marvell.com>
To: "Trahe, Fiona" <fiona.trahe@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "akhil.goyal@nxp.com" <akhil.goyal@nxp.com>,
	Ashish Gupta <ashishg@marvell.com>,
	"Daly, Lee" <lee.daly@intel.com>, Sunila Sahu <ssahu@marvell.com>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] doc/compress: clarify error handling on data-plane
Date: Tue, 7 May 2019 17:14:50 +0000	[thread overview]
Message-ID: <BN6PR1801MB20524947F09A7520B79F318FAD310@BN6PR1801MB2052.namprd18.prod.outlook.com> (raw)
In-Reply-To: <348A99DA5F5B7549AA880327E580B4358974F5BC@IRSMSX101.ger.corp.intel.com>



> -----Original Message-----
> From: Trahe, Fiona <fiona.trahe@intel.com>
> Sent: Tuesday, April 30, 2019 10:04 PM
> To: Shally Verma <shallyv@marvell.com>; dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>; stable@dpdk.org;
> Trahe, Fiona <fiona.trahe@intel.com>
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Hi Shally,
> 
> 
> > -----Original Message-----
> > From: Shally Verma [mailto:shallyv@marvell.com]
> > Sent: Thursday, April 18, 2019 1:12 PM
> > To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> > <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>;
> stable@dpdk.org
> > Subject: RE: [PATCH v2] doc/compress: clarify error handling on
> > data-plane
> >
> > Hi Fiona,
> >
> >
> > > -----Original Message-----
> > > From: Fiona Trahe <fiona.trahe@intel.com>
> > > Sent: Tuesday, April 9, 2019 8:26 PM
> > > To: dev@dpdk.org
> > > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>;
> > > lee.daly@intel.com; Sunila Sahu <ssahu@marvell.com>; Shally Verma
> > > <shallyv@marvell.com>; Fiona Trahe <fiona.trahe@intel.com>;
> > > stable@dpdk.org
> > > Subject: [PATCH v2] doc/compress: clarify error handling on
> > > data-plane
> > >
> > > Fixed some typos and clarified which errors should be returned when
> > > and why on the enqueue and dequeue APIs.
> > >
> > > Fixes: a584d3bea902 ("doc: add compressdev library guide")
> > > cc: stable@dpdk.org
> > >
> > > Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
> > > ---
> > > v2 changes:
> > >  - changed "0 or undefined" to just "undefined" as 0 is superfluous.
> > >
> > >
> > >  doc/guides/prog_guide/compressdev.rst | 46
> > > ++++++++++++++++++++++++++++++++---
> > >  1 file changed, 43 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/doc/guides/prog_guide/compressdev.rst
> > > b/doc/guides/prog_guide/compressdev.rst
> > > index ad9703753..c700dd103 100644
> > > --- a/doc/guides/prog_guide/compressdev.rst
> > > +++ b/doc/guides/prog_guide/compressdev.rst
> > > @@ -201,7 +201,7 @@ for stateful processing of ops.
> > >  Operation Status
> > >  ~~~~~~~~~~~~~~~~
> > >  Each operation carries a status information updated by PMD after it
> > > is processed.
> > > -following are currently supported status:
> > > +Following are currently supported:
> > >
> > >  - RTE_COMP_OP_STATUS_SUCCESS,
> > >      Operation is successfully completed @@ -227,14 +227,54 @@
> > > following are currently supported status:
> > >      is not an error case. Output data up to op.produced can be used and
> > >      next op in the stream should continue on from op.consumed+1.
> > >
> > > +Operation status after enqueue / dequeue
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +Some of the above values will only arise in the op after an
> > > +``rte_compressdev_enqueue_burst()``, some only after an
> > > +``rte_compressdev_dequeue_burst()``. For optimal performance on the
> > > +data-plane an application is not expected to check the
> > > +``op.status`` of all ops after both enqueue and dequeue, it should
> > > +be sufficient to only check after dequeue. To facilitate this
> > > +optimisation, most errors which may reasonably be expected to occur
> > > +in a production environment will be
> > > returned by the PMD on the ``dequeue``.
> > > +op.status may hold the following values after dequeue:
> > > +
> > > +- RTE_COMP_OP_STATUS_SUCCESS
> > > +- RTE_COMP_OP_STATUS_ERROR
> > > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED
> > > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE
> > > +
> > > +There are some exceptions whereby errors can occur on the
> ``enqueue``.
> > > +For any error which can occur in a production environment and can
> > > +be successful after a retry with the same op the PMD may return the
> > > +error on the enqueue.
> > This statement looks bit confusing.
> > Seems like we are trying to add a description regarding op status
> > check even after the enqueue call unlike current scenario, where app
> > only check for it after dequeue?
> [Fiona] The line following this explains that there is no need to check
> op.status in this case.
> Maybe it's not obvious that the application SHOULD check that all ops are
> enqueued?
> I can reword as:
> The application should always check the value returned by the enqueue.
> If less than the full burst is enqueued there's no need for the application to
> check op.status of any or every op - it can simply retry from the return
> value+1 in a later enqueue and expect success.
> 
 I agree to purpose of patch but have these confusions when I read description above:

My understand is , if op status is INVALID_ARGS or any ERROR which is permanent in nature,
Then nb_enqd return will be less than actually passed. Regardless of whatever reason, if any time app gets nb_enqd < actually passed, then app should check status of nb_enqd + 1th op to find
exact cause of failure and then either attempt re-enqueue Or correct op preparation or take any other appropriate action.
Also, STATUS_ERROR is very generic, it can be when queue is full in which case app can re-attempt an enqueue of same op OR
It can also indicate any irrecoverable error on enqueue, in which app just probably has to reset everything. For such kind of case, it might not be possible for PMD design
to even push it into completion queue for an app to dequeue .  I would suggest  add another status code type which reflect permanent error condition i.e. irrecoverable error code
which tells an app to perform PMD qp reset/re-init to recover and simplify description just to state an expected APP behavior to avoid infinite loop condition.
It is then an app choice whether or not to check for op status for error after enqueue depending on whether its running in production environment or dev environment.

Thanks
Shally


> If this isn't clear, can you suggest other wording - or expand on what's not
> unclear.
> 
> 
> > So if less than the full burst is enqueued there's no
> > > +need for the application to check op.status - the application can
> > > +simply retry in a later enqueue and expect success. Though the
> > > +application
> > > is not expected to check for these, the values are as follows:
> > > +
> > > +- RTE_COMP_OP_STATUS_NOT_PROCESSED  - could occur if a hardware
> > > device's queue is full, after a dequeue a retry of the enqueue can
> > > be successful.
> > > +
> > > +- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory
> or
> > > other transient condition which could clear after a time.
> > > +
> > > +Other errors may also occur on an ``enqueue``, but they are only
> > > +expected to arise during development. As a retry with the same op
> > > +won't be successful, if a performant application wants to avoid
> > > +checking op.status on the enqueue it should ensure these never
> > > +arise in a production environment, e.g. by checking device
> > > +capabilities and validating
> > > input parameters before sending operations. Examples are:
> > > +
> > > +- RTE_COMP_OP_STATUS_INVALID_ARGS
> > > +- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not
> > > +transient)
> > How does app identify error is transient or permanent?
> [Fiona] The app doesn't - it's up to the PMD to decide what the appropriate
> return is using the logic described above. If the PMD encounters an error
> that's not transient, and can be reasonably expected in a production
> environment, then it should forward the error to the dequeue - where it
> expects the application to check for it.
> Should I add this note at the end of this section?
> 
> >
> > > +- RTE_COMP_OP_STATUS_INVALID_STATE
> > > +
> > > +If an application doesn't safeguard against these AND doesn't check
> > > +the op.status of the next op which was not enqueued, but just
> > > +retries, it could
> > > result in an infinite loop.
> > > +
> > May be , you are trying to insist whenever the number_enqueued ops <
> > number actually enqueued , then caller should check for status of
> > num_enqueued + 1th op to know exact reason, and take corrective
> measure before re-enqueuing it for following status condition:
> > 1. INVALID_ARGS
> > 2. INVALID_STATE
> > 3 ERROR
> >
> > Is that correct?
> [Fiona] Only if the application has a slow-path for debug in non-production
> code.
> In production code, for high-performance, I'm saying it's reasonable not to
> do this check.
> 
> The reason for calling this out is we did have cases where errors were being
> returned on the enqueue, But the performance tool was not checking
> op.status of enqueued+1 and so getting into an infinite loop.
> Seems reasonable that a performant application would also not check.
> We've removed those cases from QAT PMD, and behave according to above
> logic.
> But  it's not described explicitly in this detail on the API - so think it's worth
> calling out this expectation.
> Does it make sense?
> 
> 
> >
> >
> > >  Produced, Consumed And Operation Status
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > >  - If status is RTE_COMP_OP_STATUS_SUCCESS,
> > >      consumed = amount of data read from input buffer, and
> > >      produced = amount of data written in destination buffer
> > > -- If status is RTE_COMP_OP_STATUS_FAILURE,
> > > -    consumed = produced = 0 or undefined
> > > +- If status is RTE_COMP_OP_STATUS_ERROR,
> > > +    consumed = produced = undefined
> > >  - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
> > >      consumed = 0 and
> > >      produced = usually 0, but in decompression cases a PMD may
> > > return > 0
> > > --
> > Acked.
> >
> > > 2.13.6

next prev parent reply	other threads:[~2019-05-07 17:14 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-08 16:14 [dpdk-dev] [PATCH] " Fiona Trahe
2019-04-08 16:14 ` Fiona Trahe
2019-04-09 14:55 ` [dpdk-dev] [PATCH v2] " Fiona Trahe
2019-04-09 14:55   ` Fiona Trahe
2019-04-18 12:12   ` Shally Verma
2019-04-18 12:12     ` Shally Verma
2019-04-30 16:33     ` Trahe, Fiona
2019-04-30 16:33       ` Trahe, Fiona
2019-05-07 17:14       ` Shally Verma [this message]
2019-05-07 17:14         ` Shally Verma
2019-05-07 18:24         ` Trahe, Fiona
2019-05-07 18:24           ` Trahe, Fiona
2019-05-08 12:41           ` Shally Verma
2019-05-08 12:41             ` Shally Verma
2019-05-08 14:00             ` Trahe, Fiona
2019-05-08 14:00               ` Trahe, Fiona
2019-05-09  8:58               ` Akhil Goyal
2019-05-09  8:58                 ` Akhil Goyal
2019-05-09 10:44   ` Shally Verma
2019-05-09 10:44     ` Shally Verma
2019-05-14 15:29     ` Trahe, Fiona
2019-05-14 15:29       ` Trahe, Fiona
2019-05-14 15:37       ` Shally Verma
2019-05-14 15:37         ` Shally Verma
2019-05-15 11:16   ` [dpdk-dev] [PATCH v3] " Fiona Trahe
2019-05-15 11:16     ` Fiona Trahe
2019-05-15 11:43     ` Jozwiak, TomaszX
2019-05-15 11:43       ` Jozwiak, TomaszX
2019-05-15 12:03     ` Shally Verma
2019-05-15 12:03       ` Shally Verma
2019-06-19 14:57       ` Akhil Goyal
2019-04-16 15:02 [dpdk-dev] [PATCH v2] " Akhil Goyal
2019-04-16 15:02 ` Akhil Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BN6PR1801MB20524947F09A7520B79F318FAD310@BN6PR1801MB2052.namprd18.prod.outlook.com \
    --to=shallyv@marvell.com \
    --cc=akhil.goyal@nxp.com \
    --cc=ashishg@marvell.com \
    --cc=dev@dpdk.org \
    --cc=fiona.trahe@intel.com \
    --cc=lee.daly@intel.com \
    --cc=ssahu@marvell.com \
    --cc=stable@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).