From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by dpdk.space (Postfix) with ESMTP id 22D80A0679
	for <public@inbox.dpdk.org>; Tue, 30 Apr 2019 18:33:38 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 3A58858FA;
	Tue, 30 Apr 2019 18:33:37 +0200 (CEST)
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by dpdk.org (Postfix) with ESMTP id 47732326C;
 Tue, 30 Apr 2019 18:33:35 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga007.jf.intel.com ([10.7.209.58])
 by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 30 Apr 2019 09:33:34 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.60,414,1549958400"; d="scan'208";a="135720898"
Received: from irsmsx104.ger.corp.intel.com ([163.33.3.159])
 by orsmga007.jf.intel.com with ESMTP; 30 Apr 2019 09:33:32 -0700
Received: from irsmsx101.ger.corp.intel.com ([169.254.1.115]) by
 IRSMSX104.ger.corp.intel.com ([169.254.5.44]) with mapi id 14.03.0415.000;
 Tue, 30 Apr 2019 17:33:31 +0100
From: "Trahe, Fiona" <fiona.trahe@intel.com>
To: Shally Verma <shallyv@marvell.com>, "dev@dpdk.org" <dev@dpdk.org>
CC: "akhil.goyal@nxp.com" <akhil.goyal@nxp.com>, Ashish Gupta
 <ashishg@marvell.com>, "Daly, Lee" <lee.daly@intel.com>, Sunila Sahu
 <ssahu@marvell.com>, "stable@dpdk.org" <stable@dpdk.org>, "Trahe, Fiona"
 <fiona.trahe@intel.com>
Thread-Topic: [PATCH v2] doc/compress: clarify error handling on data-plane
Thread-Index: AQHU7uRoniPOIsNkTEyu55+FfYvFQqZB0WaAgBMsX4A=
Date: Tue, 30 Apr 2019 16:33:30 +0000
Message-ID:
 <348A99DA5F5B7549AA880327E580B4358974F5BC@IRSMSX101.ger.corp.intel.com>
References: <1554740072-11898-1-git-send-email-fiona.trahe@intel.com>
 <1554821747-27868-1-git-send-email-fiona.trahe@intel.com>
 <BN6PR1801MB20521B48C169DA254FF78019AD260@BN6PR1801MB2052.namprd18.prod.outlook.com>
In-Reply-To: <BN6PR1801MB20521B48C169DA254FF78019AD260@BN6PR1801MB2052.namprd18.prod.outlook.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNTBjY2U3NWUtYWNjYS00MTkxLWJiOTQtZDhmZDBiMzg3NWQxIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiQUZnT0ttQjdCQW0wZWdKUVNxZ0loMHRXTjBxTHVOQ2ZcL1l4dmZrRFprMlJ3V1ZScDhNMzZIdCtOdXIrR1BSOGIifQ==
x-ctpclassification: CTP_NT
dlp-product: dlpe-windows
dlp-version: 11.0.600.7
dlp-reaction: no-action
x-originating-ip: [163.33.239.181]
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v2] doc/compress: clarify error handling on
	data-plane
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>
Message-ID: <20190430163330.Zdy7CYuiJHLATMzcQ2n_cNJgMfB6eAja68KpNph-M8A@z>

Hi Shally,


> -----Original Message-----
> From: Shally Verma [mailto:shallyv@marvell.com]
> Sent: Thursday, April 18, 2019 1:12 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee <l=
ee.daly@intel.com>; Sunila
> Sahu <ssahu@marvell.com>; stable@dpdk.org
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plan=
e
>=20
> Hi Fiona,
>=20
>=20
> > -----Original Message-----
> > From: Fiona Trahe <fiona.trahe@intel.com>
> > Sent: Tuesday, April 9, 2019 8:26 PM
> > To: dev@dpdk.org
> > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>;
> > lee.daly@intel.com; Sunila Sahu <ssahu@marvell.com>; Shally Verma
> > <shallyv@marvell.com>; Fiona Trahe <fiona.trahe@intel.com>;
> > stable@dpdk.org
> > Subject: [PATCH v2] doc/compress: clarify error handling on data-plane
> >
> > Fixed some typos and clarified which errors should be returned when and
> > why on the enqueue and dequeue APIs.
> >
> > Fixes: a584d3bea902 ("doc: add compressdev library guide")
> > cc: stable@dpdk.org
> >
> > Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
> > ---
> > v2 changes:
> >  - changed "0 or undefined" to just "undefined" as 0 is superfluous.
> >
> >
> >  doc/guides/prog_guide/compressdev.rst | 46
> > ++++++++++++++++++++++++++++++++---
> >  1 file changed, 43 insertions(+), 3 deletions(-)
> >
> > diff --git a/doc/guides/prog_guide/compressdev.rst
> > b/doc/guides/prog_guide/compressdev.rst
> > index ad9703753..c700dd103 100644
> > --- a/doc/guides/prog_guide/compressdev.rst
> > +++ b/doc/guides/prog_guide/compressdev.rst
> > @@ -201,7 +201,7 @@ for stateful processing of ops.
> >  Operation Status
> >  ~~~~~~~~~~~~~~~~
> >  Each operation carries a status information updated by PMD after it is
> > processed.
> > -following are currently supported status:
> > +Following are currently supported:
> >
> >  - RTE_COMP_OP_STATUS_SUCCESS,
> >      Operation is successfully completed @@ -227,14 +227,54 @@ followin=
g are
> > currently supported status:
> >      is not an error case. Output data up to op.produced can be used an=
d
> >      next op in the stream should continue on from op.consumed+1.
> >
> > +Operation status after enqueue / dequeue
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +Some of the above values will only arise in the op after an
> > +``rte_compressdev_enqueue_burst()``, some only after an
> > +``rte_compressdev_dequeue_burst()``. For optimal performance on the
> > +data-plane an application is not expected to check the ``op.status`` o=
f
> > +all ops after both enqueue and dequeue, it should be sufficient to onl=
y
> > +check after dequeue. To facilitate this optimisation, most errors whic=
h
> > +may reasonably be expected to occur in a production environment will b=
e
> > returned by the PMD on the ``dequeue``.
> > +op.status may hold the following values after dequeue:
> > +
> > +- RTE_COMP_OP_STATUS_SUCCESS
> > +- RTE_COMP_OP_STATUS_ERROR
> > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED
> > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE
> > +
> > +There are some exceptions whereby errors can occur on the ``enqueue``.
> > +For any error which can occur in a production environment and can be
> > +successful after a retry with the same op the PMD may return the error
> > +on the enqueue.
> This statement looks bit confusing.
> Seems like we are trying to add a description regarding op status check e=
ven after the enqueue call unlike
> current scenario, where app only check for it
> after dequeue?
[Fiona] The line following this explains that there is no need to check op.=
status in this case.
Maybe it's not obvious that the application SHOULD check that all ops are e=
nqueued?
I can reword as:
The application should always check the value returned by the enqueue.
If less than the full burst is enqueued there's no
need for the application to check op.status of any or every op - it can
simply retry from the return value+1 in a later enqueue and expect success.

If this isn't clear, can you suggest other wording - or expand on what's no=
t unclear.

=20
> So if less than the full burst is enqueued there's no
> > +need for the application to check op.status - the application can
> > +simply retry in a later enqueue and expect success. Though the applica=
tion
> > is not expected to check for these, the values are as follows:
> > +
> > +- RTE_COMP_OP_STATUS_NOT_PROCESSED  - could occur if a hardware
> > device's queue is full, after a dequeue a retry of the enqueue can be
> > successful.
> > +
> > +- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory or
> > other transient condition which could clear after a time.
> > +
> > +Other errors may also occur on an ``enqueue``, but they are only
> > +expected to arise during development. As a retry with the same op won'=
t
> > +be successful, if a performant application wants to avoid checking
> > +op.status on the enqueue it should ensure these never arise in a
> > +production environment, e.g. by checking device capabilities and valid=
ating
> > input parameters before sending operations. Examples are:
> > +
> > +- RTE_COMP_OP_STATUS_INVALID_ARGS
> > +- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not
> > +transient)
> How does app identify error is transient or permanent?
[Fiona] The app doesn't - it's up to the PMD to decide what the appropriate
return is using the logic described above. If the PMD encounters an error t=
hat's=20
not transient, and can be reasonably expected in a production environment,
then it should forward the error to the dequeue - where it expects the appl=
ication
to check for it.
Should I add this note at the end of this section?

>=20
> > +- RTE_COMP_OP_STATUS_INVALID_STATE
> > +
> > +If an application doesn't safeguard against these AND doesn't check th=
e
> > +op.status of the next op which was not enqueued, but just retries, it =
could
> > result in an infinite loop.
> > +
> May be , you are trying to insist whenever the number_enqueued ops < numb=
er actually enqueued , then
> caller should check for status of num_enqueued + 1th op to know exact rea=
son, and
> take corrective measure before re-enqueuing it for following status condi=
tion:
> 1. INVALID_ARGS
> 2. INVALID_STATE
> 3 ERROR
>=20
> Is that correct?
[Fiona] Only if the application has a slow-path for debug in non-production=
 code.
In production code, for high-performance, I'm saying it's reasonable not to=
 do this check.

The reason for calling this out is we did have cases where errors were bein=
g returned on the enqueue,
But the performance tool was not checking op.status of enqueued+1 and so ge=
tting into an infinite loop.
Seems reasonable that a performant application would also not check.
We've removed those cases from QAT PMD, and behave according to above logic=
.
But  it's not described explicitly in this detail on the API - so think it'=
s worth calling out this expectation.
Does it make sense?

=20
>=20
>=20
> >  Produced, Consumed And Operation Status
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >  - If status is RTE_COMP_OP_STATUS_SUCCESS,
> >      consumed =3D amount of data read from input buffer, and
> >      produced =3D amount of data written in destination buffer
> > -- If status is RTE_COMP_OP_STATUS_FAILURE,
> > -    consumed =3D produced =3D 0 or undefined
> > +- If status is RTE_COMP_OP_STATUS_ERROR,
> > +    consumed =3D produced =3D undefined
> >  - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
> >      consumed =3D 0 and
> >      produced =3D usually 0, but in decompression cases a PMD may retur=
n > 0
> > --
> Acked.
>=20
> > 2.13.6