From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id 1E896A0096 for ; Tue, 7 May 2019 19:15:01 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0EFAA2BD5; Tue, 7 May 2019 19:15:00 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by dpdk.org (Postfix) with ESMTP id 8A971293B; Tue, 7 May 2019 19:14:58 +0200 (CEST) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x47H7nQp010286; Tue, 7 May 2019 10:14:57 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : references : in-reply-to : content-type : content-transfer-encoding : mime-version; s=pfpt0818; bh=rd3VQYUNGWgrxybE8sQwOe5ase08dyD00/4JjQIjTiw=; b=s2M5WT4gwbHjfdSht9X5KLzfBqsjllEnNPAngkVTcrw0xllclg415Tpo3w6gSN0M+UWy vg/jkUOMP/rkXmfbBM95VsQ3BcO6DHCk7PaYUUgDg8b10HJF8mwJEalILD9Ff1jHWvRP c9xekLp5bmzW2UU9YAd0xFNHUtjG+PEAeCqEjggg0YPbgfkY5zu35c+WU92iB4v5e5V6 gRTa89U32fXDw//aiNlqdM/HYmcaUs4Vsr4NrdlLFc/xj0iGZ2A210hn7hP0l4LKiuxs J0apYkh1/tSr79eK/gtXcAfJVwRiALaaBFBejGm+ONROBGsLbc3VCJfGLY9/xLHd/eC8 9w== Received: from sc-exch01.marvell.com ([199.233.58.181]) by mx0a-0016f401.pphosted.com with ESMTP id 2sb40c363s-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Tue, 07 May 2019 10:14:57 -0700 Received: from SC-EXCH01.marvell.com (10.93.176.81) by SC-EXCH01.marvell.com (10.93.176.81) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Tue, 7 May 2019 10:14:56 -0700 Received: from NAM01-BY2-obe.outbound.protection.outlook.com (104.47.34.53) by SC-EXCH01.marvell.com (10.93.176.81) with Microsoft SMTP Server (TLS) id 15.0.1367.3 via Frontend Transport; Tue, 7 May 2019 10:14:56 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.onmicrosoft.com; s=selector1-marvell-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=rd3VQYUNGWgrxybE8sQwOe5ase08dyD00/4JjQIjTiw=; b=cpsZ8F2fyzctb1ksTOCXUPTjnjwBuNiYSESCbnVpeR58z33a+U4XQ72po69Rj3P7tRscbw7gy2dgfpc3zYTUPiFmURL43eohZuADbFYG4SHAlV+E7KP8JjmAtiPNeuVnCdK48PfZSIOPe7yohseLz7z7YvWunEOxws/aRAQ7IAo= Received: from BN6PR1801MB2052.namprd18.prod.outlook.com (10.161.157.11) by BN6PR1801MB1905.namprd18.prod.outlook.com (10.161.154.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1856.12; Tue, 7 May 2019 17:14:51 +0000 Received: from BN6PR1801MB2052.namprd18.prod.outlook.com ([fe80::99fe:90b9:ca17:a552]) by BN6PR1801MB2052.namprd18.prod.outlook.com ([fe80::99fe:90b9:ca17:a552%6]) with mapi id 15.20.1856.012; Tue, 7 May 2019 17:14:51 +0000 From: Shally Verma To: "Trahe, Fiona" , "dev@dpdk.org" CC: "akhil.goyal@nxp.com" , Ashish Gupta , "Daly, Lee" , Sunila Sahu , "stable@dpdk.org" Thread-Topic: [PATCH v2] doc/compress: clarify error handling on data-plane Thread-Index: AQHU7uRcY1b0dFOFhkSSkiDYaczK7aZB3cywgBMpTgCACwagIA== Date: Tue, 7 May 2019 17:14:50 +0000 Message-ID: References: <1554740072-11898-1-git-send-email-fiona.trahe@intel.com> <1554821747-27868-1-git-send-email-fiona.trahe@intel.com> <348A99DA5F5B7549AA880327E580B4358974F5BC@IRSMSX101.ger.corp.intel.com> In-Reply-To: <348A99DA5F5B7549AA880327E580B4358974F5BC@IRSMSX101.ger.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [117.98.153.69] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 45b612d1-fac2-4caf-ae31-08d6d30f83f2 x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600141)(711020)(4605104)(2017052603328)(7193020); SRVR:BN6PR1801MB1905; x-ms-traffictypediagnostic: BN6PR1801MB1905: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-forefront-prvs: 0030839EEE x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(366004)(376002)(396003)(136003)(39860400002)(346002)(51234002)(13464003)(199004)(189003)(68736007)(53936002)(256004)(3846002)(6116002)(2906002)(2501003)(74316002)(8936002)(305945005)(86362001)(64756008)(66446008)(66946007)(66476007)(73956011)(76116006)(66556008)(52536014)(7736002)(81166006)(81156014)(8676002)(229853002)(14454004)(478600001)(476003)(6436002)(5660300002)(66066001)(99286004)(71200400001)(316002)(71190400001)(7696005)(110136005)(54906003)(14444005)(11346002)(446003)(4326008)(6246003)(25786009)(486006)(102836004)(55016002)(33656002)(9686003)(6506007)(76176011)(26005)(186003)(53546011); DIR:OUT; SFP:1101; SCL:1; SRVR:BN6PR1801MB1905; H:BN6PR1801MB2052.namprd18.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: marvell.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: hjGFtKbZtVjkMDWwKlPdmeERTEPwUpFZ8qnzH6wslD3mrwgTlJeS/b0nzjdkUPB9KDa9JVKZGUcIXkIP3yp7yE7md+a3rlV8nvMv/RlmHivtWShE5tICQERxoRSPrIxVsxEYAP76FuPQwlzIxdv3dpYc20+OK6+lBuaHFnsCqQvxbvYUDZqLYBwL1Tl7FYDVOqBoH42Onpawp2eGa9U7kgI7tXrbnmTF8BpltaC6bkF6Bk3B8TyN8MpiNUy2JgMa7Omf2jTHx6wnsAHcbgvn9RvAM9m1FoAppdrz25ESswtJWm4bj+pz8MjkBrF9wDh3O9K3ZU465lZMfEZ/Gngorb/vRFCk6a3ORG9Znjlik/tt2Wn7dtetMLygkKwY5XeLUW8+luDbyWgLzEkIH1eA2OA6HjYcD+7s+j3ntEdiPJY= Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-Network-Message-Id: 45b612d1-fac2-4caf-ae31-08d6d30f83f2 X-MS-Exchange-CrossTenant-originalarrivaltime: 07 May 2019 17:14:50.7965 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 70e1fb47-1155-421d-87fc-2e58f638b6e0 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN6PR1801MB1905 X-OriginatorOrg: marvell.com X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-05-07_09:, , signatures=0 Subject: Re: [dpdk-dev] [PATCH v2] doc/compress: clarify error handling on data-plane X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Message-ID: <20190507171450.33JOBfr_ibp-s-evpkTR7ORNBoTwyE-H2cUl1fNe4I8@z> > -----Original Message----- > From: Trahe, Fiona > Sent: Tuesday, April 30, 2019 10:04 PM > To: Shally Verma ; dev@dpdk.org > Cc: akhil.goyal@nxp.com; Ashish Gupta ; Daly, Lee > ; Sunila Sahu ; stable@dpdk.org; > Trahe, Fiona > Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plan= e >=20 > Hi Shally, >=20 >=20 > > -----Original Message----- > > From: Shally Verma [mailto:shallyv@marvell.com] > > Sent: Thursday, April 18, 2019 1:12 PM > > To: Trahe, Fiona ; dev@dpdk.org > > Cc: akhil.goyal@nxp.com; Ashish Gupta ; Daly, Lee > > ; Sunila Sahu ; > stable@dpdk.org > > Subject: RE: [PATCH v2] doc/compress: clarify error handling on > > data-plane > > > > Hi Fiona, > > > > > > > -----Original Message----- > > > From: Fiona Trahe > > > Sent: Tuesday, April 9, 2019 8:26 PM > > > To: dev@dpdk.org > > > Cc: akhil.goyal@nxp.com; Ashish Gupta ; > > > lee.daly@intel.com; Sunila Sahu ; Shally Verma > > > ; Fiona Trahe ; > > > stable@dpdk.org > > > Subject: [PATCH v2] doc/compress: clarify error handling on > > > data-plane > > > > > > Fixed some typos and clarified which errors should be returned when > > > and why on the enqueue and dequeue APIs. > > > > > > Fixes: a584d3bea902 ("doc: add compressdev library guide") > > > cc: stable@dpdk.org > > > > > > Signed-off-by: Fiona Trahe > > > --- > > > v2 changes: > > > - changed "0 or undefined" to just "undefined" as 0 is superfluous. > > > > > > > > > doc/guides/prog_guide/compressdev.rst | 46 > > > ++++++++++++++++++++++++++++++++--- > > > 1 file changed, 43 insertions(+), 3 deletions(-) > > > > > > diff --git a/doc/guides/prog_guide/compressdev.rst > > > b/doc/guides/prog_guide/compressdev.rst > > > index ad9703753..c700dd103 100644 > > > --- a/doc/guides/prog_guide/compressdev.rst > > > +++ b/doc/guides/prog_guide/compressdev.rst > > > @@ -201,7 +201,7 @@ for stateful processing of ops. > > > Operation Status > > > ~~~~~~~~~~~~~~~~ > > > Each operation carries a status information updated by PMD after it > > > is processed. > > > -following are currently supported status: > > > +Following are currently supported: > > > > > > - RTE_COMP_OP_STATUS_SUCCESS, > > > Operation is successfully completed @@ -227,14 +227,54 @@ > > > following are currently supported status: > > > is not an error case. Output data up to op.produced can be used = and > > > next op in the stream should continue on from op.consumed+1. > > > > > > +Operation status after enqueue / dequeue > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > +Some of the above values will only arise in the op after an > > > +``rte_compressdev_enqueue_burst()``, some only after an > > > +``rte_compressdev_dequeue_burst()``. For optimal performance on the > > > +data-plane an application is not expected to check the > > > +``op.status`` of all ops after both enqueue and dequeue, it should > > > +be sufficient to only check after dequeue. To facilitate this > > > +optimisation, most errors which may reasonably be expected to occur > > > +in a production environment will be > > > returned by the PMD on the ``dequeue``. > > > +op.status may hold the following values after dequeue: > > > + > > > +- RTE_COMP_OP_STATUS_SUCCESS > > > +- RTE_COMP_OP_STATUS_ERROR > > > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED > > > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE > > > + > > > +There are some exceptions whereby errors can occur on the > ``enqueue``. > > > +For any error which can occur in a production environment and can > > > +be successful after a retry with the same op the PMD may return the > > > +error on the enqueue. > > This statement looks bit confusing. > > Seems like we are trying to add a description regarding op status > > check even after the enqueue call unlike current scenario, where app > > only check for it after dequeue? > [Fiona] The line following this explains that there is no need to check > op.status in this case. > Maybe it's not obvious that the application SHOULD check that all ops are > enqueued? > I can reword as: > The application should always check the value returned by the enqueue. > If less than the full burst is enqueued there's no need for the applicati= on to > check op.status of any or every op - it can simply retry from the return > value+1 in a later enqueue and expect success. >=20 I agree to purpose of patch but have these confusions when I read descript= ion above: My understand is , if op status is INVALID_ARGS or any ERROR which is perma= nent in nature, Then nb_enqd return will be less than actually passed. Regardless of whatev= er reason, if any time app gets nb_enqd < actually passed, then app should = check status of nb_enqd + 1th op to find exact cause of failure and then either attempt re-enqueue Or correct op pre= paration or take any other appropriate action. Also, STATUS_ERROR is very generic, it can be when queue is full in which c= ase app can re-attempt an enqueue of same op OR It can also indicate any irrecoverable error on enqueue, in which app just = probably has to reset everything. For such kind of case, it might not be po= ssible for PMD design to even push it into completion queue for an app to dequeue . I would sugg= est add another status code type which reflect permanent error condition i= .e. irrecoverable error code which tells an app to perform PMD qp reset/re-init to recover and simplify = description just to state an expected APP behavior to avoid infinite loop c= ondition. It is then an app choice whether or not to check for op status for error af= ter enqueue depending on whether its running in production environment or d= ev environment. Thanks Shally > If this isn't clear, can you suggest other wording - or expand on what's = not > unclear. >=20 >=20 > > So if less than the full burst is enqueued there's no > > > +need for the application to check op.status - the application can > > > +simply retry in a later enqueue and expect success. Though the > > > +application > > > is not expected to check for these, the values are as follows: > > > + > > > +- RTE_COMP_OP_STATUS_NOT_PROCESSED - could occur if a hardware > > > device's queue is full, after a dequeue a retry of the enqueue can > > > be successful. > > > + > > > +- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory > or > > > other transient condition which could clear after a time. > > > + > > > +Other errors may also occur on an ``enqueue``, but they are only > > > +expected to arise during development. As a retry with the same op > > > +won't be successful, if a performant application wants to avoid > > > +checking op.status on the enqueue it should ensure these never > > > +arise in a production environment, e.g. by checking device > > > +capabilities and validating > > > input parameters before sending operations. Examples are: > > > + > > > +- RTE_COMP_OP_STATUS_INVALID_ARGS > > > +- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not > > > +transient) > > How does app identify error is transient or permanent? > [Fiona] The app doesn't - it's up to the PMD to decide what the appropria= te > return is using the logic described above. If the PMD encounters an error > that's not transient, and can be reasonably expected in a production > environment, then it should forward the error to the dequeue - where it > expects the application to check for it. > Should I add this note at the end of this section? >=20 > > > > > +- RTE_COMP_OP_STATUS_INVALID_STATE > > > + > > > +If an application doesn't safeguard against these AND doesn't check > > > +the op.status of the next op which was not enqueued, but just > > > +retries, it could > > > result in an infinite loop. > > > + > > May be , you are trying to insist whenever the number_enqueued ops < > > number actually enqueued , then caller should check for status of > > num_enqueued + 1th op to know exact reason, and take corrective > measure before re-enqueuing it for following status condition: > > 1. INVALID_ARGS > > 2. INVALID_STATE > > 3 ERROR > > > > Is that correct? > [Fiona] Only if the application has a slow-path for debug in non-producti= on > code. > In production code, for high-performance, I'm saying it's reasonable not = to > do this check. >=20 > The reason for calling this out is we did have cases where errors were be= ing > returned on the enqueue, But the performance tool was not checking > op.status of enqueued+1 and so getting into an infinite loop. > Seems reasonable that a performant application would also not check. > We've removed those cases from QAT PMD, and behave according to above > logic. > But it's not described explicitly in this detail on the API - so think i= t's worth > calling out this expectation. > Does it make sense? >=20 >=20 > > > > > > > Produced, Consumed And Operation Status > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > - If status is RTE_COMP_OP_STATUS_SUCCESS, > > > consumed =3D amount of data read from input buffer, and > > > produced =3D amount of data written in destination buffer > > > -- If status is RTE_COMP_OP_STATUS_FAILURE, > > > - consumed =3D produced =3D 0 or undefined > > > +- If status is RTE_COMP_OP_STATUS_ERROR, > > > + consumed =3D produced =3D undefined > > > - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED, > > > consumed =3D 0 and > > > produced =3D usually 0, but in decompression cases a PMD may > > > return > 0 > > > -- > > Acked. > > > > > 2.13.6