From: Ahmed Mansour <ahmed.mansour@nxp.com>
To: "Trahe, Fiona", "Verma, Shally", <dev@dpdk.org>
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry
Date: Fri, 16 Feb 2018 21:21:58 +0000
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

>> -----Original Message-----
>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>> Sent: Friday, February 16, 2018 7:17 AM
>> Subject: RE: [RFC v2] doc compression API for DPDK
>>
>> Hi Fiona, Ahmed
>>
>>> -----Original Message-----
>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>> Sent: 16 February 2018 02:40
>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>
>>> On 2/15/2018 1:47 PM, Trahe, Fiona wrote:
>>>> Hi Shally, Ahmed,
>>>> Sorry for the delay in replying,
>>>> Comments below
>>>>
>>>>> -----Original Message-----
>>>>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>>>>> Sent: Wednesday, February 14, 2018 7:41 AM
>>>>> Subject: RE: [RFC v2] doc compression API for DPDK
>>>>>
>>>>> Hi Ahmed,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>>>>> Sent: 02 February 2018 01:53
>>>>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>>>>
>>>>>> On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
>>>>>>> Hi Ahmed, Shally,
>>>>>>>
>>>>>>> ///snip///
>>>>>>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>>> OUT_OF_SPACE is a condition where the output buffer runs out of
>>>>>>>>>>>>> space and the PMD still has more data to produce. If the PMD runs
>>>>>>>>>>>>> into such a condition, then it's an error condition in stateless
>>>>>>>>>>>>> processing.
>>>>>>>>>>>>> In such a case, the PMD resets itself and returns with status
>>>>>>>>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0,
>>>>>>>>>>>>> i.e. no input read, no output written.
>>>>>>>>>>>>> The application can resubmit the full input with a larger output
>>>>>>>>>>>>> buffer size.
>>>>>>>>>>>> [Ahmed] Can we add an option to allow the user to read the data
>>>>>>>>>>>> that was produced while still reporting OUT_OF_SPACE? This is
>>>>>>>>>>>> mainly useful for decompression applications doing search.
>>>>>>>>>>> [Shally] It is there but applicable to the stateful operation type
>>>>>>>>>>> (please refer to handling of out_of_space under "Stateful Section").
>>>>>>>>>>> By definition, "stateless" here means that the application (such as
>>>>>>>>>>> IPCOMP) knows the maximum output size with a guarantee and ensures
>>>>>>>>>>> that the uncompressed data size cannot grow beyond the provided
>>>>>>>>>>> output buffer.
>>>>>>>>>>> Such apps can submit an op with type = STATELESS and provide the
>>>>>>>>>>> full input; the PMD then assumes it has sufficient input and output
>>>>>>>>>>> and thus doesn't need to maintain any context after the op is
>>>>>>>>>>> processed.
>>>>>>>>>>> If the application doesn't know the max output size, then it should
>>>>>>>>>>> process it as a stateful op, i.e. set up the op with type = STATEFUL
>>>>>>>>>>> and attach a stream so that the PMD can maintain the relevant
>>>>>>>>>>> context to handle such a condition.
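To make the stateless/stateful distinction concrete, a rough sketch of the two
op setups being discussed follows. The struct, field and enum names
(rte_comp_op, RTE_COMP_OP_STATELESS/STATEFUL, RTE_COMP_FLUSH_*) follow this RFC
draft and may change in the final API.

/* Sketch only: header, struct and enum names follow the RFC draft. */
#include <rte_mbuf.h>
#include <rte_comp.h>

/* App can bound the output size (e.g. IPCOMP): one self-contained op,
 * no stream, full input and a worst-case sized output buffer. */
static void
setup_stateless_op(struct rte_comp_op *op, struct rte_mbuf *src,
		   struct rte_mbuf *dst)
{
	op->op_type = RTE_COMP_OP_STATELESS;
	op->flush_flag = RTE_COMP_FLUSH_FINAL;
	op->m_src = src;
	op->m_dst = dst;
}

/* App cannot bound the output size: submit chunk by chunk as STATEFUL ops
 * with a stream attached so the PMD can keep history/state between chunks. */
static void
setup_stateful_op(struct rte_comp_op *op, void *stream,
		  struct rte_mbuf *src, struct rte_mbuf *dst, int last_chunk)
{
	op->op_type = RTE_COMP_OP_STATEFUL;
	op->stream = stream;	/* obtained from the stream create API */
	op->flush_flag = last_chunk ? RTE_COMP_FLUSH_FINAL : RTE_COMP_FLUSH_NONE;
	op->m_src = src;
	op->m_dst = dst;
}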
>>>>>>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while
>>>>>>>>>> still respecting the stateless concept.
>>>>>>>>>> In the stateless case where a PMD reports OUT_OF_SPACE in the
>>>>>>>>>> decompression case it could also return consumed=0, produced = x,
>>>>>>>>>> where x>0. X indicates the amount of valid data which has been
>>>>>>>>>> written to the output buffer. It is not complete, but if an
>>>>>>>>>> application wants to search it, it may be sufficient.
>>>>>>>>>> If the application still wants the data it must resubmit the whole
>>>>>>>>>> input with a bigger output buffer, and decompression will be
>>>>>>>>>> repeated from the start; it cannot expect to continue on, as the PMD
>>>>>>>>>> has not maintained state, history or data.
>>>>>>>>>> I don't think there would be any need to indicate this in
>>>>>>>>>> capabilities; PMDs which cannot provide this functionality would
>>>>>>>>>> always return produced=consumed=0, while PMDs which can could set
>>>>>>>>>> produced > 0.
>>>>>>>>>> If this works for you both, we could consider a similar case for
>>>>>>>>>> compression.
>>>>>>>>>>
>>>>>>>>> [Shally] Sounds fine to me. Though then in that case, consumed should
>>>>>>>>> also be updated to what was actually consumed by the PMD.
>>>>>>>>> Setting consumed = 0 with produced > 0 doesn't correlate.
>>>>>>>> [Ahmed] I like Fiona's suggestion, but I also do not like the
>>>>>>>> implication of returning consumed = 0. At the same time returning
>>>>>>>> consumed = y implies to the user that it can proceed from the middle.
>>>>>>>> I prefer the consumed = 0 implementation, but I think a different
>>>>>>>> return is needed to distinguish it from an OUT_OF_SPACE that the user
>>>>>>>> can recover from. Perhaps OUT_OF_SPACE_RECOVERABLE and
>>>>>>>> OUT_OF_SPACE_TERMINATED. This also allows future PMD implementations
>>>>>>>> to provide recoverability even in STATELESS mode if they so wish. In
>>>>>>>> this model STATELESS or STATEFUL would be a hint for the PMD
>>>>>>>> implementation to make optimizations for each case, but it does not
>>>>>>>> force the PMD implementation to limit functionality if it can provide
>>>>>>>> recoverability.
>>>>>>> [Fiona] So you're suggesting the following:
>>>>>>> OUT_OF_SPACE - returned only on stateful operation. Not an error.
>>>>>>>     Op.produced can be used and the next op in the stream should
>>>>>>>     continue on from op.consumed+1.
>>>>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>>>>>     Error condition, no recovery possible.
>>>>>>>     consumed=produced=0. Application must resubmit all input data with
>>>>>>>     a bigger output buffer.
>>>>>>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some
>>>>>>>     recovery possible.
>>>>>>>      - consumed = 0, produced > 0. Application must resubmit all input
>>>>>>>        data with a bigger output buffer. However in the decompression
>>>>>>>        case, data up to produced in the dst buffer may be
>>>>>>>        inspected/searched. Never happens in the compression case as the
>>>>>>>        output data would be meaningless.
>>>>>>>      - consumed > 0, produced > 0. PMD has stored the relevant state
>>>>>>>        and history and so can convert to stateful, using op.produced
>>>>>>>        and continuing from consumed+1.
>>>>>>> I don't expect our PMDs to use this last case, but maybe this works
>>>>>>> for others?
>>>>>>> I'm not convinced it's not just adding complexity. It sounds like a
>>>>>>> version of stateful without a stream, and maybe less efficient?
>>>>>>> If so should it respect the FLUSH flag? Which would have been FULL or
>>>>>>> FINAL in the op. Or treat it as FLUSH_NONE or SYNC? I don't know why an
>>>>>>> application would not simply have submitted a STATEFUL request if this
>>>>>>> is the behaviour it wants?
>>>>>> [Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
>>>>>> and replacing it with
>>>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>>>>     Error condition, no recovery possible.
>>>>>>      - consumed=0, produced=amount of data produced. Application must
>>>>>>        resubmit all input data with a bigger output buffer to process
>>>>>>        all of the op.
>>>>>> OUT_OF_SPACE_RECOVERABLE - normally returned on stateful operation. Not
>>>>>>     an error. Op.produced can be used and the next op in the stream
>>>>>>     should continue on from op.consumed+1.
>>>>>>      - consumed > 0, produced > 0. PMD has stored the relevant state and
>>>>>>        history and so can continue, using op.produced and continuing
>>>>>>        from consumed+1.
>>>>>>
>>>>>> We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
>>>>>> implementation either.
>>>>>>
>>>>>> Regardless of speculative future PMDs, the more important aspect of this
>>>>>> for today is that the return status clearly determines the meaning of
>>>>>> "consumed". If it is RECOVERABLE then consumed is meaningful. If it is
>>>>>> TERMINATED then consumed is meaningless.
>>>>>> This way we take away the ambiguity of having OUT_OF_SPACE mean two
>>>>>> different user work flows.
>>>>>>
>>>>>> A speculative future PMD may be designed to return RECOVERABLE for
>>>>>> stateless ops that are attached to streams.
>>>>>> A future PMD may look to see if an op has a stream attached, write out
>>>>>> the state there and go into recoverable mode.
>>>>>> In essence this leaves the choice up to the implementation and allows
>>>>>> the PMD to take advantage of stateless optimizations so long as a
>>>>>> "RECOVERABLE" scenario is rarely hit. The PMD will dump context as soon
>>>>>> as it fully processes an op. It will only write context out in cases
>>>>>> where the op chokes.
>>>>>> This futuristic PMD should ignore the FLUSH flag, since STATELESS mode
>>>>>> was indicated by the user, and optimize accordingly.
>>>>> [Shally] IMO, it looks okay to have two separate return codes TERMINATED
>>>>> and RECOVERABLE with definitions as you mentioned, and it seems doable.
>>>>> So then it means all the following conditions:
>>>>> a. stateless with flush = full/final, no stream pointer provided, PMD can
>>>>>    return TERMINATED i.e. user has to start all over again, it's a
>>>>>    failure (as in the current definition)
>>>>> b. stateless with flush = full/final, stream pointer provided, here it's
>>>>>    up to the PMD to return either TERMINATED or RECOVERABLE depending
>>>>>    upon its ability (note if RECOVERABLE, then the PMD will maintain
>>>>>    state in the stream pointer)
>>>>> c. stateful with flush = full / NO_SYNC, stream pointer always there, PMD
>>>>>    will return TERMINATED/RECOVERABLE depending on whether the
>>>>>    STATEFUL_COMPRESSION/DECOMPRESSION feature flag is enabled or not
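To make the proposed semantics concrete, a rough caller-side sketch of how the
two codes would be distinguished after dequeue. The status and field names
follow the proposal in this thread, not a finalised API.

/* Sketch only: status/field names follow the proposal in this thread. */
static void
handle_op_status(struct rte_comp_op *op)
{
	switch (op->status) {
	case RTE_COMP_OP_STATUS_SUCCESS:
		/* op->consumed and op->produced are final for this op */
		break;
	case RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE:
		/*
		 * consumed/produced are meaningful: the PMD kept state, so the
		 * next op on this stream continues from consumed + 1 with a
		 * fresh (larger) dst buffer.
		 */
		break;
	case RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED:
		/*
		 * consumed is meaningless: no state kept. The whole input must
		 * be resubmitted with a bigger dst buffer. In decompression a
		 * PMD may still report produced > 0 so the partial output can
		 * be inspected/searched.
		 */
		break;
	default:
		/* other error handling */
		break;
	}
}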
>>>> [Fiona] I don't think the flush flag is relevant - it could be out of
>>>> space on any flush flag, and if out of space it should ignore the flush
>>>> flag.
>>>> Is there a need for TERMINATED? - I didn't think it would ever need to be
>>>> returned in the stateful case.
>>>> Why the ref to the feature flag? If a PMD doesn't support a feature I
>>>> think it should fail the op - not with out-of-space, but unsupported or
>>>> similar. Or it would fail on stream creation.
>>> [Ahmed] Agreed with Fiona. The flush flag only matters on success. By
>>> definition the PMD should return OUT_OF_SPACE_RECOVERABLE in stateful
>>> mode when it runs out of space.
>>> @Shally If the user did not provide a stream, then the PMD should
>>> probably return TERMINATED every time. I am not sure we should make a
>>> "really smart" PMD which returns RECOVERABLE even if no stream pointer
>>> was given. In that case the PMD must give some ID back to the caller
>>> that the caller can use to "recover" the op. I am not sure how it would
>>> be implemented in the PMD, or when the PMD decides to retire streams
>>> belonging to dead ops that the caller decided not to "recover".
>>>>> and one more exception case is:
>>>>> d. stateless with flush = full, no stream pointer provided, PMD can
>>>>>    return RECOVERABLE i.e. the PMD internally maintained that state
>>>>>    somehow and consumed & produced > 0, so the user can start at
>>>>>    consumed+1, but there's a restriction on the user not to alter or
>>>>>    change the op until it is fully processed?!
>>>> [Fiona] Why the need for this case?
>>>> There's always a restriction on the user not to alter or change an op
>>>> until it is fully processed.
>>>> If a PMD can do this - why doesn't it create a stream when that API is
>>>> called - and then it's the same as b?
>>> [Ahmed] Agreed. The user should not touch an op once enqueued until they
>>> receive it in dequeue. We ignore the flush in stateless mode. We assume
>>> it to be final every time.
>> [Shally] Agreed, and am not in favour of supporting such an implementation
>> either. Just listed out the different possibilities here to better
>> visualise Ahmed's requirements/applicability of TERMINATED and
>> RECOVERABLE.
>>
>>>>> The API currently takes care of cases a and c, and case b can be
>>>>> supported if the specification accepts another proposal which mentions
>>>>> optional usage of a stream with stateless.
>>>> [Fiona] The API has this, but as we agreed, it's not optional to call
>>>> create_stream() with an op_type parameter (stateful/stateless). The PMD
>>>> can return NULL or provide a stream; if the latter then that stream must
>>>> be attached to ops.
>>>>
>>>> Until then the API makes no difference between
>>>>> cases b and c i.e. we can have an op such as,
>>>>> - type = stateful with flush = full/final, stream pointer provided, PMD
>>>>>   can return TERMINATED/RECOVERABLE according to its ability
>>>>>
>>>>> Case d is something exceptional; if there's a requirement in PMDs to
>>>>> support it, then I believe it will be doable with the concept of a
>>>>> different return code.
>>>>>
>>>> [Fiona] That's not quite how I understood it. Can it be simpler and only
>>>> the following cases?
>>>> a. stateless with flush = full/final, no stream pointer provided, PMD can
>>>>    return TERMINATED i.e. user has to start all over again, it's a
>>>>    failure (as in the current definition).
>>>>    consumed = 0, produced = amount of data produced. This is usually 0,
>>>>    but in the decompression case a PMD may return > 0 and the application
>>>>    may find it useful to inspect that data.
>>>> b. stateless with flush = full/final, stream pointer provided, here it's
>>>>    up to the PMD to return either TERMINATED or RECOVERABLE depending
>>>>    upon its ability (note if RECOVERABLE, then the PMD will maintain
>>>>    state in the stream pointer)
>>>> c. stateful with flush = any, stream pointer always there, PMD will
>>>>    return RECOVERABLE.
>>>>    op.produced can be used and the next op in the stream should continue
>>>>    on from op.consumed+1.
>>>>    Consumed=0, produced=0 is an unusual but allowed case. I'm not sure if
>>>>    it could ever happen, but there is no need to change the status to
>>>>    TERMINATED in this case. There may be useful state/history stored in
>>>>    the PMD, even though no output has been produced yet.
>>> [Ahmed] Agreed
>> [Shally] Sounds good.
>>
>>>>>>>>>>>>> D.2 Compression API Stateful operation
>>>>>>>>>>>>> ----------------------------------------------------------
>>>>>>>>>>>>> A stateful operation in DPDK compression means the application
>>>>>>>>>>>>> invokes enqueue_burst() multiple times to process related chunks
>>>>>>>>>>>>> of data, either because
>>>>>>>>>>>>> - the application broke the data into several ops, and/or
>>>>>>>>>>>>> - the PMD ran into an out_of_space situation during input
>>>>>>>>>>>>>   processing
>>>>>>>>>>>>>
>>>>>>>>>>>>> In case of either one or all of the above conditions, the PMD is
>>>>>>>>>>>>> required to maintain the state of the op across enqueue_burst()
>>>>>>>>>>>>> calls, and
>>>>>>>>>>>>> ops are set up with op_type RTE_COMP_OP_STATEFUL, begin with
>>>>>>>>>>>>> flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>>>>>>>>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> It is always an ideal expectation from the application that it
>>>>>>>>>>>>> should parse through all related chunks of source data, make its
>>>>>>>>>>>>> mbuf-chain and enqueue it for stateless processing.
>>>>>>>>>>>>> However, if it needs to break it into several enqueue_burst()
>>>>>>>>>>>>> calls, then an expected call flow would be something like:
>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call
>>>>>>>>>>>> dequeue burst in a loop until all ops are received. Is this
>>>>>>>>>>>> correct?
>>>>>>>>>>>>
>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued.
>>>>>>>>>>> However this illustration is specifically in the context of
>>>>>>>>>>> stateful op processing, to reflect that if a stream is broken into
>>>>>>>>>>> chunks, then each chunk should be submitted as one op at a time
>>>>>>>>>>> with type = STATEFUL, and needs to be dequeued first before the
>>>>>>>>>>> next chunk is enqueued.
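A rough sketch of that one-op-at-a-time stateful flow, assuming op fields and
enqueue/dequeue function names along the lines of this RFC draft (they are
placeholders for whatever the final compressdev API exposes):

/* Sketch only: function and field names are placeholders for the draft API. */
static int
compress_stream_in_chunks(uint8_t dev_id, uint16_t qp_id, void *stream,
			  struct rte_comp_op **chunks, unsigned int n_chunks)
{
	unsigned int i;

	for (i = 0; i < n_chunks; i++) {
		struct rte_comp_op *op = chunks[i];
		struct rte_comp_op *done = NULL;

		op->op_type = RTE_COMP_OP_STATEFUL;
		op->stream = stream;
		op->flush_flag = (i == n_chunks - 1) ?
				RTE_COMP_FLUSH_FINAL : RTE_COMP_FLUSH_NONE;

		if (rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1) != 1)
			return -1;

		/* Only one op of this stream in flight: dequeue it before
		 * enqueuing the next chunk. */
		while (rte_compressdev_dequeue_burst(dev_id, qp_id, &done, 1) == 0)
			;

		if (done->status != RTE_COMP_OP_STATUS_SUCCESS)
			return -1; /* e.g. out-of-space: fix dst and resume */
	}
	return 0;
}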
>>>>>>>>>>>
>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand
>>>>>>>>>>>> that occasionally there will be an OUT_OF_SPACE exception. Can we
>>>>>>>>>>>> just distinguish the response in exception cases?
>>>>>>>>>>> [Shally] Multiple ops are allowed in flight, however the condition
>>>>>>>>>>> is that each op in such a case is independent of the others, i.e.
>>>>>>>>>>> they belong to different streams altogether.
>>>>>>>>>>> Earlier (as part of the RFC v1 doc) we did consider the proposal to
>>>>>>>>>>> process all related chunks of data in a single burst by passing
>>>>>>>>>>> them as an ops array, but later found that not so useful for PMD
>>>>>>>>>>> handling for various reasons. You may please refer to the RFC v1
>>>>>>>>>>> doc review comments for the same.
>>>>>>>>>> [Fiona] Agree with Shally. In summary, as only one op can be
>>>>>>>>>> processed at a time, since each needs the state of the previous, to
>>>>>>>>>> allow more than 1 op to be in flight at a time would force PMDs to
>>>>>>>>>> implement internal queueing and exception handling for the
>>>>>>>>>> OUT_OF_SPACE conditions you mention.
>>>>>>>> [Ahmed] But we are putting the ops on qps which would make them
>>>>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit
>>>>>>>> more complex, but doable.
>>>>>>> [Fiona] In my opinion this is not doable, and could be very
>>>>>>> inefficient.
>>>>>>> There may be many streams.
>>>>>>> The PMD would have to have an internal queue per stream so it could
>>>>>>> adjust the next src offset and length in the OUT_OF_SPACE case.
>>>>>>> And this may ripple back through all subsequent ops in the stream as
>>>>>>> each source len is increased and its dst buffer is not big enough.
>>>>>> [Ahmed] Regarding multi-op OUT_OF_SPACE handling:
>>>>>> The caller would still need to adjust the src length/output buffer as
>>>>>> you say. The PMD cannot handle OUT_OF_SPACE internally.
>>>>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
>>>>>> until it gets explicit confirmation from the caller to continue working
>>>>>> on this stream. Any ops received by the PMD should be returned to the
>>>>>> caller with status STREAM_PAUSED, since the caller did not explicitly
>>>>>> acknowledge that it has solved the OUT_OF_SPACE issue.
>>>>>> These semantics can be enabled by adding a new function to the API,
>>>>>> perhaps stream_resume().
>>>>>> This allows the caller to indicate that it acknowledges that it has seen
>>>>>> the issue and that this op should be used to resolve the issue.
>>>>>> Implementations that do not support this mode of use can push back
>>>>>> immediately after one op is in flight. Implementations that support this
>>>>>> use mode can allow many ops from the same session.
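A hypothetical sketch of the pause/resume semantics proposed above.
STREAM_PAUSED, rte_comp_stream_resume() and the exact field names are only
names floated in this thread, not part of any current draft; this is just one
way the caller-side flow could look.

/* Hypothetical sketch: STREAM_PAUSED and rte_comp_stream_resume() are names
 * proposed in this thread only. */
static void
on_dequeued_op(uint8_t dev_id, uint16_t qp_id, struct rte_comp_op *op,
	       void *stream, struct rte_mbuf *bigger_dst)
{
	switch (op->status) {
	case RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE:
		/* Fix up the op: larger dst buffer, src continuing right
		 * after what was already consumed. */
		op->m_dst = bigger_dst;
		op->src.offset += op->consumed;
		/* Explicit acknowledgement so the PMD lets this stream continue. */
		rte_comp_stream_resume(dev_id, stream);
		rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1);
		break;
	case RTE_COMP_OP_STATUS_STREAM_PAUSED:
		/* This op reached the PMD while its stream was paused and was
		 * not processed; by now the failed op has been fixed and the
		 * stream resumed, so simply resubmit it unchanged. */
		rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1);
		break;
	default:
		break;
	}
}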
>>>>>>
>>>>> [Shally] Is it still in the context of having a single burst where all
>>>>> ops belong to one stream? If yes, I would still say it would add overhead
>>>>> to PMDs, especially if it is expected to work closer to HW (which I think
>>>>> is the case with DPDK PMDs).
>>>>> Though your approach is doable, why can't this all be in a layer above
>>>>> the PMD? i.e. a layer above the PMD can either pass one op at a time with
>>>>> burst size = 1 OR can make a chained mbuf of input and output and pass
>>>>> that as one op.
>>>>> Is it just to ease applications of the chained-mbuf burden, or do you see
>>>>> any performance/use-case impacting aspect also?
>>>>>
>>>>> If it is in the context where each op belongs to a different stream in a
>>>>> burst, then why do we need stream_pause and resume? It is an expectation
>>>>> from the app to pass a bigger output buffer and consumed + 1 from the
>>>>> next call onwards, as it has already seen OUT_OF_SPACE.
>>> [Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
>>> implementation rejects all ops that belong to a stream that has entered
>>> "RECOVERABLE" state for one reason or another. The caller must
>>> acknowledge explicitly that it has received news of the problem before
>>> the PMD allows this stream to exit the "RECOVERABLE" state. I agree with
>>> you that implementing this functionality in the software layer above the
>>> PMD is a bad idea, since the latency reductions are lost.
>> [Shally] Just reiterating, I rather meant it the other way around, i.e. I
>> see it easier to put all such complexity in a layer above the PMD.
>>
>>> This setup is useful in latency-sensitive applications where the latency
>>> of buffering multiple ops into one op is significant. We found latency
>>> makes a significant difference in search applications where the PMD
>>> competes with software decompression.
> [Fiona] I see, so when all goes well, you get best-case latency, but when
> out-of-space occurs latency will probably be worse.
[Ahmed] This is exactly right. This use mode assumes out-of-space is a
rare occurrence. Recovering from it should take similar time to
synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE in
both sync and async use. The caller can fix up the op and send it back
to the PMD to continue work, just as would be done in sync. Nonetheless,
the added complexity is not justifiable if out-of-space is very common,
since the recoverable state will be the limiting factor that forces
synchronicity.
>>>> [Fiona] I still have concerns with this and would not want to support it
>>>> in our PMD.
>>>> To make sure I understand, you want to send a burst of ops, with several
>>>> from the same stream.
>>>> If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process
>>>> any subsequent ops in that stream.
>>>> Should it return them in a dequeue_burst() with status still
>>>> NOT_PROCESSED?
>>>> Or somehow drop them? How?
>>>> While still processing ops from other streams.
>>> [Ahmed] This is exactly correct. It should return them with
>>> NOT_PROCESSED. Yes, the PMD should continue processing other streams.
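A hypothetical caller-side sketch of that dequeue handling for the
multi-op-per-stream use mode: the failed op of the stream comes back
OUT_OF_SPACE_RECOVERABLE, the later ops of the same stream come back
NOT_PROCESSED, and the caller collects everything that has to be fixed up and
resubmitted. Status names follow this discussion, not a finalised API.

/* Hypothetical sketch; status names follow this discussion only. */
static uint16_t
sort_dequeued_burst(struct rte_comp_op **deq, uint16_t n,
		    struct rte_comp_op **resubmit)
{
	uint16_t i, n_resubmit = 0;

	for (i = 0; i < n; i++) {
		struct rte_comp_op *op = deq[i];

		switch (op->status) {
		case RTE_COMP_OP_STATUS_SUCCESS:
			/* consumed/produced are final; nothing more to do */
			break;
		case RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE:
			/* Caller must attach a bigger dst buffer and advance
			 * src past op->consumed before resubmitting. */
			resubmit[n_resubmit++] = op;
			break;
		case RTE_COMP_OP_STATUS_NOT_PROCESSED:
			/* Behind a failed op of the same stream: untouched by
			 * the PMD, resubmit unchanged after the recovered op. */
			resubmit[n_resubmit++] = op;
			break;
		default:
			/* unrecoverable: this stream must restart from scratch */
			break;
		}
	}
	return n_resubmit;
}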
>>>> As we want to offload each op to hardware with as little CPU processing
>>>> as possible, we would not want to open up each op to see which stream
>>>> it's attached to and make decisions to do per-stream storage, or drop it,
>>>> or bypass hw and dequeue without processing.
>>> [Ahmed] I think I might have missed your point here, but I will try to
>>> answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
>>> to the PMD and the PMD should reject until stream_continue() is called.
>>> The next op to be sent by the user will have a special marker in it to
>>> inform the PMD to continue working on this stream. Alternatively the
>>> DPDK layer can be made "smarter" to fail during the enqueue by checking
>>> the stream and its state, but like you say this adds additional CPU
>>> overhead during the enqueue.
>>> I am curious: in a simple synchronous use case, how do we prevent users
>>> from putting multiple ops in flight that belong to a single stream? Do
>>> we just currently say it is undefined behavior? Otherwise we would have
>>> to check the stream and incur the CPU overhead.
> [Fiona] We don't do anything to prevent it. It's undefined. IMO on the data
> path in the DPDK model we expect good behaviour and don't have to error
> check for things like this.
[Ahmed] This makes sense. We also assume good behavior.
> In our PMD if we got a burst of 20 ops, we allocate 20 spaces on the hw q,
> then build and send those messages. If we found an op from a stream which
> already had one in flight, we'd have to hold that back, store it in a sw
> stream-specific holding queue, and only send 19 to hw. We cannot send
> multiple ops from the same stream to the hw as it fans them out and does
> them in parallel.
> Once the enqueue_burst() returns, there is no processing context which
> would spot that the first has completed and send the next op to the hw. On
> a dequeue_burst() we would spot this, and in that context could process the
> next op in the stream.
> On out of space, instead of processing the next op we would have to
> transfer all unprocessed ops from the stream to the dequeue result.
> Some parts of this are doable, but it seems likely to add a lot more
> latency; we'd need to add extra threads and timers to move ops from the sw
> queue to the hw q to get any benefit, and these constructs would add
> context switching and CPU cycles. So we prefer to push this responsibility
> to above the API, and it can achieve similar.
[Ahmed] I see what you mean. Our workflow is almost exactly the same with
our hardware, but the fanning out is done by the hardware based on the
stream, and ops that belong to the same stream are never allowed to go out
of order. Otherwise the data would be corrupted. Likewise the hardware is
responsible for checking the state of the stream and returning frames as
NOT_PROCESSED to the software.
>>>> Maybe we could add a capability if this behaviour is important for you?
>>>> e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
>>>> Our PMD would set this to 0. And expect no more than one op from a
>>>> stateful stream to be in flight at any time.
>>> [Ahmed] That makes sense. This way the different DPDK implementations do
>>> not have to add extra checking for unsupported cases.
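For illustration, a hypothetical check of such a capability at init time.
ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS is only the name suggested above, and the
device info/feature-flag layout is assumed; how capabilities are actually
exposed will depend on the final compressdev structures.

/* Hypothetical capability check; flag name and info layout are assumptions. */
#include <stdbool.h>
#include <rte_compressdev.h>

static bool
can_pipeline_stateful_ops(uint8_t dev_id)
{
	struct rte_compressdev_info info;

	rte_compressdev_info_get(dev_id, &info);

	/* If the PMD does not advertise it, keep strictly one op per stateful
	 * stream in flight at any time. */
	return (info.feature_flags &
		RTE_COMPDEV_FF_ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS) != 0;
}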
>> [Shally] @ahmed, If I summarise your use-case, is this what you want the
>> PMD to support?
>> - a burst *carries only one stream* and all ops are then assumed to belong
>>   to that stream? (please note, here a burst is not carrying more than one
>>   stream)
[Ahmed] No. In this use case the caller sets up an op and enqueues a
single op. Then before the response comes back from the PMD the caller
enqueues a second op on the same stream.
>> - PMD will submit one op at a time to HW?
[Ahmed] I misunderstood what PMD means. I used it throughout to mean the
HW. I used DPDK to mean the software implementation that talks to the
hardware.
The software will submit all ops immediately. The hardware has to figure
out what to do with the ops depending on what stream they belong to.
>> - if processed successfully, push it back to the completion queue with
>>   status = SUCCESS. If it failed or ran into OUT_OF_SPACE, then push it to
>>   the completion queue with status = FAILURE/OUT_OF_SPACE_RECOVERABLE and
>>   the rest with status = NOT_PROCESSED, and return with enqueue count =
>>   total # of ops submitted originally with the burst?
[Ahmed] This is exactly what I had in mind. All ops will be submitted to
the HW. The HW will put all of them on the completion queue with the
correct status exactly as you say.
>> - app assumes all have been enqueued, so it goes and dequeues all ops
>> - on seeing an op with OUT_OF_SPACE_RECOVERABLE, the app resubmits a burst
>>   of ops with a call to the stream_continue/resume API, starting from the
>>   op which encountered OUT_OF_SPACE and the others as NOT_PROCESSED, with
>>   updated input and output buffers?
[Ahmed] Correct, this is what we do today in our proprietary API.
>> - repeat until *all* are dequeued with status = SUCCESS or *any* with
>>   status = FAILURE? If at any time a failure is seen, does the app start
>>   the whole processing all over again, or just drop this burst?!
[Ahmed] The app has the choice of how to proceed. If the issue is
recoverable then the application can continue this stream from where it
stopped. If the failure is unrecoverable then the application should
first fix the problem and start from the beginning of the stream.
>> If all of the above is true, then I think we should add another API such
>> as rte_comp_enque_single_stream(), which will be functional under Feature
>> Flag = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better name is
>> SUPPORT_ENQUEUE_SINGLE_STREAM?!
[Ahmed] The main advantage in async use is lost if we force all related
ops to be in the same burst. If we do that, then we might as well merge
all the ops into one op. That would reduce the overhead.
The use mode I am proposing is only useful in cases where the data
becomes available after the first enqueue occurred. I want to allow the
caller to enqueue the second set of data as soon as it is available,
regardless of whether or not the HW has already started working on the
first op in flight.
> [Fiona] Am curious about Ahmed's response to this. I didn't get that a
> burst should carry only one stream.
> Or get how this makes a difference? As there can be many enqueue_burst()
> calls done before a dequeue_burst().
> Maybe you're thinking the enqueue_burst() would be a blocking call that
> would not return until all the ops had been processed?
> This would turn it into a synchronous call, which isn't the intent.
[Ahmed] Agreed, a blocking or even a buffering software layer that babysits
the hardware does not fundamentally change the parameters of the system as
a whole. It just moves workflow management complexity down into the DPDK
software layer. Rather, there are real latency and throughput advantages
(because of caching) that I want to expose.

/// snip ///