From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ahmed Mansour
To: "Trahe, Fiona", "Verma, Shally", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry
Thread-Topic: [RFC v2] doc compression API for DPDK
Date: Thu, 15 Feb 2018 21:09:35 +0000
References: <348A99DA5F5B7549AA880327E580B435892F589D@IRSMSX101.ger.corp.intel.com> <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com> <348A99DA5F5B7549AA880327E580B4358931F82B@IRSMSX101.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK
List-Id: DPDK patches and discussions

On 2/15/2018 1:47 PM, Trahe, Fiona wrote:
> Hi Shally,
> Ahmed,
> Sorry for the delay in replying,
> Comments below
>
>> -----Original Message-----
>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>> Sent: Wednesday, February 14, 2018 7:41 AM
>> To: Ahmed Mansour; Trahe, Fiona; dev@dpdk.org
>> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo;
>> Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
>> Subject: RE: [RFC v2] doc compression API for DPDK
>>
>> Hi Ahmed,
>>
>>> -----Original Message-----
>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>> Sent: 02 February 2018 01:53
>>> To: Trahe, Fiona; Verma, Shally; dev@dpdk.org
>>> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo;
>>> Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>
>>> On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
>>>> Hi Ahmed, Shally,
>>>>
>>>> ///snip///
>>>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>>>>> ------------------------------------------------
>>>>>>>>>> OUT_OF_SPACE is a condition when output buffer runs out of space and
>>>>>>>>>> where PMD still has more data to produce. If PMD runs into such a condition,
>>>>>>>>>> then it's an error condition in stateless processing.
>>>>>>>>>> In such case, PMD resets itself and returns with status
>>>>>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0 i.e.
>>>>>>>>>> no input read, no output written.
>>>>>>>>>> Application can resubmit the full input with a larger output buffer size.
>>>>>>>>> [Ahmed] Can we add an option to allow the user to read the data that was
>>>>>>>>> produced while still reporting OUT_OF_SPACE?
>>>>>>>>> This is mainly useful for
>>>>>>>>> decompression applications doing search.
>>>>>>>> [Shally] It is there but applicable for stateful operation type (please refer to
>>>>>>>> handling out_of_space under "Stateful Section").
>>>>>>>> By definition, "stateless" here means that the application (such as IPCOMP) knows the
>>>>>>>> maximum output size with certainty and ensures that the uncompressed data size cannot
>>>>>>>> grow beyond the provided output buffer.
>>>>>>>> Such apps can submit an op with type = STATELESS and provide full input; the PMD then
>>>>>>>> assumes it has sufficient input and output and thus doesn't need to maintain any
>>>>>>>> context after the op is processed.
>>>>>>>> If the application doesn't know the max output size, then it should process it as a
>>>>>>>> stateful op, i.e. set up the op with type = STATEFUL and attach a stream so that the
>>>>>>>> PMD can maintain the relevant context to handle such a condition.
>>>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while still
>>>>>>> respecting the stateless concept.
>>>>>>> In the stateless case where a PMD reports OUT_OF_SPACE in decompression,
>>>>>>> it could also return consumed=0, produced=x, where x>0. x indicates the
>>>>>>> amount of valid data which has been written to the output buffer.
>>>>>>> It is not complete, but if an application
>>>>>>> wants to search it, it may be sufficient.
>>>>>>> If the application still wants the data it must resubmit the whole input with a
>>>>>>> bigger output buffer, and decompression will be repeated from the start. It
>>>>>>> cannot expect to continue on, as the PMD has not maintained state, history
>>>>>>> or data.
>>>>>>> I don't think there would be any need to indicate this in capabilities: PMDs
>>>>>>> which cannot provide this functionality would always return produced=consumed=0,
>>>>>>> while PMDs which can could set produced > 0.
>>>>>>> If this works for you both, we could consider a similar case for compression.
>>>>>>>
>>>>>> [Shally] Sounds fine to me. Though in that case, consumed should also be updated to
>>>>>> the actual amount consumed by the PMD.
>>>>>> Setting consumed = 0 with produced > 0 doesn't correlate.
>>>>> [Ahmed] I like Fiona's suggestion, but I also do not like the implication
>>>>> of returning consumed = 0. At the same time returning consumed = y
>>>>> implies to the user that it can proceed from the middle. I prefer the
>>>>> consumed = 0 implementation, but I think a different return is needed to
>>>>> distinguish it from the OUT_OF_SPACE that the user can recover from. Perhaps
>>>>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
>>>>> future PMD implementations to provide recoverability even in STATELESS
>>>>> mode if they so wish. In this model STATELESS or STATEFUL would be a
>>>>> hint for the PMD implementation to make optimizations for each case, but
>>>>> it does not force the PMD implementation to limit functionality if it
>>>>> can provide recoverability.
>>>> [Fiona] So you're suggesting the following:
>>>> OUT_OF_SPACE - returned only on stateful operation. Not an error.
>>>>     Op.produced can be used and next op in stream should continue on from op.consumed+1.
>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>>     Error condition, no recovery possible.
>>>>     consumed=produced=0. Application must resubmit all input data with
>>>>     a bigger output buffer.
>>>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
>>>>     - consumed = 0, produced > 0. Application must resubmit all input data with
>>>>       a bigger output buffer. However in decompression case, data up to produced
>>>>       in dst buffer may be inspected/searched. Never happens in compression
>>>>       case as output data would be meaningless.
>>>>     - consumed > 0, produced > 0. PMD has stored relevant state and history and so
>>>>       can convert to stateful, using op.produced and continuing from consumed+1.
>>>> I don't expect our PMDs to use this last case, but maybe this works for others?
>>>> I'm not convinced it's not just adding complexity. It sounds like a version of stateful
>>>> without a stream, and maybe less efficient?
>>>> If so should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
>>>> Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
>>>> simply have submitted a STATEFUL request if this is the behaviour it wants?
>>> [Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
>>> and replacing it with
>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>     Error condition, no recovery possible.
>>>     - consumed=0, produced=amount of data produced. Application must
>>>       resubmit all input data with
>>>       a bigger output buffer to process all of the op
>>> OUT_OF_SPACE_RECOVERABLE - Normally returned on stateful operation. Not
>>> an error.
>>>     Op.produced can be used and next op in stream should continue on from op.consumed+1.
>>>     - consumed > 0, produced > 0. PMD has stored relevant state and
>>>       history and so
>>>       can continue using op.produced and continuing from consumed+1.
>>>
>>> We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
>>> implementation either.
>>>
>>> Regardless of speculative future PMDs, the more important aspect of this
>>> for today is that the return status clearly determines
>>> the meaning of "consumed". If it is RECOVERABLE then consumed is
>>> meaningful. If it is TERMINATED then consumed is meaningless.
>>> This way we take away the ambiguity of having OUT_OF_SPACE mean two
>>> different user work flows.
>>>
>>> A speculative future PMD may be designed to return RECOVERABLE for
>>> stateless ops that are attached to streams.
>>> A future PMD may look to see if an op has a stream attached and write
>>> out the state there and go into recoverable mode.
>>> In essence this leaves the choice up to the implementation and allows
>>> the PMD to take advantage of stateless optimizations
>>> so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump
>>> context as soon as it fully processes an op. It will only
>>> write context out in cases where the op chokes.
>>> This futuristic PMD should ignore the FLUSH since this is STATELESS mode as
>>> indicated by the user and optimize
>> [Shally] IMO, it looks okay to have two separate return codes TERMINATED and RECOVERABLE with
>> definition as you mentioned, and it seems doable.
>> So then it means all the following conditions:
>> a. stateless with flush = full/final, no stream pointer provided, PMD can return TERMINATED i.e. user
>>    has to start all over again, it's a failure (as in current definition)
>> b.
>>    stateless with flush = full/final, stream pointer provided, here it's up to PMD to return either
>>    TERMINATED or RECOVERABLE depending upon its ability (note if Recoverable, then PMD will maintain
>>    states in stream pointer)
>> c. stateful with flush = full / NO_SYNC, stream pointer always there, PMD will return
>>    TERMINATED/RECOVERABLE depending on STATEFUL_COMPRESSION/DECOMPRESSION feature flag
>>    enabled or not
> [Fiona] I don't think the flush flag is relevant - it could be out of space on any flush flag, and if out of space
> should ignore the flush flag.
> Is there a need for TERMINATED? - I didn't think it would ever need to be returned in stateful case.
> Why the ref to feature flag? If a PMD doesn't support a feature I think it should fail the op - not with
> out-of-space, but unsupported or similar. Or it would fail on stream creation.
[Ahmed] Agreed with Fiona. The flush flag only matters on success. By
definition the PMD should return OUT_OF_SPACE_RECOVERABLE in stateful
mode when it runs out of space.
@Shally If the user did not provide a stream, then the PMD should
probably return TERMINATED every time. I am not sure we should make a
"really smart" PMD which returns RECOVERABLE even if no stream pointer
was given. In that case the PMD must give some ID back to the caller
that the caller can use to "recover" the op. I am not sure how it would
be implemented in the PMD, or when the PMD would decide to retire streams
belonging to dead ops that the caller decided not to "recover".
>
>> and one more exception case is:
>> d. stateless with flush = full, no stream pointer provided, PMD can return RECOVERABLE i.e.
>>    PMD internally maintained that state somehow and consumed & produced > 0, so user can start
>>    from consumed+1, but there's a restriction on user not to alter or change op until it is
>>    fully processed?!
> [Fiona] Why the need for this case?
> There's always a restriction on user not to alter or change op until it is fully processed.
> If a PMD can do this - why doesn't it create a stream when that API is called - and then it's same as b?
[Ahmed] Agreed. The user should not touch an op once enqueued until they
receive it in dequeue. We ignore the flush in stateless mode. We assume
it to be final every time.
>
>> API currently takes care of case a and c, and case b can be supported if specification accepts another
>> proposal which mentions optional usage of stream with stateless.
> [Fiona] API has this, but as we agreed, not optional to call the create_stream() with an op_type
> parameter (stateful/stateless). PMD can return NULL or provide a stream, if the latter then that
> stream must be attached to ops.
>
>> Until then API makes no distinction between
>> case b and c i.e. we can have op such as,
>> - type = stateful with flush = full/final, stream pointer provided, PMD can return
>>   TERMINATED/RECOVERABLE according to its ability
>>
>> Case d is something exceptional; if there's a requirement in PMDs to support it, then I believe it will be
>> doable with the concept of different return codes.
>>
> [Fiona] That's not quite how I understood it. Can it be simpler and only following cases?
> a. stateless with flush = full/final, no stream pointer provided, PMD can return TERMINATED i.e. user
>    has to start all over again, it's a failure (as in current definition).
>    consumed = 0, produced = amount of data produced.
>    This is usually 0, but in decompression
>    case a PMD may return > 0 and application may find it useful to inspect that data.
> b. stateless with flush = full/final, stream pointer provided, here it's up to PMD to return either
>    TERMINATED or RECOVERABLE depending upon its ability (note if Recoverable, then PMD will maintain
>    states in stream pointer)
> c. stateful with flush = any, stream pointer always there, PMD will return RECOVERABLE.
>    op.produced can be used and next op in stream should continue on from op.consumed+1.
>    Consumed=0, produced=0 is an unusual but allowed case. I'm not sure if it could ever happen, but
>    no need to change state to TERMINATED in this case. There may be useful state/history
>    stored in the PMD, even though no output produced yet.
[Ahmed] Agreed
>
>>>>>>>>>> D.2 Compression API Stateful operation
>>>>>>>>>> ----------------------------------------------------------
>>>>>>>>>> A Stateful operation in DPDK compression means application invokes
>>>>>>>>>> enqueue_burst() multiple times to process related chunks of data either
>>>>>>>>>> because
>>>>>>>>>> - Application broke data into several ops, and/or
>>>>>>>>>> - PMD ran into out_of_space situation during input processing
>>>>>>>>>>
>>>>>>>>>> In case of either one or all of the above conditions, PMD is required to
>>>>>>>>>> maintain state of op across enqueue_burst() calls and
>>>>>>>>>> ops are set up with op_type RTE_COMP_OP_STATEFUL, and begin with
>>>>>>>>>> flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>>>>>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>> It is always an ideal expectation from application that it should parse
>>>>>>>>>> through all related chunks of source data making its mbuf-chain
>>>>>>>>>> and enqueue
>>>>>>>>>> it for stateless processing.
>>>>>>>>>> However, if it needs to break it into several enqueue_burst() calls, then an
>>>>>>>>>> expected call flow would be something like:
>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call dequeue
>>>>>>>>> burst in a loop until all ops are received. Is this correct?
>>>>>>>>>
>>>>>>>>>> dequeue_burst(op) // should dequeue before we enqueue next
>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued. However
>>>>>>>> this illustration is specifically in
>>>>>>>> context of stateful op processing to reflect that if a stream is broken into
>>>>>>>> chunks, then each chunk should be
>>>>>>>> submitted as one op at a time with type = STATEFUL and needs to be
>>>>>>>> dequeued first before next chunk is
>>>>>>>> enqueued.
>>>>>>>>
>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>> dequeue_burst(op) // should dequeue before we enqueue next
>>>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand that
>>>>>>>>> occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish
>>>>>>>>> the response in exception cases?
>>>>>>>> [Shally] Multiple ops are allowed in flight, however condition is each op in
>>>>>>>> such case is independent of
>>>>>>>> each other i.e. belong to different streams altogether.
>>>>>>>> Earlier (as part of RFC v1 doc) we did consider the proposal to process all
>>>>>>>> related chunks of data in single
>>>>>>>> burst by passing them as ops array but later found that as not-so-useful for
>>>>>>>> PMD handling for various
>>>>>>>> reasons. You may please refer to RFC v1 doc review comments for same.
>>>>>>> [Fiona] Agree with Shally.
>>>>>>> In summary, as only one op can be processed at a
>>>>>>> time, since each needs the
>>>>>>> state of the previous, to allow more than 1 op to be in-flight at a time would
>>>>>>> force PMDs to implement internal queueing and exception handling for
>>>>>>> OUT_OF_SPACE conditions you mention.
>>>>> [Ahmed] But we are putting the ops on qps which would make them
>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>>>>> complex but doable.
>>>> [Fiona] In my opinion this is not doable, could be very inefficient.
>>>> There may be many streams.
>>>> The PMD would have to have an internal queue per stream so
>>>> it could adjust the next src offset and length in the OUT_OF_SPACE case.
>>>> And this may ripple back through all subsequent ops in the stream as each
>>>> source len is increased and its dst buffer is not big enough.
>>> [Ahmed] Regarding multi op OUT_OF_SPACE handling.
>>> The caller would still need to adjust
>>> the src length/output buffer as you say. The PMD cannot handle
>>> OUT_OF_SPACE internally.
>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
>>> until it gets explicit
>>> confirmation from the caller to continue working on this stream. Any ops
>>> received by
>>> the PMD should be returned to the caller with status STREAM_PAUSED since
>>> the caller did not
>>> explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
>>> These semantics can be enabled by adding a new function to the API
>>> perhaps stream_resume().
>>> This allows the caller to indicate that it acknowledges that it has seen
>>> the issue and this op
>>> should be used to resolve the issue. Implementations that do not support
>>> this mode of use
>>> can push back immediately after one op is in flight.
>>> Implementations
>>> that support this use
>>> mode can allow many ops from the same session
>>>
>> [Shally] Is it still in context of having a single burst where all ops belong to one stream? If yes, I would still
>> say it would add an overhead to PMDs especially if it is expected to work closer to HW (which I think is
>> the case with DPDK PMD).
>> Though your approach is doable, why can this all not be in a layer above PMD? i.e. a layer above PMD
>> can either pass one op at a time with burst size = 1 OR can make chained mbuf of input and output and
>> pass that as one op.
>> Is it just to ease applications of chained mbuf burden or do you see any performance/use-case
>> impacting aspect also?
>>
>> If it is in context where each op belongs to a different stream in a burst, then why do we need
>> stream_pause and resume? It is an expectation from app to pass more output buffer with consumed + 1
>> from next call onwards as it has already
>> seen OUT_OF_SPACE.
[Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
implementation rejects all ops that belong to a stream that has entered
"RECOVERABLE" state for one reason or another. The caller must
acknowledge explicitly that it has received news of the problem before
the PMD allows this stream to exit "RECOVERABLE" state. I agree with you
that implementing this functionality in the software layer above the PMD
is a bad idea since the latency reductions are lost.
This setup is useful in latency sensitive applications where the latency
of buffering multiple ops into one op is significant.
We found latency
makes a significant difference in search applications where the PMD
competes with software decompression.
> [Fiona] I still have concerns with this and would not want to support in our PMD.
> To make sure I understand, you want to send a burst of ops, with several from same stream.
> If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any
> subsequent ops in that stream.
> Should it return them in a dequeue_burst() with status still NOT_PROCESSED?
> Or somehow drop them? How?
> While still processing ops from other streams.
[Ahmed] This is exactly correct. It should return them with
NOT_PROCESSED. Yes, the PMD should continue processing other streams.
> As we want to offload each op to hardware with as little CPU processing as possible we
> would not want to open up each op to see which stream it's attached to and
> make decisions to do per-stream storage, or drop it, or bypass hw and dequeue without processing.
[Ahmed] I think I might have missed your point here, but I will try to
answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
to the PMD and the PMD should reject until stream_continue() is called.
The next op to be sent by the user will have a special marker in it to
inform the PMD to continue working on this stream. Alternatively the
DPDK layer can be made "smarter" to fail during the enqueue by checking
the stream and its state, but like you say this adds additional CPU
overhead during the enqueue.
I am curious. In a simple synchronous use case, how do we prevent users
from putting multiple ops in flight that belong to a single stream? Do
we just currently say it is undefined behavior? Otherwise we would have
to check the stream and incur the CPU overhead.
>
> Maybe we could add a capability if this behaviour is important for you?
> e.g.
> ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
> Our PMD would set this to 0. And expect no more than one op from a stateful stream
> to be in flight at any time.
[Ahmed] That makes sense. This way the different DPDK implementations do
not have to add extra checking for unsupported cases.
>
>
>>> Regarding the ordering of ops
>>> We do force serialization of ops belonging to a stream in STATEFUL
>>> operation. Related ops do
>>> not go out of order and are given to available PMDs one at a time.
>>>
>>>>> The question is whether this mode of use is useful for real
>>>>> life applications or would we be just adding complexity? The technical
>>>>> advantage of this is that processing of Stateful ops is interdependent
>>>>> and PMDs can take advantage of caching and other optimizations to make
>>>>> processing related ops much faster than switching on every op. PMDs have
>>>>> to maintain state of more than 32 KB for DEFLATE for every stream.
>>>>>>> If the application has all the data, it can put it into chained mbufs in a single
>>>>>>> op rather than
>>>>>>> multiple ops, which avoids pushing all that complexity down to the PMDs.
>>>>> [Ahmed] I think that your suggested scheme of putting all related mbufs
>>>>> into one op may be the best solution without the extra complexity of
>>>>> handling OUT_OF_SPACE cases, while still allowing the enqueuer extra
>>>>> time if we have a way of marking mbufs as ready for consumption. The
>>>>> enqueuer may not have all the data at hand but can enqueue the op with a
>>>>> couple of empty mbufs marked as not ready for consumption. The enqueuer
>>>>> will then update the rest of the mbufs to ready for consumption once the
>>>>> data is added. This introduces a race condition.
>>>>> A second flag for each
>>>>> mbuf can be updated by the PMD to indicate that it processed it or not.
>>>>> This way in cases where the PMD beat the application to the op, the
>>>>> application will just update the op to point to the first unprocessed
>>>>> mbuf and resend it to the PMD.
>>>> [Fiona] This doesn't sound safe. You want to add data to a stream after you've
>>>> enqueued the op. You would have to write to op.src.length at a time when the PMD
>>>> might be reading it. Sounds like a lock would be necessary.
>>>> Once the op has been enqueued, my understanding is its ownership is handed
>>>> over to the PMD and the application should not touch it until it has been dequeued.
>>>> I don't think it's a good idea to change this model.
>>>> Can't the application just collect a stream of data in chained mbufs until it has
>>>> enough to send an op, then construct the op and while waiting for that op to
>>>> complete, accumulate the next batch of chained mbufs? Only construct the next op
>>>> after the previous one is complete, based on the result of the previous one.
>>>>
>>> [Ahmed] Fair enough. I agree with you. I imagined it in a different way
>>> in which each mbuf would have its own length.
>>> The advantage to gain is in applications where there is one PMD user,
>>> the down time between ops can be significant and setting up a single
>>> producer consumer pair significantly reduces the CPU cycles and PMD down
>>> time.
>>>
>>> ////snip////
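
For illustration, the TERMINATED/RECOVERABLE split discussed in this thread
could be sketched in C roughly as follows. This is only a sketch under stated
assumptions: every name in it (comp_status, comp_result, next_step,
decide_next) is a hypothetical stand-in invented for the example and is not
taken from any DPDK header. It also assumes that "continue from consumed+1"
means the first unread input byte, which with zero-based offsets is offset
`consumed`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes, named after the proposal in this thread. */
enum comp_status {
    COMP_SUCCESS,
    COMP_OUT_OF_SPACE_TERMINATED,
    COMP_OUT_OF_SPACE_RECOVERABLE
};

/* Hypothetical op result: only the fields the thread talks about. */
struct comp_result {
    enum comp_status status;
    uint32_t consumed;   /* input bytes the PMD reports as consumed  */
    uint32_t produced;   /* output bytes the PMD reports as produced */
};

/* What the application should do after dequeuing an op. */
enum next_action { ACT_DONE, ACT_RESUBMIT_ALL, ACT_CONTINUE_STREAM };

struct next_step {
    enum next_action action;
    uint32_t src_offset; /* zero-based input offset for the next op */
    uint32_t valid_out;  /* bytes in dst that may be inspected      */
};

/*
 * The status alone decides whether "consumed" is meaningful:
 *  - TERMINATED:  consumed is ignored, all input is resubmitted with a
 *                 bigger output buffer; produced bytes may still be
 *                 inspected (e.g. searched) in the decompression case.
 *  - RECOVERABLE: the PMD kept state, so the next op in the stream
 *                 continues right after the consumed input.
 */
struct next_step decide_next(const struct comp_result *r)
{
    struct next_step s;

    assert(r != NULL);
    s.action = ACT_DONE;
    s.src_offset = r->consumed;
    s.valid_out = r->produced;

    switch (r->status) {
    case COMP_OUT_OF_SPACE_TERMINATED:
        s.action = ACT_RESUBMIT_ALL;
        s.src_offset = 0;          /* restart from the beginning */
        break;
    case COMP_OUT_OF_SPACE_RECOVERABLE:
        s.action = ACT_CONTINUE_STREAM;
        break;
    case COMP_SUCCESS:
        break;
    }
    return s;
}
```

With this sketch, a result of TERMINATED with consumed=0 and produced=100
yields ACT_RESUBMIT_ALL from offset 0 with 100 inspectable output bytes, while
RECOVERABLE with consumed=40 yields ACT_CONTINUE_STREAM from offset 40. The
point of the design is that the caller never has to guess whether consumed is
valid; the status says so.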