From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ahmed Mansour <ahmed.mansour@nxp.com>
To: "Verma, Shally", "Trahe, Fiona", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila",
 "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K",
 Hemant Agrawal, Roy Pledge, Youri Querry
Thread-Topic: [RFC v2] doc compression API for DPDK
Date: Tue, 20 Feb 2018 19:56:06 +0000
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

/// snip ///
>>>>>>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>>>>>>> ---------------------------------------------
>>>>>>>>>>>>>>> It is always an ideal expectation from the application that
>>>>>>>>>>>>>>> it should parse through all related chunks of source data,
>>>>>>>>>>>>>>> making its mbuf-chain, and enqueue it for stateless
>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>> However, if it needs to break it into several
>>>>>>>>>>>>>>> enqueue_burst() calls, then an expected call flow would be
>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will
>>>>>>>>>>>>>> call dequeue burst in a loop until all ops are received. Is
>>>>>>>>>>>>>> this correct?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued.
>>>>>>>>>>>>> However, this illustration is specifically in the context of
>>>>>>>>>>>>> stateful op processing, to reflect that if a stream is broken
>>>>>>>>>>>>> into chunks, then each chunk should be submitted as one op at
>>>>>>>>>>>>> a time with type = STATEFUL and needs to be dequeued first
>>>>>>>>>>>>> before the next chunk is enqueued.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>>>> enqueue_burst( |op.full_flush |)
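A minimal sketch of that per-chunk flow in C. All names here
(comp_enqueue_burst, comp_dequeue_burst, the flush field and status value)
are placeholders following this RFC discussion, not a final API:

    /* One STATEFUL op in flight per stream: each chunk is enqueued,
     * then dequeued, before the next chunk is submitted. */
    for (uint32_t i = 0; i < nb_chunks; i++) {
        op->flush_flag = (i == nb_chunks - 1) ? FLUSH_FULL : FLUSH_NONE;
        /* ... point op->m_src/op->m_dst at chunk i's mbufs ... */
        while (comp_enqueue_burst(dev_id, qp_id, &op, 1) == 0)
            ; /* queue pair full, retry */
        while (comp_dequeue_burst(dev_id, qp_id, &op, 1) == 0)
            ; /* poll until this chunk completes */
        if (op->status != STATUS_SUCCESS)
            break; /* e.g. OUT_OF_SPACE: fix buffers before continuing */
    }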
>>>>>>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I
>>>>>>>>>>>>>> understand that occasionally there will be an OUT_OF_SPACE
>>>>>>>>>>>>>> exception. Can we just distinguish the response in exception
>>>>>>>>>>>>>> cases?
>>>>>>>>>>>>> [Shally] Multiple ops are allowed in flight; however, the
>>>>>>>>>>>>> condition is that each op in such a case is independent of the
>>>>>>>>>>>>> others, i.e. they belong to different streams altogether.
>>>>>>>>>>>>> Earlier (as part of the RFC v1 doc) we did consider the
>>>>>>>>>>>>> proposal to process all related chunks of data in a single
>>>>>>>>>>>>> burst by passing them as an ops array, but later found that
>>>>>>>>>>>>> not-so-useful for PMD handling for various reasons. Please
>>>>>>>>>>>>> refer to the RFC v1 doc review comments for the same.
>>>>>>>>>>>> [Fiona] Agree with Shally. In summary, as only one op can be
>>>>>>>>>>>> processed at a time, since each needs the state of the
>>>>>>>>>>>> previous, allowing more than 1 op to be in flight at a time
>>>>>>>>>>>> would force PMDs to implement internal queueing and exception
>>>>>>>>>>>> handling for the OUT_OF_SPACE conditions you mention.
>>>>>>>>>> [Ahmed] But we are putting the ops on qps, which would make them
>>>>>>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little
>>>>>>>>>> bit more complex, but doable.
>>>>>>>>> [Fiona] In my opinion this is not doable and could be very
>>>>>>>>> inefficient. There may be many streams.
>>>>>>>>> The PMD would have to have an internal queue per stream so it
>>>>>>>>> could adjust the next src offset and length in the OUT_OF_SPACE
>>>>>>>>> case. And this may ripple back through all subsequent ops in the
>>>>>>>>> stream as each source len is increased and its dst buffer is not
>>>>>>>>> big enough.
>>>>>>>> [Ahmed] Regarding multi-op OUT_OF_SPACE handling:
>>>>>>>> The caller would still need to adjust the src length/output buffer
>>>>>>>> as you say. The PMD cannot handle OUT_OF_SPACE internally.
>>>>>>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this
>>>>>>>> stream until it gets explicit confirmation from the caller to
>>>>>>>> continue working on this stream. Any ops received by the PMD should
>>>>>>>> be returned to the caller with status STREAM_PAUSED, since the
>>>>>>>> caller did not explicitly acknowledge that it has solved the
>>>>>>>> OUT_OF_SPACE issue.
>>>>>>>> These semantics can be enabled by adding a new function to the API,
>>>>>>>> perhaps stream_resume().
>>>>>>>> This allows the caller to indicate that it acknowledges that it has
>>>>>>>> seen the issue, and this op should be used to resolve the issue.
>>>>>>>> Implementations that do not support this mode of use can push back
>>>>>>>> immediately after one op is in flight. Implementations that support
>>>>>>>> this use mode can allow many ops from the same session.
>>>>>>>>
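A sketch of those proposed semantics from the caller's side. STREAM_PAUSED,
OUT_OF_SPACE_RECOVERABLE, stream_resume() and the helper below are
proposals/placeholders from this thread, not an existing API:

    /* After the PMD reports OUT_OF_SPACE on op N, later ops of the same
     * stream come back STREAM_PAUSED until the caller acknowledges. */
    if (op->status == STATUS_OUT_OF_SPACE_RECOVERABLE) {
        fix_up_buffers(op);           /* hypothetical: grow dst, advance
                                       * src past the consumed bytes */
        stream_resume(op->stream);    /* proposed explicit ack call */
        comp_enqueue_burst(dev_id, qp_id, &op, 1); /* resubmit op N */
    } else if (op->status == STATUS_STREAM_PAUSED) {
        /* op was rejected unprocessed; re-enqueue after the ack above */
    }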
>>>>>>> [Shally] Is it still in the context of having a single burst where
>>>>>>> all ops belong to one stream? If yes, I would still say it would add
>>>>>>> an overhead to PMDs, especially if they are expected to work close
>>>>>>> to HW (which I think is the case with DPDK PMDs).
>>>>>>> Though your approach is doable, why can this all not be in a layer
>>>>>>> above the PMD? i.e. a layer above the PMD can either pass one op at
>>>>>>> a time with burst size = 1 OR can make a chained mbuf of input and
>>>>>>> output and pass that as one op.
>>>>>>> Is it just to ease applications of the chained-mbuf burden, or do
>>>>>>> you see a performance or use-case impacting aspect as well?
>>>>>>>
>>>>>>> If it is in a context where each op in a burst belongs to a
>>>>>>> different stream, then why do we need stream_pause and resume? It is
>>>>>>> an expectation on the app to pass more output buffer with
>>>>>>> consumed + 1 from the next call onwards, as it has already seen
>>>>>>> OUT_OF_SPACE.
>>>>> [Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
>>>>> implementation rejects all ops that belong to a stream that has
>>>>> entered "RECOVERABLE" state for one reason or another. The caller must
>>>>> acknowledge explicitly that it has received news of the problem before
>>>>> the PMD allows this stream to exit "RECOVERABLE" state. I agree with
>>>>> you that implementing this functionality in the software layer above
>>>>> the PMD is a bad idea, since the latency reductions are lost.
>>>> [Shally] Just reiterating, I rather meant the other way around, i.e. I
>>>> see it as easier to put all such complexity in a layer above the PMD.
>>>>
>>>>> This setup is useful in latency-sensitive applications where the
>>>>> latency of buffering multiple ops into one op is significant. We found
>>>>> latency makes a significant difference in search applications where
>>>>> the PMD competes with software decompression.
>>> [Fiona] I see, so when all goes well you get best-case latency, but
>>> when out-of-space occurs latency will probably be worse.
>> [Ahmed] This is exactly right. This use mode assumes out-of-space is a
>> rare occurrence. Recovering from it should take similar time to
>> synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE in
>> both sync and async use. The caller can fix up the op and send it back
>> to the PMD to continue work, just as would be done in sync. Nonetheless,
>> the added complexity is not justifiable if out-of-space is very common,
>> since the recoverable state will be the limiting factor that forces
>> synchronicity.
>>>>>> [Fiona] I still have concerns with this and would not want to
>>>>>> support it in our PMD.
>>>>>> To make sure I understand: you want to send a burst of ops, with
>>>>>> several from the same stream.
>>>>>> If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not
>>>>>> process any subsequent ops in that stream.
>>>>>> Should it return them in a dequeue_burst() with status still
>>>>>> NOT_PROCESSED? Or somehow drop them? How?
>>>>>> While still processing ops from other streams.
>>>>> [Ahmed] This is exactly correct. It should return them with
>>>>> NOT_PROCESSED. Yes, the PMD should continue processing other streams.
>>>>>> [Fiona] As we want to offload each op to hardware with as little CPU
>>>>>> processing as possible, we would not want to open up each op to see
>>>>>> which stream it's attached to and make decisions to do per-stream
>>>>>> storage, or drop it, or bypass hw and dequeue without processing.
>>>>> [Ahmed] I think I might have missed your point here, but I will try
>>>>> to answer. There is no need to "cushion" ops in DPDK.
>>>>> DPDK should send ops to the PMD and the PMD should reject them until
>>>>> stream_continue() is called. The next op to be sent by the user will
>>>>> have a special marker in it to inform the PMD to continue working on
>>>>> this stream. Alternatively, the DPDK layer can be made "smarter" to
>>>>> fail during the enqueue by checking the stream and its state, but like
>>>>> you say this adds additional CPU overhead during the enqueue.
>>>>> I am curious: in a simple synchronous use case, how do we prevent
>>>>> users from putting multiple ops in flight that belong to a single
>>>>> stream? Do we just currently say it is undefined behavior? Otherwise
>>>>> we would have to check the stream and incur the CPU overhead.
>>> [Fiona] We don't do anything to prevent it. It's undefined. IMO on the
>>> datapath in the DPDK model we expect good behaviour and don't have to
>>> error check for things like this.
>> [Ahmed] This makes sense. We also assume good behavior.
>>> [Fiona] In our PMD, if we got a burst of 20 ops, we allocate 20 spaces
>>> on the hw q, then build and send those messages. If we found an op from
>>> a stream which already had one inflight, we'd have to hold that back,
>>> store it in a sw stream-specific holding queue, and only send 19 to hw.
>>> We cannot send multiple ops from the same stream to the hw, as it fans
>>> them out and does them in parallel.
>>> Once the enqueue_burst() returns, there is no processing context which
>>> would spot that the first has completed and send the next op to the hw.
>>> On a dequeue_burst() we would spot this, and in that context could
>>> process the next op in the stream.
>>> On out of space, instead of processing the next op we would have to
>>> transfer all unprocessed ops from the stream to the dequeue result.
>>> Some parts of this are doable, but it seems likely to add a lot more
>>> latency; we'd need to add extra threads and timers to move ops from the
>>> sw queue to the hw q to get any benefit, and these constructs would add
>>> context switching and CPU cycles. So we prefer to push this
>>> responsibility to above the API, and it can achieve similar.
>> [Ahmed] I see what you mean. Our workflow is almost exactly the same
>> with our hardware, but the fanning out is done by the hardware based on
>> the stream, and ops that belong to the same stream are never allowed to
>> go out of order. Otherwise the data would be corrupted. Likewise, the
>> hardware is responsible for checking the state of the stream and
>> returning frames as NOT_PROCESSED to the software.
>>>>>> [Fiona] Maybe we could add a capability if this behaviour is
>>>>>> important for you? e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
>>>>>> Our PMD would set this to 0 and expect no more than one op from a
>>>>>> stateful stream to be in flight at any time.
>>>>> [Ahmed] That makes sense. This way the different DPDK implementations
>>>>> do not have to add extra checking for unsupported cases.
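A sketch of how an application might gate on such a capability. The flag
name comes from Fiona's suggestion above; the info struct and query call
are assumptions modeled on DPDK's cryptodev conventions:

    /* Hypothetical capability probe: pipeline same-stream ops only if
     * the device advertises the suggested flag. */
    struct comp_dev_info info;
    comp_dev_info_get(dev_id, &info);  /* assumed query, cryptodev-style */
    if (info.feature_flags & ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS) {
        /* enqueue op2 on the same stream before op1 is dequeued */
    } else {
        /* strict enqueue -> dequeue -> enqueue per stream */
    }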
>>>> [Shally] @ahmed, if I summarise your use-case, is this what you want
>>>> the PMD to support?
>>>> - a burst *carries only one stream* and all ops are then assumed to
>>>> belong to that stream? (please note, here a burst is not carrying more
>>>> than one stream)
>> [Ahmed] No. In this use case the caller sets up an op and enqueues a
>> single op. Then, before the response comes back from the PMD, the caller
>> enqueues a second op on the same stream.
>>>> - PMD will submit one op at a time to HW?
>> [Ahmed] I misunderstood what PMD means. I used it throughout to mean the
>> HW. I used DPDK to mean the software implementation that talks to the
>> hardware.
>> The software will submit all ops immediately. The hardware has to figure
>> out what to do with the ops depending on what stream they belong to.
>>>> - if processed successfully, push it back to the completion queue with
>>>> status = SUCCESS. If it failed or ran into OUT_OF_SPACE, then push it
>>>> to the completion queue with status = FAILURE/
>>>> OUT_OF_SPACE_RECOVERABLE and the rest with status = NOT_PROCESSED, and
>>>> return with enqueue count = total # of ops submitted originally with
>>>> the burst?
>> [Ahmed] This is exactly what I had in mind. All ops will be submitted to
>> the HW. The HW will put all of them on the completion queue with the
>> correct status, exactly as you say.
>>>> - app assumes all have been enqueued, so it goes and dequeues all ops
>>>> - on seeing an op with OUT_OF_SPACE_RECOVERABLE, app resubmits a burst
>>>> of ops with a call to the stream_continue/resume API, starting from
>>>> the op which encountered OUT_OF_SPACE and the others as NOT_PROCESSED,
>>>> with updated input and output buffers?
>> [Ahmed] Correct, this is what we do today in our proprietary API.
>>>> - repeat until *all* are dequeued with status = SUCCESS or *any* with
>>>> status = FAILURE? If at any time a failure is seen, does the app start
>>>> the whole processing all over again, or just drop this burst?!
>> [Ahmed] The app has the choice on how to proceed. If the issue is
>> recoverable, then the application can continue this stream from where it
>> stopped. If the failure is unrecoverable, then the application should
>> first fix the problem and start from the beginning of the stream.
>>>> If all of the above is true, then I think we should add another API
>>>> such as rte_comp_enque_single_stream(), which will be functional under
>>>> feature flag = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better name
>>>> is SUPPORT_ENQUEUE_SINGLE_STREAM?!
>> [Ahmed] The main advantage in async use is lost if we force all related
>> ops to be in the same burst. If we do that, then we might as well merge
>> all the ops into one op. That would reduce the overhead.
>> The use mode I am proposing is only useful in cases where the data
>> becomes available after the first enqueue occurred. I want to allow the
>> caller to enqueue the second set of data as soon as it is available,
>> regardless of whether or not the HW has already started working on the
>> first op inflight.
> [Shally] @ahmed, OK.. seems I missed a point here. So, confirm me on the
> following:
>
> As per the current description in the doc, the expected stateful usage
> is:
> enqueue(op1) --> dequeue(op1) --> enqueue(op2)
>
> but you're suggesting to allow an option to change it to
>
> enqueue(op1) --> enqueue(op2)
>
> i.e. multiple ops from the same stream can be put in flight via
> subsequent enqueue_burst() calls without waiting to dequeue the previous
> ones, as the PMD supports it. So, no change to the current definition of
> a burst. It will still carry multiple streams, with each op belonging to
> a different stream?!
[Ahmed] Correct. I guess a user could put two ops in the same burst that
belong to the same stream. In that case it would be more efficient to
merge the ops using scatter-gather. Nonetheless, I would not add checks in
my implementation to limit that use. The hardware does not perceive a
difference between ops that came in one burst and ops that came in two
different bursts. To the hardware they are all ops. What matters is which
stream each op belongs to.
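Pulling Shally's enumerated flow above into a short app-side sketch. The
status names, the stream field, and the helpers are illustrative
placeholders from this thread, not a settled API:

    /* Triage after a dequeue, per the flow summarised above. */
    uint16_t nb = comp_dequeue_burst(dev_id, qp_id, ops, burst_size);
    for (uint16_t i = 0; i < nb; i++) {
        switch (ops[i]->status) {
        case STATUS_SUCCESS:
            break;                         /* consume the output */
        case STATUS_OUT_OF_SPACE_RECOVERABLE:
            grow_dst_buffer(ops[i]);       /* hypothetical helper */
            stream_resume(ops[i]->stream); /* proposed ack call */
            resubmit_from(ops, i, nb);     /* this op + the NOT_PROCESSED
                                            * tail of the same stream */
            return;
        case STATUS_NOT_PROCESSED:
            break;                         /* handled by resubmit_from() */
        default:
            restart_stream(ops[i]);        /* unrecoverable: start over */
        }
    }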
> [Shally] If yes, then it seems your HW can be set up for multiple
> streams, so it is efficient for your case to support it in the DPDK PMD
> layer, but our hw doesn't by default and needs SW to back it. Given that,
> I also suggest enabling it under some feature flag.
>
> However, it looks like an add-on, and if it doesn't change the current
> definition of a burst and the minimum expectation set on stateful
> processing described in this document, then IMO you can propose this
> feature as an incremental patch on the baseline version, in the absence
> of which the application will exercise stateful processing as described
> here (enq->deq->enq). Thoughts?
[Ahmed] Makes sense. I was worried that there might be fundamental
limitations to this mode of use in the API design. That is why I wanted to
share this use mode with you guys and see if it can be accommodated using
an incremental patch in the future.
>>> [Fiona] Am curious about Ahmed's response to this. I didn't get that a
>>> burst should carry only one stream, or how this makes a difference, as
>>> there can be many enqueue_burst() calls done before a dequeue_burst().
>>> Maybe you're thinking the enqueue_burst() would be a blocking call that
>>> would not return until all the ops had been processed? This would turn
>>> it into a synchronous call, which isn't the intent.
>> [Ahmed] Agreed, a blocking or even a buffering software layer that
>> babysits the hardware does not fundamentally change the parameters of
>> the system as a whole. It just moves workflow management complexity down
>> into the DPDK software layer. Rather, there are real latency and
>> throughput advantages (because of caching) that I want to expose.
>>
>> /// snip ///