From: "Verma, Shally"
To: Ahmed Mansour, "Trahe, Fiona", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila",
 "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K",
 Hemant Agrawal, Roy Pledge, Youri Querry
Date: Thu, 22 Feb 2018 04:47:45 +0000
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

>-----Original Message-----
>From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>Sent: 22 February 2018 01:06
>To: Trahe, Fiona; Verma, Shally; dev@dpdk.org
>Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch,
>Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge;
>Youri Querry
>Subject: Re: [RFC v2] doc compression API for DPDK
>
>On 2/21/2018 9:35 AM, Trahe, Fiona wrote:
>> Hi Ahmed, Shally,
>>
>>
>>> -----Original Message-----
>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>> Sent: Tuesday, February 20, 2018 7:56 PM
>>> To: Verma, Shally; Trahe, Fiona; dev@dpdk.org
>>> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara
>>> Guarch, Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal;
>>> Roy Pledge; Youri Querry
>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>
>>> /// snip ///
>>>>>>>>>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>>>>>> It is always an ideal expectation from the application
>>>>>>>>>>>>>>>>>> that it should parse through all related chunks of source
>>>>>>>>>>>>>>>>>> data, making its mbuf-chain, and enqueue it for stateless
>>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>> However, if it needs to break it into several
>>>>>>>>>>>>>>>>>> enqueue_burst() calls, then an expected call flow would be
>>>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will
>>>>>>>>>>>>>>>>> call dequeue burst in a loop until all ops are received. Is
>>>>>>>>>>>>>>>>> this correct?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> dequeue_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be
>>>>>>>>>>>>>>>> dequeued. However, this illustration is specifically in the
>>>>>>>>>>>>>>>> context of stateful op processing, to reflect that if a
>>>>>>>>>>>>>>>> stream is broken into chunks, then each chunk should be
>>>>>>>>>>>>>>>> submitted as one op at a time with type = STATEFUL and needs
>>>>>>>>>>>>>>>> to be dequeued before the next chunk is enqueued.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>>>>>> dequeue_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>>>>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I
>>>>>>>>>>>>>>>>> understand that occasionally there will be an OUT_OF_SPACE
>>>>>>>>>>>>>>>>> exception. Can we just distinguish the response in
>>>>>>>>>>>>>>>>> exception cases?
>>>>>>>>>>>>>>>> [Shally] Multiple ops are allowed in flight; however, the
>>>>>>>>>>>>>>>> condition is that each op in such a case is independent of
>>>>>>>>>>>>>>>> the others, i.e. they belong to different streams altogether.
>>>>>>>>>>>>>>>> Earlier (as part of the RFC v1 doc) we did consider the
>>>>>>>>>>>>>>>> proposal to process all related chunks of data in a single
>>>>>>>>>>>>>>>> burst by passing them as an ops array, but later found that
>>>>>>>>>>>>>>>> not so useful for PMD handling, for various reasons. Please
>>>>>>>>>>>>>>>> refer to the RFC v1 doc review comments for the same.
>>>>>>>>>>>>>>> [Fiona] Agree with Shally. In summary, as only one op can be
>>>>>>>>>>>>>>> processed at a time, since each needs the state of the
>>>>>>>>>>>>>>> previous, allowing more than one op to be in flight at a time
>>>>>>>>>>>>>>> would force PMDs to implement internal queueing and exception
>>>>>>>>>>>>>>> handling for the OUT_OF_SPACE conditions you mention.
>>>>>>>>>>>>> [Ahmed] But we are putting the ops on qps, which would make them
>>>>>>>>>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little
>>>>>>>>>>>>> bit more complex, but doable.
>>>>>>>>>>>> [Fiona] In my opinion this is not doable and could be very
>>>>>>>>>>>> inefficient. There may be many streams.
>>>>>>>>>>>> The PMD would have to have an internal queue per stream so it
>>>>>>>>>>>> could adjust the next src offset and length in the OUT_OF_SPACE
>>>>>>>>>>>> case. And this may ripple back through all subsequent ops in the
>>>>>>>>>>>> stream as each source len is increased and its dst buffer is not
>>>>>>>>>>>> big enough.
>>>>>>>>>>> [Ahmed] Regarding multi-op OUT_OF_SPACE handling:
>>>>>>>>>>> the caller would still need to adjust the src length/output
>>>>>>>>>>> buffer as you say. The PMD cannot handle OUT_OF_SPACE internally.
>>>>>>>>>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this
>>>>>>>>>>> stream until it gets explicit confirmation from the caller to
>>>>>>>>>>> continue working on this stream.
>>>>>>>>>>> Any ops received by the PMD should be returned to the caller with
>>>>>>>>>>> status STREAM_PAUSED, since the caller did not explicitly
>>>>>>>>>>> acknowledge that it has solved the OUT_OF_SPACE issue.
>>>>>>>>>>> These semantics can be enabled by adding a new function to the
>>>>>>>>>>> API, perhaps stream_resume().
>>>>>>>>>>> This allows the caller to indicate that it acknowledges that it
>>>>>>>>>>> has seen the issue, and this op should be used to resolve the
>>>>>>>>>>> issue. Implementations that do not support this mode of use can
>>>>>>>>>>> push back immediately after one op is in flight. Implementations
>>>>>>>>>>> that support this use mode can allow many ops from the same
>>>>>>>>>>> session.
>>>>>>>>>>>
>>>>>>>>>> [Shally] Is it still in the context of having a single burst where
>>>>>>>>>> all ops belong to one stream? If yes, I would still say it would
>>>>>>>>>> add overhead to PMDs, especially if a PMD is expected to work
>>>>>>>>>> close to the HW (which I think is the case with DPDK PMDs).
>>>>>>>>>> Though your approach is doable, why can this all not be in a layer
>>>>>>>>>> above the PMD? i.e. a layer above the PMD can either pass one op
>>>>>>>>>> at a time with burst size = 1, OR can make a chained mbuf of input
>>>>>>>>>> and output and pass that as one op.
>>>>>>>>>> Is it just to ease the chained-mbuf burden on applications, or do
>>>>>>>>>> you see a performance/use-case impacting aspect as well?
>>>>>>>>>>
>>>>>>>>>> If it is in a context where each op in a burst belongs to a
>>>>>>>>>> different stream, then why do we need stream_pause and resume? It
>>>>>>>>>> is an expectation from the app to pass more output buffer with
>>>>>>>>>> consumed + 1 from the next call onwards, as it has already seen
>>>>>>>>>> OUT_OF_SPACE.
>>>>>>>> [Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
>>>>>>>> implementation rejects all ops that belong to a stream that has
>>>>>>>> entered "RECOVERABLE" state for one reason or another. The caller
>>>>>>>> must acknowledge explicitly that it has received news of the problem
>>>>>>>> before the PMD allows this stream to exit "RECOVERABLE" state. I
>>>>>>>> agree with you that implementing this functionality in the software
>>>>>>>> layer above the PMD is a bad idea since the latency reductions are
>>>>>>>> lost.
>>>>>>> [Shally] Just reiterating, I rather meant it the other way around,
>>>>>>> i.e. I see it as easier to put all such complexity in a layer above
>>>>>>> the PMD.
>>>>>>>
>>>>>>>> This setup is useful in latency-sensitive applications where the
>>>>>>>> latency of buffering multiple ops into one op is significant. We
>>>>>>>> found latency makes a significant difference in search applications
>>>>>>>> where the PMD competes with software decompression.
>>>>>> [Fiona] I see, so when all goes well, you get best-case latency, but
>>>>>> when out-of-space occurs, latency will probably be worse.
>>>>> [Ahmed] This is exactly right. This use mode assumes out-of-space is a
>>>>> rare occurrence. Recovering from it should take similar time to
>>>>> synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE
>>>>> in both sync and async use. The caller can fix up the op and send it
>>>>> back to the PMD to continue work, just as would be done in sync.
>>>>> Nonetheless, the added complexity is not justifiable if out-of-space is
>>>>> very common, since the recoverable state will be the limiting factor
>>>>> that forces synchronicity.
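[Shally] To make the baseline flow concrete, below is a rough sketch in C
of the enq->deq->enq stateful sequence from D.2.1, including the fix-up and
resubmit on OUT_OF_SPACE. It only uses the working names from this thread
(enqueue_burst/dequeue_burst, NO_FLUSH/FULL_FLUSH, the status values);
fill_chunk() and grow_dst() are hypothetical helpers, so treat this as an
illustration of the semantics, not a final API:

/* One stream, one op in flight: dequeue each chunk's op before
 * enqueuing the next; set full_flush only on the last chunk. */
for (i = 0; i < n_chunks; i++) {
        op->flush = (i == n_chunks - 1) ? FULL_FLUSH : NO_FLUSH;
        fill_chunk(op, i);              /* hypothetical: set src/dst mbufs,
                                         * offsets and lengths */

        while (enqueue_burst(dev_id, qp_id, &op, 1) == 0)
                ;                       /* retry until the qp accepts it */
        while (dequeue_burst(dev_id, qp_id, &op, 1) == 0)
                ;                       /* poll until this op completes */

        if (op->status == OUT_OF_SPACE_RECOVERABLE) {
                grow_dst(op);           /* hypothetical: larger dst, src
                                         * offset advanced past consumed */
                i--;                    /* resubmit remainder of this chunk */
        } else if (op->status != SUCCESS) {
                break;                  /* unrecoverable: abort the stream */
        }
}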
>>>>>>>>> [Fiona] I still have concerns with this and would not want to
>>>>>>>>> support it in our PMD.
>>>>>>>>> To make sure I understand: you want to send a burst of ops, with
>>>>>>>>> several from the same stream. If one causes
>>>>>>>>> OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any
>>>>>>>>> subsequent ops in that stream.
>>>>>>>>> Should it return them in a dequeue_burst() with status still
>>>>>>>>> NOT_PROCESSED? Or somehow drop them? How?
>>>>>>>>> While still processing ops from other streams.
>>>>>>>> [Ahmed] This is exactly correct. It should return them with
>>>>>>>> NOT_PROCESSED. Yes, the PMD should continue processing other
>>>>>>>> streams.
>>>>>>>>> As we want to offload each op to hardware with as little CPU
>>>>>>>>> processing as possible, we would not want to open up each op to see
>>>>>>>>> which stream it's attached to and make decisions to do per-stream
>>>>>>>>> storage, or drop it, or bypass hw and dequeue without processing.
>>>>>>>> [Ahmed] I think I might have missed your point here, but I will try
>>>>>>>> to answer. There is no need to "cushion" ops in DPDK. DPDK should
>>>>>>>> send ops to the PMD and the PMD should reject them until
>>>>>>>> stream_continue() is called. The next op to be sent by the user will
>>>>>>>> have a special marker in it to inform the PMD to continue working on
>>>>>>>> this stream. Alternatively the DPDK layer can be made "smarter" to
>>>>>>>> fail during the enqueue by checking the stream and its state, but
>>>>>>>> like you say this adds additional CPU overhead during the enqueue.
>>>>>>>> I am curious: in a simple synchronous use case, how do we prevent
>>>>>>>> users from putting multiple ops in flight that belong to a single
>>>>>>>> stream? Do we just currently say it is undefined behavior? Otherwise
>>>>>>>> we would have to check the stream and incur the CPU overhead.
>>>>>> [Fiona] We don't do anything to prevent it. It's undefined. IMO on the
>>>>>> data path in the DPDK model we expect good behaviour and don't have to
>>>>>> error check for things like this.
>>>>> [Ahmed] This makes sense. We also assume good behavior.
>>>>>> In our PMD, if we got a burst of 20 ops, we allocate 20 spaces on the
>>>>>> hw q, then build and send those messages. If we found an op from a
>>>>>> stream which already had one inflight, we'd have to hold that back,
>>>>>> store it in a sw stream-specific holding queue, and only send 19 to
>>>>>> hw. We cannot send multiple ops from the same stream to the hw as it
>>>>>> fans them out and does them in parallel.
>>>>>> Once the enqueue_burst() returns, there is no processing context which
>>>>>> would spot that the first has completed and send the next op to the
>>>>>> hw. On a dequeue_burst() we would spot this, and in that context could
>>>>>> process the next op in the stream.
>>>>>> On out of space, instead of processing the next op we would have to
>>>>>> transfer all unprocessed ops from the stream to the dequeue result.
>>>>>> Some parts of this are doable, but it seems likely to add a lot more
>>>>>> latency; we'd need to add extra threads and timers to move ops from
>>>>>> the sw queue to the hw q to get any benefit, and these constructs
>>>>>> would add context switching and CPU cycles. So we prefer to push this
>>>>>> responsibility above the API, where it can achieve something similar.
>>>>> [Ahmed] I see what you mean.
>>>>> Our workflow is almost exactly the same with our hardware, but the
>>>>> fanning out is done by the hardware based on the stream, and ops that
>>>>> belong to the same stream are never allowed to go out of order.
>>>>> Otherwise the data would be corrupted. Likewise the hardware is
>>>>> responsible for checking the state of the stream and returning frames
>>>>> as NOT_PROCESSED to the software.
>>>>>>>>> Maybe we could add a capability if this behaviour is important for
>>>>>>>>> you? e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
>>>>>>>>> Our PMD would set this to 0 and expect no more than one op from a
>>>>>>>>> stateful stream to be in flight at any time.
>>>>>>>> [Ahmed] That makes sense. This way the different DPDK
>>>>>>>> implementations do not have to add extra checking for unsupported
>>>>>>>> cases.
>>>>>>> [Shally] @ahmed, if I summarise your use-case, is this what you want
>>>>>>> the PMD to support?
>>>>>>> - a burst *carries only one stream* and all ops are then assumed to
>>>>>>> belong to that stream? (please note, here the burst is not carrying
>>>>>>> more than one stream)
>>>>> [Ahmed] No. In this use case the caller sets up an op and enqueues a
>>>>> single op. Then, before the response comes back from the PMD, the
>>>>> caller enqueues a second op on the same stream.
>>>>>>> - PMD will submit one op at a time to HW?
>>>>> [Ahmed] I misunderstood what PMD means. I used it throughout to mean
>>>>> the HW. I used DPDK to mean the software implementation that talks to
>>>>> the hardware.
>>>>> The software will submit all ops immediately. The hardware has to
>>>>> figure out what to do with the ops depending on what stream they belong
>>>>> to.
>>>>>>> - if processed successfully, push it back to the completion queue
>>>>>>> with status = SUCCESS. If it failed or ran into OUT_OF_SPACE, then
>>>>>>> push it to the completion queue with status = FAILURE/
>>>>>>> OUT_OF_SPACE_RECOVERABLE, the rest with status = NOT_PROCESSED, and
>>>>>>> return with enqueue count = total # of ops submitted originally with
>>>>>>> the burst?
>>>>> [Ahmed] This is exactly what I had in mind. All ops will be submitted
>>>>> to the HW. The HW will put all of them on the completion queue with the
>>>>> correct status, exactly as you say.
>>>>>>> - app assumes all have been enqueued, so it goes and dequeues all ops
>>>>>>> - on seeing an op with OUT_OF_SPACE_RECOVERABLE, the app resubmits a
>>>>>>> burst of ops with a call to the stream_continue/resume API, starting
>>>>>>> from the op which encountered OUT_OF_SPACE and the others marked as
>>>>>>> NOT_PROCESSED, with updated input and output buffers?
>>>>> [Ahmed] Correct, this is what we do today in our proprietary API.
>>>>>>> - repeat until *all* are dequeued with status = SUCCESS, or *any*
>>>>>>> with status = FAILURE? If a failure is seen at any time, does the app
>>>>>>> start the whole processing all over again, or just drop this burst?!
>>>>> [Ahmed] The app has the choice on how to proceed. If the issue is
>>>>> recoverable then the application can continue this stream from where it
>>>>> stopped. If the failure is unrecoverable then the application should
>>>>> first fix the problem and start from the beginning of the stream.
>>>>>>> If all of the above is true, then I think we should add another API
>>>>>>> such as rte_comp_enque_single_stream(), which will be functional
>>>>>>> under Feature Flag = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better
>>>>>>> name is SUPPORT_ENQUEUE_SINGLE_STREAM?!
>>>>> [Ahmed] The main advantage in async use is lost if we force all related
>>>>> ops to be in the same burst.
>>>>> If we do that, then we might as well merge all the ops into one op.
>>>>> That would reduce the overhead.
>>>>> The use mode I am proposing is only useful in cases where the data
>>>>> becomes available after the first enqueue occurred. I want to allow the
>>>>> caller to enqueue the second set of data as soon as it is available,
>>>>> regardless of whether or not the HW has already started working on the
>>>>> first op inflight.
>>>> [Shally] @ahmed, Ok.. seems I missed a point here. So, confirm the
>>>> following for me:
>>>>
>>>> As per the current description in the doc, the expected stateful usage
>>>> is:
>>>> enqueue (op1) --> dequeue(op1) --> enqueue(op2)
>>>>
>>>> but you're suggesting to allow an option to change it to
>>>>
>>>> enqueue(op1) --> enqueue(op2)
>>>>
>>>> i.e. multiple ops from the same stream can be put in flight via
>>>> subsequent enqueue_burst() calls without waiting to dequeue previous
>>>> ones, as the PMD supports it. So, no change to the current definition of
>>>> a burst. It will still carry multiple streams, with each op belonging to
>>>> a different stream?!
>>> [Ahmed] Correct. I guess a user could put two ops on the same burst that
>>> belong to the same stream. In that case it would be more efficient to
>>> merge the ops using scatter-gather. Nonetheless, I would not add checks
>>> in my implementation to limit that use. The hardware does not perceive a
>>> difference between ops that came in one burst and ops that came in two
>>> different bursts. To the hardware they are all ops. What matters is
>>> which stream each op belongs to.
>>>> If yes, then it seems your HW can be set up for multiple streams, so it
>>>> is efficient for your case to support it in the DPDK PMD layer, but our
>>>> hw doesn't by default and needs SW to back it. Given that, I also
>>>> suggest enabling it under some feature flag.
>>>> However, it looks like an add-on, and if it doesn't change the current
>>>> definition of a burst and the minimum expectation set on stateful
>>>> processing described in this document, then IMO you can propose this
>>>> feature as an incremental patch on the baseline version, in absence of
>>>> which the application will exercise stateful processing as described
>>>> here (enq->deq->enq). Thoughts?
>>> [Ahmed] Makes sense. I was worried that there might be fundamental
>>> limitations to this mode of use in the API design. That is why I wanted
>>> to share this use mode with you guys and see if it can be accommodated
>>> using an incremental patch in the future.
>>>>>> [Fiona] Am curious about Ahmed's response to this. I didn't get that
>>>>>> a burst should carry only one stream, or how this makes a difference?
>>>>>> As there can be many enqueue_burst() calls done before a
>>>>>> dequeue_burst().
>>>>>> Maybe you're thinking the enqueue_burst() would be a blocking call
>>>>>> that would not return until all the ops had been processed? This
>>>>>> would turn it into a synchronous call, which isn't the intent.
>>>>> [Ahmed] Agreed, a blocking or even a buffering software layer that
>>>>> babysits the hardware does not fundamentally change the parameters of
>>>>> the system as a whole. It just moves workflow management complexity
>>>>> down into the DPDK software layer. Rather, there are real latency and
>>>>> throughput advantages (because of caching) that I want to expose.
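[Shally] To capture the proposed optional mode in code, a rough sketch of
the pipelined flow follows. stream_resume() and the NOT_PROCESSED /
OUT_OF_SPACE_RECOVERABLE statuses are only the names proposed in this
thread, and fix_up_dst() is a hypothetical helper, so this illustrates the
intended semantics, not an agreed API:

/* Two chunks of one stream in flight at once; only valid when the PMD
 * advertises the proposed ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS capability. */
enqueue_burst(dev_id, qp_id, &op1, 1);  /* chunk 1 */
enqueue_burst(dev_id, qp_id, &op2, 1);  /* chunk 2, same stream, enqueued
                                         * before op1 has been dequeued */
n = dequeue_burst(dev_id, qp_id, ops, 2);
if (ops[0]->status == OUT_OF_SPACE_RECOVERABLE) {
        /* ops[1] comes back NOT_PROCESSED; the stream stays paused until
         * the app explicitly acknowledges the condition. */
        fix_up_dst(ops[0]);             /* hypothetical: larger dst, src
                                         * offset advanced past consumed */
        stream_resume(stream);          /* proposed resume/continue API */
        enqueue_burst(dev_id, qp_id, ops, 2);   /* resubmit both ops */
}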
>>>>>
>> [Fiona] ok, so I think we've agreed that this can be an option, as long
>> as not required of PMDs and enabled under an explicit capability - named
>> something like ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS.
>> @Ahmed, we'll leave it up to you to define the details.
>> What's necessary is API text to describe the expected behaviour on any
>> error conditions, the pause/resume API, whether an API is expected to
>> clean up if resume doesn't happen and if there's any time limit on this,
>> etc.
>> But I wouldn't expect any changes to existing burst APIs, and all PMDs
>> and applications must be able to handle the default behaviour, i.e. with
>> this capability disabled.
>> Specifically, even if a PMD has this capability, if an application
>> ignores it and only sends one op at a time, then if the PMD returns
>> OUT_OF_SPACE_RECOVERABLE the stream should not be in a paused state and
>> the PMD should not wait for a resume() to handle the next op sent for
>> that stream.
>> Does that make sense?
>[Ahmed] That makes sense. When this mode is enabled then additional
>functions must be called to resume the work, even if only one op was in
>flight. When this mode is not enabled then the PMD assumes that the
>caller will never enqueue a stateful op before receiving a response to
>the one that precedes it in a stream.

[Shally] @ahmed, just to confirm on this:

>When this mode is not enabled then the PMD assumes that the caller will
>never enqueue a stateful op ...

I think what we want to ensure is the reverse of it, i.e. "if the mode is
*enabled*, then the PMD should also assume that the caller can use the
enqueue->dequeue->enqueue sequence for stateful processing, and if on
dequeue it discovers OUT_OF_SPACE_RECOVERABLE and calls enqueue() again to
handle it, that should also be supported by the PMD".
In a sense, an application written for a PMD which doesn't have this
capability should also work with a PMD which has this capability.

>>
>>>>> /// snip ///
>>>>
>>
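To put that portability expectation in code: an application would probe the
capability and fall back to the lockstep flow when it is absent. The
info_get()/feature_flags query below just stands in for whatever device
query the final API defines, and the two process_stream_*() helpers are
hypothetical, so this is a sketch of the intent only:

/* Use the pipelined mode only where the proposed capability is set, so
 * the same application works on PMDs with and without it. */
info_get(dev_id, &info);                    /* assumed capability query */
if (info.feature_flags & ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS)
        process_stream_pipelined(stream);   /* hypothetical: enqueue next
                                             * chunk without waiting */
else
        process_stream_lockstep(stream);    /* default: enq->deq->enq */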