From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ahmed Mansour <ahmed.mansour@nxp.com>
To: "Verma, Shally", "Trahe, Fiona", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila",
 "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K",
 Hemant Agrawal, Roy Pledge, Youri Querry
Thread-Topic: [RFC v2] doc compression API for DPDK
Date: Thu, 22 Feb 2018 19:35:59 +0000
References: <348A99DA5F5B7549AA880327E580B435892F589D@IRSMSX101.ger.corp.intel.com>
 <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com>
 <348A99DA5F5B7549AA880327E580B4358931F82B@IRSMSX101.ger.corp.intel.com>
 <348A99DA5F5B7549AA880327E580B43589321277@IRSMSX101.ger.corp.intel.com>
 <348A99DA5F5B7549AA880327E580B43589324E3A@IRSMSX101.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK
List-Id: DPDK patches and discussions

On 2/21/2018 11:47 PM, Verma, Shally wrote:
>
>> -----Original Message-----
>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>> Sent: 22 February 2018 01:06
>> To: Trahe, Fiona; Verma, Shally; dev@dpdk.org
>> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch,
>> Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
>> Subject: Re: [RFC v2] doc compression API for DPDK
>>
>> On 2/21/2018 9:35 AM, Trahe, Fiona wrote:
>>> Hi Ahmed, Shally,
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>>> Sent: Tuesday, February 20, 2018 7:56 PM
>>>> To: Verma, Shally; Trahe, Fiona; dev@dpdk.org
>>>> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch,
>>>> Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
>>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>>
>>>> /// snip ///
>>>>>>>>>>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>>>>>>> It is always an ideal expectation from the application that it
>>>>>>>>>>>>>>>>>>> should parse through all related chunks of source data, making
>>>>>>>>>>>>>>>>>>> its mbuf-chain, and enqueue it for stateless processing.
>>>>>>>>>>>>>>>>>>> However, if it needs to break it into several enqueue_burst()
>>>>>>>>>>>>>>>>>>> calls, then an expected call flow would be something like:
>>>>>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call
>>>>>>>>>>>>>>>>>> dequeue burst in a loop until all ops are received. Is this
>>>>>>>>>>>>>>>>>> correct?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued.
>>>>>>>>>>>>>>>>> However, this illustration is specifically in the context of
>>>>>>>>>>>>>>>>> stateful op processing, to reflect that if a stream is broken into
>>>>>>>>>>>>>>>>> chunks, then each chunk should be submitted as one op at a time
>>>>>>>>>>>>>>>>> with type = STATEFUL and needs to be dequeued first before the
>>>>>>>>>>>>>>>>> next chunk is enqueued.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>>>>>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand
>>>>>>>>>>>>>>>>>> that occasionally there will be an OUT_OF_SPACE exception. Can we
>>>>>>>>>>>>>>>>>> just distinguish the response in exception cases?
>>>>>>>>>>>>>>>>> [Shally] Multiple ops are allowed in flight; however, the
>>>>>>>>>>>>>>>>> condition is that each op in such a case is independent of the
>>>>>>>>>>>>>>>>> others, i.e. they belong to different streams altogether.
>>>>>>>>>>>>>>>>> Earlier (as part of the RFC v1 doc) we did consider the proposal
>>>>>>>>>>>>>>>>> to process all related chunks of data in a single burst by passing
>>>>>>>>>>>>>>>>> them as an ops array, but later found that not so useful for PMD
>>>>>>>>>>>>>>>>> handling for various reasons. You may please refer to the RFC v1
>>>>>>>>>>>>>>>>> doc review comments for the same.
>>>>>>>>>>>>>>>> [Fiona] Agree with Shally.
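[As an aside for readers following the flow above: the enqueue -> dequeue -> enqueue chunk sequence can be mocked in a few lines of C. This is a hedged sketch, not the proposed rte_comp API; all names (mock_stream, mock_enqueue, FLUSH_NONE/FLUSH_FULL) are invented for illustration. It shows the two rules under discussion: compressor state carries across chunk ops, and a second stateful op may not be enqueued before the previous one is dequeued.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum flush_flag { FLUSH_NONE, FLUSH_FULL }; /* op.no_flush / op.full_flush */

struct mock_op {
    const uint8_t *src;
    size_t len;
    enum flush_flag flush;
};

struct mock_stream {
    uint32_t history;   /* stands in for compressor state kept between ops */
    int op_inflight;    /* at most one stateful op may be in flight */
};

/* Enqueue one chunk; fails if the previous chunk was not dequeued yet. */
static int mock_enqueue(struct mock_stream *s, const struct mock_op *op)
{
    if (s->op_inflight)
        return -1;                       /* enq->deq->enq rule violated */
    s->op_inflight = 1;
    for (size_t i = 0; i < op->len; i++) /* state carries across chunks */
        s->history = s->history * 31u + op->src[i];
    return 0;
}

/* Dequeue the completed op, unblocking the next chunk of the stream. */
static int mock_dequeue(struct mock_stream *s)
{
    if (!s->op_inflight)
        return -1;
    s->op_inflight = 0;
    return 0;
}
```

[With this mock, enqueueing a second chunk before dequeuing the first fails, which is exactly the constraint the illustration above imposes on STATEFUL ops.]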
>>>>>>>>>>>>>>>> In summary, as only one op can be processed at a time, since each
>>>>>>>>>>>>>>>> needs the state of the previous, allowing more than 1 op to be
>>>>>>>>>>>>>>>> in flight at a time would force PMDs to implement internal queueing
>>>>>>>>>>>>>>>> and exception handling for the OUT_OF_SPACE conditions you mention.
>>>>>>>>>>>>>> [Ahmed] But we are putting the ops on qps which would make them
>>>>>>>>>>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit
>>>>>>>>>>>>>> more complex but doable.
>>>>>>>>>>>>> [Fiona] In my opinion this is not doable, and could be very
>>>>>>>>>>>>> inefficient. There may be many streams.
>>>>>>>>>>>>> The PMD would have to have an internal queue per stream so it could
>>>>>>>>>>>>> adjust the next src offset and length in the OUT_OF_SPACE case.
>>>>>>>>>>>>> And this may ripple back through all subsequent ops in the stream as
>>>>>>>>>>>>> each source len is increased and its dst buffer is not big enough.
>>>>>>>>>>>> [Ahmed] Regarding multi-op OUT_OF_SPACE handling:
>>>>>>>>>>>> the caller would still need to adjust the src length/output buffer as
>>>>>>>>>>>> you say. The PMD cannot handle OUT_OF_SPACE internally.
>>>>>>>>>>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this
>>>>>>>>>>>> stream until it gets explicit confirmation from the caller to
>>>>>>>>>>>> continue working on this stream.
>>>>>>>>>>>> Any ops received by the PMD should be returned to the caller with
>>>>>>>>>>>> status STREAM_PAUSED, since the caller did not explicitly acknowledge
>>>>>>>>>>>> that it has solved the OUT_OF_SPACE issue.
>>>>>>>>>>>> These semantics can be enabled by adding a new function to the API,
>>>>>>>>>>>> perhaps stream_resume().
>>>>>>>>>>>> This allows the caller to indicate that it acknowledges that it has
>>>>>>>>>>>> seen the issue, and this op should be used to resolve the issue.
>>>>>>>>>>>> Implementations that do not support this mode of use can push back
>>>>>>>>>>>> immediately after one op is in flight. Implementations that support
>>>>>>>>>>>> this use mode can allow many ops from the same session.
>>>>>>>>>>>>
>>>>>>>>>>> [Shally] Is it still in the context of having a single burst where all
>>>>>>>>>>> ops belong to one stream? If yes, I would still say it would add an
>>>>>>>>>>> overhead to PMDs, especially if it is expected to work closer to HW
>>>>>>>>>>> (which I think is the case with a DPDK PMD).
>>>>>>>>>>> Though your approach is doable, why can this all not be in a layer
>>>>>>>>>>> above the PMD? i.e. a layer above the PMD can either pass one op at a
>>>>>>>>>>> time with burst size = 1, OR can make a chained mbuf of input and
>>>>>>>>>>> output and pass that as one op.
>>>>>>>>>>> Is it just to ease the application's chained-mbuf burden, or do you
>>>>>>>>>>> see any performance/use-case impacting aspect also?
>>>>>>>>>>>
>>>>>>>>>>> if it is in a context where each op belongs to a different stream in a
>>>>>>>>>>> burst, then why do we need stream_pause and resume?
>>>>>>>>>>> It is an expectation from the app to pass more output buffer, with
>>>>>>>>>>> consumed + 1 from the next call onwards, as it has already seen
>>>>>>>>>>> OUT_OF_SPACE.
>>>>>>>>> [Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
>>>>>>>>> implementation rejects all ops that belong to a stream that has entered
>>>>>>>>> "RECOVERABLE" state for one reason or another. The caller must
>>>>>>>>> acknowledge explicitly that it has received news of the problem before
>>>>>>>>> the PMD allows this stream to exit "RECOVERABLE" state. I agree with you
>>>>>>>>> that implementing this functionality in the software layer above the PMD
>>>>>>>>> is a bad idea since the latency reductions are lost.
>>>>>>>> [Shally] Just reiterating, I rather meant the other way around, i.e. I
>>>>>>>> see it easier to put all such complexity in a layer above the PMD.
>>>>>>>>
>>>>>>>>> This setup is useful in latency-sensitive applications where the latency
>>>>>>>>> of buffering multiple ops into one op is significant. We found latency
>>>>>>>>> makes a significant difference in search applications where the PMD
>>>>>>>>> competes with software decompression.
>>>>>>> [Fiona] I see, so when all goes well, you get best-case latency, but when
>>>>>>> out-of-space occurs latency will probably be worse.
>>>>>> [Ahmed] This is exactly right. This use mode assumes out-of-space is a
>>>>>> rare occurrence. Recovering from it should take similar time to
>>>>>> synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE in
>>>>>> both sync and async use. The caller can fix up the op and send it back
>>>>>> to the PMD to continue work just as would be done in sync.
>>>>>> Nonetheless, the added complexity is not justifiable if out-of-space is
>>>>>> very common, since the recoverable state will be the limiting factor that
>>>>>> forces synchronicity.
>>>>>>>>>> [Fiona] I still have concerns with this and would not want to support
>>>>>>>>>> it in our PMD.
>>>>>>>>>> To make sure I understand: you want to send a burst of ops, with
>>>>>>>>>> several from the same stream. If one causes OUT_OF_SPACE_RECOVERABLE,
>>>>>>>>>> then the PMD should not process any subsequent ops in that stream.
>>>>>>>>>> Should it return them in a dequeue_burst() with status still
>>>>>>>>>> NOT_PROCESSED? Or somehow drop them? How?
>>>>>>>>>> While still processing ops from other streams.
>>>>>>>>> [Ahmed] This is exactly correct. It should return them with
>>>>>>>>> NOT_PROCESSED. Yes, the PMD should continue processing other streams.
>>>>>>>>>> As we want to offload each op to hardware with as little CPU processing
>>>>>>>>>> as possible, we would not want to open up each op to see which stream
>>>>>>>>>> it's attached to and make decisions to do per-stream storage, or drop
>>>>>>>>>> it, or bypass hw and dequeue without processing.
>>>>>>>>> [Ahmed] I think I might have missed your point here, but I will try to
>>>>>>>>> answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
>>>>>>>>> to the PMD and the PMD should reject until stream_continue() is called.
>>>>>>>>> The next op to be sent by the user will have a special marker in it to
>>>>>>>>> inform the PMD to continue working on this stream. Alternatively the
>>>>>>>>> DPDK layer can be made "smarter" to fail during the enqueue by checking
>>>>>>>>> the stream and its state, but like you say this adds additional CPU
>>>>>>>>> overhead during the enqueue.
>>>>>>>>> I am curious. In a simple synchronous use case,
>>>>>>>>> how do we prevent users from putting multiple ops in flight that belong
>>>>>>>>> to a single stream? Do we just currently say it is undefined behavior?
>>>>>>>>> Otherwise we would have to check the stream and incur the CPU overhead.
>>>>>>> [Fiona] We don't do anything to prevent it. It's undefined. IMO on the
>>>>>>> data path in the DPDK model we expect good behaviour and don't have to
>>>>>>> error check for things like this.
>>>>>> [Ahmed] This makes sense. We also assume good behavior.
>>>>>>> In our PMD, if we got a burst of 20 ops, we allocate 20 spaces on the
>>>>>>> hw q, then build and send those messages. If we found an op from a stream
>>>>>>> which already had one inflight, we'd have to hold that back, store it in
>>>>>>> a sw stream-specific holding queue, and only send 19 to hw. We cannot
>>>>>>> send multiple ops from the same stream to the hw as it fans them out and
>>>>>>> does them in parallel.
>>>>>>> Once the enqueue_burst() returns, there is no processing context which
>>>>>>> would spot that the first has completed and send the next op to the hw.
>>>>>>> On a dequeue_burst() we would spot this, and in that context could
>>>>>>> process the next op in the stream.
>>>>>>> On out of space, instead of processing the next op we would have to
>>>>>>> transfer all unprocessed ops from the stream to the dequeue result.
>>>>>>> Some parts of this are doable, but it seems likely to add a lot more
>>>>>>> latency; we'd need to add extra threads and timers to move ops from the
>>>>>>> sw queue to the hw q to get any benefit, and these constructs would add
>>>>>>> context switching and CPU cycles. So we prefer to push this
>>>>>>> responsibility above the API, where it can achieve similar.
>>>>>> [Ahmed] I see what you mean.
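[The reject-until-acknowledged behaviour described in this sub-thread (a stream enters "RECOVERABLE" state, later ops bounce back as NOT_PROCESSED, and the caller must explicitly resume) boils down to a small state machine. A hedged C sketch follows; stream_resume()/stream_continue() are only proposed names in this discussion, and the enum values are illustrative, not from any DPDK header.]

```c
#include <assert.h>

enum op_status {
    OP_SUCCESS,
    OP_OUT_OF_SPACE_RECOVERABLE, /* dst buffer too small, stream pauses  */
    OP_NOT_PROCESSED             /* bounced: stream awaiting acknowledge */
};

enum stream_state { STREAM_OK, STREAM_RECOVERABLE };

struct stream { enum stream_state state; };

/* PMD-side decision for one op; out_of_space simulates the hardware
 * running out of destination buffer while processing this op. */
static enum op_status pmd_process(struct stream *s, int out_of_space)
{
    if (s->state == STREAM_RECOVERABLE)
        return OP_NOT_PROCESSED;     /* reject until caller acknowledges */
    if (out_of_space) {
        s->state = STREAM_RECOVERABLE;
        return OP_OUT_OF_SPACE_RECOVERABLE;
    }
    return OP_SUCCESS;
}

/* The caller's explicit acknowledgement that it has fixed up the op. */
static void stream_resume(struct stream *s)
{
    s->state = STREAM_OK;
}
```

[Ops from other streams are unaffected, matching the point above that the PMD keeps processing other streams while one stream is in the recoverable state.]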
>>>>>> Our workflow is almost exactly the same with our hardware, but the fanning
>>>>>> out is done by the hardware based on the stream, and ops that belong to
>>>>>> the same stream are never allowed to go out of order. Otherwise the data
>>>>>> would be corrupted. Likewise the hardware is responsible for checking the
>>>>>> state of the stream and returning frames as NOT_PROCESSED to the software.
>>>>>>>>>> Maybe we could add a capability if this behaviour is important for you?
>>>>>>>>>> e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS?
>>>>>>>>>> Our PMD would set this to 0, and expect no more than one op from a
>>>>>>>>>> stateful stream to be in flight at any time.
>>>>>>>>> [Ahmed] That makes sense. This way the different DPDK implementations
>>>>>>>>> do not have to add extra checking for unsupported cases.
>>>>>>>> [Shally] @ahmed, if I summarise your use-case, is this how you want the
>>>>>>>> PMD to support it?
>>>>>>>> - a burst *carries only one stream* and all ops are then assumed to
>>>>>>>> belong to that stream? (please note, here the burst is not carrying more
>>>>>>>> than one stream)
>>>>>> [Ahmed] No. In this use case the caller sets up an op and enqueues a
>>>>>> single op. Then, before the response comes back from the PMD, the caller
>>>>>> enqueues a second op on the same stream.
>>>>>>>> - the PMD will submit one op at a time to HW?
>>>>>> [Ahmed] I misunderstood what PMD means. I used it throughout to mean the
>>>>>> HW. I used DPDK to mean the software implementation that talks to the
>>>>>> hardware.
>>>>>> The software will submit all ops immediately. The hardware has to figure
>>>>>> out what to do with the ops depending on what stream they belong to.
>>>>>>>> - if processed successfully, push it back to the completion queue with
>>>>>>>> status = SUCCESS.
>>>>>>>> If it failed or ran into OUT_OF_SPACE, then push it to the completion
>>>>>>>> queue with status = FAILURE / OUT_OF_SPACE_RECOVERABLE, and the rest
>>>>>>>> with status = NOT_PROCESSED, and return with enqueue count = total # of
>>>>>>>> ops submitted originally with the burst?
>>>>>> [Ahmed] This is exactly what I had in mind. All ops will be submitted to
>>>>>> the HW. The HW will put all of them on the completion queue with the
>>>>>> correct status exactly as you say.
>>>>>>>> - the app assumes all have been enqueued, so it goes and dequeues all
>>>>>>>> ops
>>>>>>>> - on seeing an op with OUT_OF_SPACE_RECOVERABLE, the app resubmits a
>>>>>>>> burst of ops with a call to the stream_continue/resume API, starting
>>>>>>>> from the op which encountered OUT_OF_SPACE and the others as
>>>>>>>> NOT_PROCESSED, with updated input and output buffers?
>>>>>> [Ahmed] Correct, this is what we do today in our proprietary API.
>>>>>>>> - repeat until *all* are dequeued with status = SUCCESS, or *any* with
>>>>>>>> status = FAILURE? If at any time a failure is seen, does the app then
>>>>>>>> start the whole processing all over again, or just drop this burst?!
>>>>>> [Ahmed] The app has the choice on how to proceed. If the issue is
>>>>>> recoverable then the application can continue this stream from where it
>>>>>> stopped. If the failure is unrecoverable then the application should
>>>>>> first fix the problem and start from the beginning of the stream.
>>>>>>>> If all of the above is true, then I think we should add another API such
>>>>>>>> as rte_comp_enque_single_stream(), which will be functional under
>>>>>>>> Feature Flag = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better name is
>>>>>>>> SUPPORT_ENQUEUE_SINGLE_STREAM?!
>>>>>> [Ahmed] The main advantage in async use is lost if we force all related
>>>>>> ops to be in the same burst. If we do that, then we might as well merge
>>>>>> all the ops into one op.
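[Shally's dequeue-side flow above amounts to a per-status dispatch in the application. A minimal sketch, assuming status and action names that are purely illustrative; the real codes are still being defined in this RFC.]

```c
#include <assert.h>

enum op_status {
    OP_SUCCESS,
    OP_FAILURE,
    OP_OUT_OF_SPACE_RECOVERABLE,
    OP_NOT_PROCESSED
};

enum app_action {
    APP_DONE,                /* op finished, nothing to do            */
    APP_ABORT,               /* unrecoverable: restart or drop stream */
    APP_GROW_DST_AND_RESUME, /* fix buffers, then resume the stream   */
    APP_RESUBMIT_AS_IS       /* resubmit unchanged after the resume   */
};

/* What the app does with each dequeued op in the flow outlined above. */
static enum app_action on_dequeue(enum op_status st)
{
    switch (st) {
    case OP_SUCCESS:                  return APP_DONE;
    case OP_FAILURE:                  return APP_ABORT;
    case OP_OUT_OF_SPACE_RECOVERABLE: return APP_GROW_DST_AND_RESUME;
    default:                          return APP_RESUBMIT_AS_IS;
    }
}
```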
>>>>>> That would reduce the overhead.
>>>>>> The use mode I am proposing is only useful in cases where the data
>>>>>> becomes available after the first enqueue occurred. I want to allow the
>>>>>> caller to enqueue the second set of data as soon as it is available,
>>>>>> regardless of whether or not the HW has already started working on the
>>>>>> first op in flight.
>>>>> [Shally] @ahmed, OK, seems I missed a point here. So, confirm the
>>>>> following for me:
>>>>>
>>>>> As per the current description in the doc, the expected stateful usage is:
>>>>> enqueue(op1) --> dequeue(op1) --> enqueue(op2)
>>>>>
>>>>> but you're suggesting to allow an option to change it to
>>>>>
>>>>> enqueue(op1) --> enqueue(op2)
>>>>>
>>>>> i.e. multiple ops from the same stream can be put in flight via subsequent
>>>>> enqueue_burst() calls, without waiting to dequeue the previous ones, as
>>>>> the PMD supports it. So, no change to the current definition of a burst.
>>>>> It will still carry multiple streams, where each op belongs to a different
>>>>> stream?!
>>>> [Ahmed] Correct. I guess a user could put two ops on the same burst that
>>>> belong to the same stream. In that case it would be more efficient to
>>>> merge the ops using scatter-gather. Nonetheless, I would not add checks
>>>> in my implementation to limit that use. The hardware does not perceive a
>>>> difference between ops that came in one burst and ops that came in two
>>>> different bursts. To the hardware they are all ops. What matters is
>>>> which stream each op belongs to.
>>>>> if yes, then it seems your HW can be set up for multiple streams, so it is
>>>>> efficient for your case to support it in the DPDK PMD layer, but our hw
>>>>> doesn't by default and needs SW to back it.
>>>>> Given that, I also suggest enabling it under some feature flag.
>>>>> However it looks like an add-on, and if it doesn't change the current
>>>>> definition of a burst and the minimum expectation set on stateful
>>>>> processing described in this document, then IMO you can propose this
>>>>> feature as an incremental patch on the baseline version, in the absence of
>>>>> which applications will exercise stateful processing as described here
>>>>> (enq->deq->enq). Thoughts?
>>>> [Ahmed] Makes sense. I was worried that there might be fundamental
>>>> limitations to this mode of use in the API design. That is why I wanted
>>>> to share this use mode with you guys and see if it can be accommodated
>>>> using an incremental patch in the future.
>>>>>>> [Fiona] Am curious about Ahmed's response to this. I didn't get that a
>>>>>>> burst should carry only one stream. Or get how this makes a difference?
>>>>>>> As there can be many enqueue_burst() calls done before a dequeue_burst().
>>>>>>> Maybe you're thinking the enqueue_burst() would be a blocking call that
>>>>>>> would not return until all the ops had been processed? This would turn it
>>>>>>> into a synchronous call, which isn't the intent.
>>>>>> [Ahmed] Agreed, a blocking or even a buffering software layer that
>>>>>> babysits the hardware does not fundamentally change the parameters of the
>>>>>> system as a whole. It just moves workflow management complexity down into
>>>>>> the DPDK software layer.
>>>>>> Rather, there are real latency and throughput advantages (because of
>>>>>> caching) that I want to expose.
>>>>>>
>>> [Fiona] OK, so I think we've agreed that this can be an option, as long as
>>> it is not required of PMDs and is enabled under an explicit capability -
>>> named something like ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS.
>>> @Ahmed, we'll leave it up to you to define the details.
>>> What's necessary is API text to describe the expected behaviour on any
>>> error conditions, the pause/resume API, whether an API is expected to clean
>>> up if resume doesn't happen and if there's any time limit on this, etc.
>>> But I wouldn't expect any changes to existing burst APIs, and all PMDs and
>>> applications must be able to handle the default behaviour, i.e. with this
>>> capability disabled.
>>> Specifically, even if a PMD has this capability, if an application ignores
>>> it and only sends one op at a time, and the PMD returns
>>> OUT_OF_SPACE_RECOVERABLE, the stream should not be in a paused state and
>>> the PMD should not wait for a resume() to handle the next op sent for that
>>> stream.
>>> Does that make sense?
>> [Ahmed] That makes sense. When this mode is enabled then additional
>> functions must be called to resume the work, even if only one op was in
>> flight. When this mode is not enabled then the PMD assumes that the
>> caller will never enqueue a stateful op before receiving a response to
>> the one that precedes it in a stream.
> [Shally] @ahmed, just to confirm on this:
>
>> When this mode is not enabled then the PMD assumes that the caller will
>> never enqueue a stateful op ...
> I think what we want to ensure is the reverse of it, i.e.
"if mode is *enabled*, = then also PMD should assume that caller can use enqueue->dequeue->enqueue s= equence for stateful processing and if on deque, =0A= > he discover OUT_OF_SPACE_RECOVERABLE and call enqueue() again to handle i= t , that should be also be supported by PMD" . =0A= > In a sense, an application written for one PMD which doesn't have this ca= pability should also work for PMD which has this capability.=0A= >=0A= [Ahmed] That creates a race condition. Async stateful i.e.=0A= enqueue->enqueue->dequeue requires the user to explicitly acknowledge=0A= and solve the recoverable op. The PMD cannot assume that any particular=0A= op is the response to a recoverable condition. A lock around enqueue=0A= dequeue also does not resolve the issue since the decision to resolve=0A= the issue must be entirely made by the caller and the timing of that=0A= decision is outside the knowledge of the PMD.=0A= >>>>>> /// snip ///=0A= >=0A= =0A=