From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Trahe, Fiona"
To: "Verma, Shally", Ahmed Mansour, "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry, "Trahe, Fiona"
Thread-Topic: [RFC v2] doc compression API for DPDK
Date: Fri, 16 Feb 2018 13:04:32 +0000
Message-ID: <348A99DA5F5B7549AA880327E580B43589321277@IRSMSX101.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Verma, Shally
> Sent: Friday, February 16, 2018 7:17 AM
> To: Ahmed Mansour; Trahe, Fiona; dev@dpdk.org
> Subject: RE: [RFC v2] doc compression API for DPDK
>
> Hi Fiona, Ahmed
>
> >On 2/15/2018 1:47 PM, Trahe, Fiona wrote:
> >> Hi Shally, Ahmed,
> >> Sorry for the delay in replying,
> >> Comments below
> >>
> >>> Hi Ahmed,
> >>>
> >>>> On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
> >>>>> Hi Ahmed, Shally,
> >>>>>
> >>>>> ///snip///
> >>>>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
> >>>>>>>>>>> ------------------------------------------------
> >>>>>>>>>>> OUT_OF_SPACE is a condition where the output buffer runs out of space
> >>>>>>>>>>> and the PMD still has more data to produce. If the PMD runs into such a
> >>>>>>>>>>> condition, then it's an error condition in stateless processing.
> >>>>>>>>>>> In such a case, the PMD resets itself and returns with status
> >>>>>>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0,
> >>>>>>>>>>> i.e. no input read, no output written.
> >>>>>>>>>>> The application can resubmit the full input with a larger output buffer size.
> >>>>>>>>>> [Ahmed] Can we add an option to allow the user to read the data that was
> >>>>>>>>>> produced while still reporting OUT_OF_SPACE? This is mainly useful for
> >>>>>>>>>> decompression applications doing search.
> >>>>>>>>> [Shally] It is there but applicable for stateful operation type (please refer to
> >>>>>>>>> handling of out_of_space under "Stateful Section").
> >>>>>>>>> By definition, "stateless" here means that the application (such as IPCOMP)
> >>>>>>>>> knows the maximum output size with certainty and ensures that the uncompressed
> >>>>>>>>> data size cannot grow beyond the provided output buffer.
> >>>>>>>>> Such apps can submit an op with type = STATELESS and provide full input,
> >>>>>>>>> then the PMD assumes it has sufficient input and output and thus doesn't need
> >>>>>>>>> to maintain any contexts after the op is processed.
> >>>>>>>>> If the application doesn't know the max output size, then it should process it
> >>>>>>>>> as a stateful op, i.e. set up the op with type = STATEFUL and attach a stream
> >>>>>>>>> so that the PMD can maintain relevant context to handle such a condition.
> >>>>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while still
> >>>>>>>> respecting the stateless concept.
> >>>>>>>> In the stateless case where a PMD reports OUT_OF_SPACE in the decompression case,
> >>>>>>>> it could also return consumed=0, produced=x, where x>0. X indicates the
> >>>>>>>> amount of valid data which has been written to the output buffer.
> >>>>>>>> It is not complete, but if an application wants to search it, it may be sufficient.
> >>>>>>>> If the application still wants the data it must resubmit the whole input with a
> >>>>>>>> bigger output buffer, and decompression will be repeated from the start; it
> >>>>>>>> cannot expect to continue on, as the PMD has not maintained state, history or data.
> >>>>>>>> I don't think there would be any need to indicate this in capabilities; PMDs
> >>>>>>>> which cannot provide this functionality would always return produced=consumed=0,
> >>>>>>>> while PMDs which can could set produced > 0.
> >>>>>>>> If this works for you both, we could consider a similar case for compression.
> >>>>>>>>
> >>>>>>> [Shally] Sounds fine to me. Though in that case, consumed should also be updated to the
> >>>>>>> actual amount consumed by the PMD.
> >>>>>>> Setting consumed = 0 with produced > 0 doesn't correlate.
> >>>>>> [Ahmed] I like Fiona's suggestion, but I also do not like the implication
> >>>>>> of returning consumed = 0. At the same time returning consumed = y
> >>>>>> implies to the user that it can proceed from the middle. I prefer the
> >>>>>> consumed = 0 implementation, but I think a different return is needed to
> >>>>>> distinguish it from an OUT_OF_SPACE that the user can recover from. Perhaps
> >>>>>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
> >>>>>> future PMD implementations to provide recoverability even in STATELESS
> >>>>>> mode if they so wish. In this model STATELESS or STATEFUL would be a
> >>>>>> hint for the PMD implementation to make optimizations for each case, but
> >>>>>> it does not force the PMD implementation to limit functionality if it
> >>>>>> can provide recoverability.
> >>>>> [Fiona] So you're suggesting the following:
> >>>>> OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
> >>>>>     can be used and the next op in the stream should continue on from op.consumed+1.
> >>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
> >>>>>     Error condition, no recovery possible.
> >>>>>     consumed=produced=0. Application must resubmit all input data with
> >>>>>     a bigger output buffer.
> >>>>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
> >>>>>     - consumed = 0, produced > 0. Application must resubmit all input data with
> >>>>>       a bigger output buffer. However in the decompression case, data up to produced
> >>>>>       in the dst buffer may be inspected/searched.
> >>>>>       This never happens in the compression case, as the output data would be meaningless.
> >>>>>     - consumed > 0, produced > 0. PMD has stored relevant state and history and so
> >>>>>       can convert to stateful, using op.produced and continuing from consumed+1.
> >>>>> I don't expect our PMDs to use this last case, but maybe this works for others?
> >>>>> I'm not convinced it's not just adding complexity. It sounds like a version of stateful
> >>>>> without a stream, and maybe less efficient?
> >>>>> If so, should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
> >>>>> Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
> >>>>> simply have submitted a STATEFUL request if this is the behaviour it wants.
> >>>> [Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
> >>>> and replacing it with:
> >>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
> >>>>     Error condition, no recovery possible.
> >>>>     - consumed=0, produced=amount of data produced. Application must
> >>>>       resubmit all input data with a bigger output buffer to process all of the op.
> >>>> OUT_OF_SPACE_RECOVERABLE - normally returned on stateful operation. Not
> >>>>     an error. Op.produced can be used and the next op in the stream should
> >>>>     continue on from op.consumed+1.
> >>>>     - consumed > 0, produced > 0. PMD has stored relevant state and
> >>>>       history and so can continue, using op.produced and continuing from consumed+1.
> >>>>
> >>>> We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
> >>>> implementation either.
> >>>>
> >>>> Regardless of speculative future PMDs, the more important aspect of this
> >>>> for today is that the return status clearly determines
> >>>> the meaning of "consumed". If it is RECOVERABLE then consumed is
> >>>> meaningful. If it is TERMINATED then consumed is meaningless.
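A minimal sketch of how an application could act on the two proposed statuses. The enum values, op layout and helper below are placeholders invented for illustration from this thread; they are not the DPDK API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes mirroring the TERMINATED/RECOVERABLE proposal. */
enum comp_status {
	COMP_SUCCESS,
	COMP_OUT_OF_SPACE_TERMINATED,   /* stateless: no state kept by the PMD */
	COMP_OUT_OF_SPACE_RECOVERABLE,  /* stateful: consumed/produced are valid */
};

struct comp_op {
	enum comp_status status;
	size_t consumed;   /* bytes of src read by the PMD */
	size_t produced;   /* bytes of dst written by the PMD */
};

/* Decide where in the source the application should resubmit from. */
size_t next_src_offset(const struct comp_op *op, size_t prev_offset)
{
	switch (op->status) {
	case COMP_OUT_OF_SPACE_RECOVERABLE:
		/* PMD kept state; continue right after what it consumed. */
		return prev_offset + op->consumed;
	case COMP_OUT_OF_SPACE_TERMINATED:
		/* No state kept; resubmit the whole input with a bigger dst
		 * buffer. op->produced may still be > 0 and inspectable in
		 * the decompression case. */
		return 0;
	default:
		return prev_offset + op->consumed;
	}
}
```

The point of the sketch is that the status, not the consumed value alone, tells the application which of the two recovery workflows applies.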
> >>>> This way we take away the ambiguity of having OUT_OF_SPACE mean two
> >>>> different user workflows.
> >>>>
> >>>> A speculative future PMD may be designed to return RECOVERABLE for
> >>>> stateless ops that are attached to streams.
> >>>> A future PMD may look to see if an op has a stream attached, write
> >>>> out the state there, and go into recoverable mode.
> >>>> In essence this leaves the choice up to the implementation and allows
> >>>> the PMD to take advantage of stateless optimizations
> >>>> so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump
> >>>> context as soon as it fully processes an op. It will only
> >>>> write context out in cases where the op chokes.
> >>>> This futuristic PMD should ignore the FLUSH, since STATELESS mode was
> >>>> indicated by the user, and optimize.
> >>> [Shally] IMO, it looks okay to have two separate return codes, TERMINATED and RECOVERABLE,
> >>> with definitions as you mentioned, and it seems doable.
> >>> So then it means all the following conditions:
> >>> a. stateless with flush = full/final, no stream pointer provided: PMD can return TERMINATED,
> >>>    i.e. the user has to start all over again, it's a failure (as in current definition)
> >>> b. stateless with flush = full/final, stream pointer provided: here it's up to the PMD to
> >>>    return either TERMINATED or RECOVERABLE depending upon its ability (note: if RECOVERABLE,
> >>>    then the PMD will maintain state in the stream pointer)
> >>> c. stateful with flush = full / NO_SYNC, stream pointer always there: PMD will return
> >>>    TERMINATED/RECOVERABLE depending on whether the STATEFUL_COMPRESSION/DECOMPRESSION
> >>>    feature flag is enabled or not
> >> [Fiona] I don't think the flush flag is relevant - it could be out of space on any flush flag,
> >> and if out of space it should ignore the flush flag.
> >> Is there a need for TERMINATED? - I didn't think it would ever need to be returned in the
> >> stateful case.
> >> Why the ref to the feature flag?
> >> If a PMD doesn't support a feature I think it should fail the op - not with
> >> out-of-space, but unsupported or similar. Or it would fail on stream creation.
> >[Ahmed] Agreed with Fiona. The flush flag only matters on success. By
> >definition the PMD should return OUT_OF_SPACE_RECOVERABLE in stateful
> >mode when it runs out of space.
> >@Shally If the user did not provide a stream, then the PMD should
> >probably return TERMINATED every time. I am not sure we should make a
> >"really smart" PMD which returns RECOVERABLE even if no stream pointer
> >was given. In that case the PMD must give some ID back to the caller
> >that the caller can use to "recover" the op. I am not sure how it would
> >be implemented in the PMD, or when the PMD would decide to retire streams
> >belonging to dead ops that the caller decided not to "recover".
> >>
> >>> and one more exception case is:
> >>> d. stateless with flush = full, no stream pointer provided: PMD can return RECOVERABLE,
> >>>    i.e. the PMD internally maintained that state somehow and consumed & produced > 0, so the
> >>>    user can start from consumed+1, but there's a restriction on the user not to alter or
> >>>    change the op until it is fully processed?!
> >> [Fiona] Why the need for this case?
> >> There's always a restriction on the user not to alter or change an op until it is fully processed.
> >> If a PMD can do this - why doesn't it create a stream when that API is called - and then it's
> >> the same as b?
> >[Ahmed] Agreed. The user should not touch an op once enqueued until they
> >receive it in dequeue. We ignore the flush in stateless mode. We assume
> >it to be final every time.
>
> [Shally] Agreed, and I am not in favour of supporting such an implementation either. I just
> listed out the different possibilities here to better visualise Ahmed's requirements and the
> applicability of TERMINATED and RECOVERABLE.
>
> >>
> >>> The API currently takes care of cases a and c, and case b can be supported if the
> >>> specification accepts another proposal which mentions optional usage of a stream with stateless.
> >> [Fiona] The API has this, but as we agreed, it's not optional to call create_stream() with an
> >> op_type parameter (stateful/stateless). The PMD can return NULL or provide a stream; if the
> >> latter, then that stream must be attached to ops.
> >>
> >> Until then the API makes no difference between
> >>> cases b and c, i.e. we can have an op such as:
> >>> - type = stateful with flush = full/final, stream pointer provided; PMD can return
> >>>   TERMINATED/RECOVERABLE according to its ability
> >>>
> >>> Case d is something exceptional; if there's a requirement in PMDs to support it, then I
> >>> believe it will be doable with the concept of a different return code.
> >>>
> >> [Fiona] That's not quite how I understood it. Can it be simpler, with only the following cases?
> >> a. stateless with flush = full/final, no stream pointer provided: PMD can return TERMINATED,
> >>    i.e. the user has to start all over again, it's a failure (as in current definition).
> >>    consumed = 0, produced = amount of data produced. This is usually 0, but in the
> >>    decompression case a PMD may return > 0 and the application may find it useful to inspect
> >>    that data.
> >> b. stateless with flush = full/final, stream pointer provided: here it's up to the PMD to
> >>    return either TERMINATED or RECOVERABLE depending upon its ability (note: if RECOVERABLE,
> >>    then the PMD will maintain state in the stream pointer)
> >> c. stateful with flush = any, stream pointer always there: PMD will return RECOVERABLE.
> >>    op.produced can be used and the next op in the stream should continue on from op.consumed+1.
> >>    consumed=0, produced=0 is an unusual but allowed case. I'm not sure if it could ever
> >>    happen, but there's no need to change state to TERMINATED in this case.
> >>    There may be useful state/history stored in the PMD, even though no output has been
> >>    produced yet.
> >[Ahmed] Agreed
> [Shally] Sounds good.
>
> >>
> >>>>>>>>>>> D.2 Compression API Stateful operation
> >>>>>>>>>>> ----------------------------------------------------------
> >>>>>>>>>>> A stateful operation in DPDK compression means the application invokes
> >>>>>>>>>>> enqueue_burst() multiple times to process related chunks of data, either because
> >>>>>>>>>>> - the application broke the data into several ops, and/or
> >>>>>>>>>>> - the PMD ran into an out_of_space situation during input processing
> >>>>>>>>>>>
> >>>>>>>>>>> In case of either one or all of the above conditions, the PMD is required to
> >>>>>>>>>>> maintain the state of the op across enqueue_burst() calls, and
> >>>>>>>>>>> ops are set up with op_type RTE_COMP_OP_STATEFUL, beginning with
> >>>>>>>>>>> flush value = RTE_COMP_NO/SYNC_FLUSH and ending at flush value
> >>>>>>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
> >>>>>>>>>>> D.2.1 Stateful operation state maintenance
> >>>>>>>>>>> ---------------------------------------------------------------
> >>>>>>>>>>> It is always an ideal expectation from the application that it should parse
> >>>>>>>>>>> through all related chunks of source data, making its mbuf-chain, and enqueue
> >>>>>>>>>>> it for stateless processing.
> >>>>>>>>>>> However, if it needs to break it into several enqueue_burst() calls, then
> >>>>>>>>>>> an expected call flow would be something like:
> >>>>>>>>>>> enqueue_burst( |op.no_flush |)
> >>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call dequeue
> >>>>>>>>>> burst in a loop until all ops are received. Is this correct?
> >>>>>>>>>>
> >>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
> >>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued.
> >>>>>>>>> However, this illustration is specifically in the
> >>>>>>>>> context of stateful op processing, to reflect that if a stream is broken into
> >>>>>>>>> chunks, then each chunk should be
> >>>>>>>>> submitted as one op at a time with type = STATEFUL, and needs to be
> >>>>>>>>> dequeued first before the next chunk is enqueued.
> >>>>>>>>>
> >>>>>>>>>>> enqueue_burst( |op.no_flush |)
> >>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
> >>>>>>>>>>> enqueue_burst( |op.full_flush |)
> >>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand that
> >>>>>>>>>> occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish
> >>>>>>>>>> the response in exception cases?
> >>>>>>>>> [Shally] Multiple ops are allowed in flight, however the condition is that each op
> >>>>>>>>> in such a case is independent of
> >>>>>>>>> the others, i.e. they belong to different streams altogether.
> >>>>>>>>> Earlier (as part of the RFC v1 doc) we did consider the proposal to process all
> >>>>>>>>> related chunks of data in a single
> >>>>>>>>> burst by passing them as an ops array, but later found that not so useful for
> >>>>>>>>> PMD handling for various
> >>>>>>>>> reasons. You may please refer to the RFC v1 doc review comments for the same.
> >>>>>>>> [Fiona] Agree with Shally. In summary, only one op can be processed at a
> >>>>>>>> time, since each needs the
> >>>>>>>> state of the previous. To allow more than one op to be in flight at a time would
> >>>>>>>> force PMDs to implement internal queueing and exception handling for
> >>>>>>>> the OUT_OF_SPACE conditions you mention.
> >>>>>> [Ahmed] But we are putting the ops on qps, which would make them
> >>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
> >>>>>> complex but doable.
> >>>>> [Fiona] In my opinion this is not doable, and could be very inefficient.
> >>>>> There may be many streams.
> >>>>> The PMD would have to have an internal queue per stream so
> >>>>> it could adjust the next src offset and length in the OUT_OF_SPACE case.
> >>>>> And this may ripple back through all subsequent ops in the stream as each
> >>>>> source len is increased and its dst buffer is not big enough.
> >>>> [Ahmed] Regarding multi-op OUT_OF_SPACE handling:
> >>>> The caller would still need to adjust
> >>>> the src length/output buffer as you say. The PMD cannot handle
> >>>> OUT_OF_SPACE internally.
> >>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
> >>>> until it gets explicit
> >>>> confirmation from the caller to continue working on this stream. Any ops
> >>>> received by
> >>>> the PMD should be returned to the caller with status STREAM_PAUSED, since
> >>>> the caller did not
> >>>> explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
> >>>> These semantics can be enabled by adding a new function to the API,
> >>>> perhaps stream_resume().
> >>>> This allows the caller to indicate that it acknowledges that it has seen
> >>>> the issue, and this op
> >>>> should be used to resolve the issue. Implementations that do not support
> >>>> this mode of use
> >>>> can push back immediately after one op is in flight. Implementations
> >>>> that support this use
> >>>> mode can allow many ops from the same session.
> >>>>
> >>> [Shally] Is it still in the context of having a single burst where all ops belong to one
> >>> stream? If yes, I would still say it would add an overhead to PMDs, especially if it is
> >>> expected to work closer to HW (which I think is the case with DPDK PMDs).
> >>> Though your approach is doable, why can't this all be in a layer above the PMD? i.e. a
> >>> layer above the PMD can either pass one op at a time with burst size = 1, OR can make a
> >>> chained mbuf of input and output and pass that as one op.
> >>> Is it just to ease the applications' chained-mbuf burden, or do you see any
> >>> performance/use-case impacting aspect also?
> >>>
> >>> If it is in a context where each op belongs to a different stream in a burst, then why do
> >>> we need stream_pause and resume? It is an expectation from the app to pass more output
> >>> buffer with consumed + 1 from the next call onwards, as it has already
> >>> seen OUT_OF_SPACE.
> >[Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
> >implementation rejects all ops that belong to a stream that has entered
> >"RECOVERABLE" state for one reason or another. The caller must
> >acknowledge explicitly that it has received news of the problem before
> >the PMD allows this stream to exit "RECOVERABLE" state. I agree with you
> >that implementing this functionality in the software layer above the PMD
> >is a bad idea, since the latency reductions are lost.
>
> [Shally] Just reiterating, I rather meant the other way around, i.e. I see it easier to put
> all such complexity in a layer above the PMD.
>
> >This setup is useful in latency-sensitive applications where the latency
> >of buffering multiple ops into one op is significant. We found latency
> >makes a significant difference in search applications where the PMD
> >competes with software decompression.
[Fiona] I see, so when all goes well, you get best-case latency, but when
out-of-space occurs latency will probably be worse.
> >> [Fiona] I still have concerns with this and would not want to support it in our PMD.
> >> To make sure I understand: you want to send a burst of ops, with several from the same stream.
> >> If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any
> >> subsequent ops in that stream.
> >> Should it return them in a dequeue_burst() with status still NOT_PROCESSED?
> >> Or somehow drop them? How?
> >> While still processing ops from other streams.
> >[Ahmed] This is exactly correct. It should return them with
> >NOT_PROCESSED.
> >Yes, the PMD should continue processing other streams.
> >> As we want to offload each op to hardware with as little CPU processing as possible, we
> >> would not want to open up each op to see which stream it's attached to and
> >> make decisions to do per-stream storage, or drop it, or bypass hw and dequeue without processing.
> >[Ahmed] I think I might have missed your point here, but I will try to
> >answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
> >to the PMD and the PMD should reject until stream_continue() is called.
> >The next op to be sent by the user will have a special marker in it to
> >inform the PMD to continue working on this stream. Alternatively the
> >DPDK layer can be made "smarter" to fail during the enqueue by checking
> >the stream and its state, but like you say this adds additional CPU
> >overhead during the enqueue.
> >I am curious. In a simple synchronous use case, how do we prevent users
> >from putting multiple ops in flight that belong to a single stream? Do
> >we just currently say it is undefined behavior? Otherwise we would have
> >to check the stream and incur the CPU overhead.
[Fiona] We don't do anything to prevent it. It's undefined. IMO on the data path
in the DPDK model we expect good behaviour and don't have to error check for things like this.
In our PMD, if we got a burst of 20 ops, we would allocate 20 spaces on the hw q, then
build and send those messages. If we found an op from a stream which already
had one in flight, we'd have to hold that back, store it in a sw stream-specific
holding queue, and only send 19 to hw.
We cannot send multiple ops from the same stream to the hw, as it fans them out
and does them in parallel.
Once the enqueue_burst() returns, there is no processing
context which would spot that the first has completed
and send the next op to the hw. On a dequeue_burst() we would spot this,
and in that context could process the next op in the stream.
On out of space, instead of processing the next op we would have to transfer
all unprocessed ops from the stream to the dequeue result.
Some parts of this are doable, but it seems likely to add a lot more latency;
we'd need to add extra threads and timers to move ops from the sw
queue to the hw q to get any benefit, and these constructs would add
context switching and CPU cycles.
So we prefer to push this responsibility to above the API, which can achieve something similar.
> >>
> >> Maybe we could add a capability if this behaviour is important for you?
> >> e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
> >> Our PMD would set this to 0, and expect no more than one op from a stateful stream
> >> to be in flight at any time.
> >[Ahmed] That makes sense. This way the different DPDK implementations do
> >not have to add extra checking for unsupported cases.
>
> [Shally] @Ahmed, if I summarise your use-case, is this how you want the PMD to support it?
> - a burst carries *only one stream*, and all ops are then assumed to belong to that stream
>   (please note, here a burst is not carrying more than one stream)
> - the PMD will submit one op at a time to HW
> - if processed successfully, push it back to the completion queue with status = SUCCESS. If it
>   failed or ran into OUT_OF_SPACE, then push it to the completion queue with status = FAILURE/
>   OUT_OF_SPACE_RECOVERABLE and the rest with status = NOT_PROCESSED, and return with enqueue
>   count = total # of ops submitted originally with the burst
> - the app assumes all have been enqueued, so it goes and dequeues all ops
> - on seeing an op with OUT_OF_SPACE_RECOVERABLE, the app resubmits a burst of ops with a call
>   to the stream_continue/resume API, starting from the op which encountered OUT_OF_SPACE and
>   the others previously returned as NOT_PROCESSED, with updated input and output buffers
> - repeat until *all* are dequeued with status = SUCCESS, or *any* with status = FAILURE?
>   If at any time a failure is seen, does the app then start the whole processing all over
>   again, or just drop this burst?!
>
> If all of the above is true, then I think we should add another API such as
> rte_comp_enque_single_stream(), which would be functional under feature flag =
> ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better name is SUPPORT_ENQUEUE_SINGLE_STREAM?!
[Fiona] Am curious about Ahmed's response to this. I didn't get that a burst should carry only
one stream, or how this makes a difference, as there can be many enqueue_burst() calls done
before a dequeue_burst().
Maybe you're thinking the enqueue_burst() would be a blocking call that would not return until
all the ops had been processed? This would turn it into a synchronous call, which isn't the intent.
>
>
> >>
> >>
> >>>> Regarding the ordering of ops:
> >>>> We do force serialization of ops belonging to a stream in STATEFUL
> >>>> operation. Related ops do
> >>>> not go out of order and are given to available PMDs one at a time.
> >>>>
> >>>>>> The question is: is this mode of use useful for real-
> >>>>>> life applications, or would we be just adding complexity? The technical
> >>>>>> advantage of this is that processing of stateful ops is interdependent,
> >>>>>> and PMDs can take advantage of caching and other optimizations to make
> >>>>>> processing related ops much faster than switching on every op. PMDs have to
> >>>>>> maintain state of more than 32 KB for DEFLATE for every stream.
> >>>>>>>> If the application has all the data, it can put it into chained mbufs in a single
> >>>>>>>> op rather than
> >>>>>>>> multiple ops, which avoids pushing all that complexity down to the PMDs.
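The application-side recovery flow summarised above (single-stream burst, resubmit from the op that hit OUT_OF_SPACE) could be sketched as follows. The status names and helper are placeholders drawn from this thread, not a real DPDK API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-op statuses from the single-stream burst proposal. */
enum op_status {
	ST_SUCCESS,
	ST_OUT_OF_SPACE_RECOVERABLE,
	ST_NOT_PROCESSED,
};

struct op {
	enum op_status status;
};

/* After dequeuing a burst, find the index of the first op that must head
 * the resubmitted burst: the op that hit OUT_OF_SPACE (whose dst buffer
 * must first be enlarged); all following NOT_PROCESSED ops are resubmitted
 * with it after the stream_continue/resume call.
 * Returns n if every op succeeded and nothing needs resubmission. */
size_t first_to_resubmit(const struct op *ops, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i].status == ST_OUT_OF_SPACE_RECOVERABLE)
			return i;
	return n;
}
```

This only illustrates the bookkeeping the app would do between the dequeue and the resubmitting enqueue; the resume acknowledgement itself would be the new API call debated above.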
> >>>>>> [Ahmed] I think that your suggested scheme of putting all related mbufs
> >>>>>> into one op may be the best solution without the extra complexity of
> >>>>>> handling OUT_OF_SPACE cases, while still allowing the enqueuer extra
> >>>>>> time, if we have a way of marking mbufs as ready for consumption. The
> >>>>>> enqueuer may not have all the data at hand but can enqueue the op with a
> >>>>>> couple of empty mbufs marked as not ready for consumption. The enqueuer
> >>>>>> will then update the rest of the mbufs to ready for consumption once the
> >>>>>> data is added. This introduces a race condition. A second flag for each
> >>>>>> mbuf can be updated by the PMD to indicate whether it processed it or not.
> >>>>>> This way, in cases where the PMD beat the application to the op, the
> >>>>>> application will just update the op to point to the first unprocessed
> >>>>>> mbuf and resend it to the PMD.
> >>>>> [Fiona] This doesn't sound safe. You want to add data to a stream after you've
> >>>>> enqueued the op. You would have to write to op.src.length at a time when the PMD
> >>>>> might be reading it. Sounds like a lock would be necessary.
> >>>>> Once the op has been enqueued, my understanding is its ownership is handed
> >>>>> over to the PMD and the application should not touch it until it has been dequeued.
> >>>>> I don't think it's a good idea to change this model.
> >>>>> Can't the application just collect a stream of data in chained mbufs until it has
> >>>>> enough to send an op, then construct the op and, while waiting for that op to
> >>>>> complete, accumulate the next batch of chained mbufs? Only construct the next op
> >>>>> after the previous one is complete, based on the result of the previous one.
> >>>>>
> >>>> [Ahmed] Fair enough. I agree with you. I imagined it in a different way,
> >>>> in which each mbuf would have its own length.
> >>>> The advantage to gain is in applications where there is one PMD user:
> >>>> the down time between ops can be significant, and setting up a single
> >>>> producer-consumer pair significantly reduces the CPU cycles and PMD down
> >>>> time.
> >>>>
> >>>> ////snip////
> >
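The D.2.1 stateful call flow discussed in the thread (one op in flight per stream, dequeue before the next enqueue, NO_FLUSH on intermediate chunks, FULL_FLUSH on the last) can be sketched as below. The toy "PMD" is an identity transform standing in for real compression, and every name here is an illustrative assumption rather than the DPDK API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum flush_flag { FLUSH_NONE, FLUSH_FULL };

struct sim_op {
	const char *src; size_t src_len;
	char *dst;       size_t dst_len;
	size_t consumed, produced;
	enum flush_flag flush;
};

/* Toy single-op "PMD": copies input to output (identity transform
 * standing in for DEFLATE), reporting consumed/produced. */
static void sim_pmd_process(struct sim_op *op)
{
	size_t n = op->src_len < op->dst_len ? op->src_len : op->dst_len;
	memcpy(op->dst, op->src, n);
	op->consumed = n;
	op->produced = n;
}

/* Stateful flow for one stream: submit one chunk per op, dequeue it
 * before enqueuing the next. All chunks but the last carry FLUSH_NONE;
 * the last carries FLUSH_FULL. Returns total bytes produced. */
size_t process_stream(const char **chunks, const size_t *lens, size_t n,
		      char *dst, size_t dst_cap)
{
	size_t off = 0;
	for (size_t i = 0; i < n; i++) {
		struct sim_op op = {
			.src = chunks[i], .src_len = lens[i],
			.dst = dst + off, .dst_len = dst_cap - off,
			.flush = (i + 1 == n) ? FLUSH_FULL : FLUSH_NONE,
		};
		sim_pmd_process(&op); /* enqueue_burst() then dequeue_burst() */
		off += op.produced;
	}
	return off;
}
```

The serialization point is the one the thread keeps returning to: because each op needs the state left by the previous one, the loop cannot have two ops from the same stream in flight at once.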