From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 5FAF11B81B for ; Wed, 31 Jan 2018 20:03:36 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 31 Jan 2018 11:03:35 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.46,441,1511856000"; d="scan'208";a="14047122" Received: from irsmsx151.ger.corp.intel.com ([163.33.192.59]) by orsmga007.jf.intel.com with ESMTP; 31 Jan 2018 11:03:32 -0800 Received: from irsmsx101.ger.corp.intel.com ([169.254.1.188]) by IRSMSX151.ger.corp.intel.com ([169.254.4.16]) with mapi id 14.03.0319.002; Wed, 31 Jan 2018 19:03:07 +0000 From: "Trahe, Fiona" To: Ahmed Mansour , "Verma, Shally" , "dev@dpdk.org" CC: "Athreya, Narayana Prasad" , "Gupta, Ashish" , "Sahu, Sunila" , "De Lara Guarch, Pablo" , "Challa, Mahipal" , "Jain, Deepak K" , Hemant Agrawal , Roy Pledge , Youri Querry , "Trahe, Fiona" Thread-Topic: [RFC v2] doc compression API for DPDK Thread-Index: AdOFUW8Wdt99b3u6RKydGSrxJwvtHgVaFBAA Date: Wed, 31 Jan 2018 19:03:06 +0000 Message-ID: <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com> References: <348A99DA5F5B7549AA880327E580B435892F589D@IRSMSX101.ger.corp.intel.com> In-Reply-To: Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYTk3NDBkMWQtNzM1My00YTc1LTk3MmItMDA0NjliMjg2N2VkIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6IllxVjBncm5aTm83S3pOT1lSdjVpVldVSklUcTFMMmx2NytmRzVTMGlORGs9In0= x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Jan 2018 19:03:37 -0000 Hi Ahmed, Shally, ///snip/// > >>>>> D.1.1 Stateless and OUT_OF_SPACE > >>>>> ------------------------------------------------ > >>>>> OUT_OF_SPACE is a condition when output buffer runs out of space > >> and > >>>> where PMD still has more data to produce. If PMD run into such > >> condition, > >>>> then it's an error condition in stateless processing. > >>>>> In such case, PMD resets itself and return with status > >>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=3Dconsumed=3D0 > >> i.e. > >>>> no input read, no output written. > >>>>> Application can resubmit an full input with larger output buffer si= ze. > >>>> [Ahmed] Can we add an option to allow the user to read the data that > >> was > >>>> produced while still reporting OUT_OF_SPACE? this is mainly useful f= or > >>>> decompression applications doing search. > >>> [Shally] It is there but applicable for stateful operation type (plea= se refer to > >> handling out_of_space under > >>> "Stateful Section"). > >>> By definition, "stateless" here means that application (such as IPCOM= P) > >> knows maximum output size > >>> guaranteedly and ensure that uncompressed data size cannot grow more > >> than provided output buffer. > >>> Such apps can submit an op with type =3D STATELESS and provide full i= nput, > >> then PMD assume it has > >>> sufficient input and output and thus doesn't need to maintain any con= texts > >> after op is processed. > >>> If application doesn't know about max output size, then it should pro= cess it > >> as stateful op i.e. setup op > >>> with type =3D STATEFUL and attach a stream so that PMD can maintain > >> relevant context to handle such > >>> condition. > >> [Fiona] There may be an alternative that's useful for Ahmed, while sti= ll > >> respecting the stateless concept. > >> In Stateless case where a PMD reports OUT_OF_SPACE in decompression > >> case > >> it could also return consumed=3D0, produced =3D x, where x>0. X indica= tes the > >> amount of valid data which has > >> been written to the output buffer. It is not complete, but if an appl= ication > >> wants to search it it may be sufficient. > >> If the application still wants the data it must resubmit the whole inp= ut with a > >> bigger output buffer, and > >> decompression will be repeated from the start, it > >> cannot expect to continue on as the PMD has not maintained state, his= tory > >> or data. > >> I don't think there would be any need to indicate this in capabilities= , PMDs > >> which cannot provide this > >> functionality would always return produced=3Dconsumed=3D0, while PMDs = which > >> can could set produced > 0. > >> If this works for you both, we could consider a similar case for compr= ession. > >> > > [Shally] Sounds Fine to me. Though then in that case, consume should al= so be updated to actual > consumed by PMD. > > Setting consumed =3D 0 with produced > 0 doesn't correlate. > [Ahmed]I like Fiona's suggestion, but I also do not like the implication > of returning consumed =3D 0. At the same time returning consumed =3D y > implies to the user that it can proceed from the middle. I prefer the > consumed =3D 0 implementation, but I think a different return is needed t= o > distinguish it from OUT_OF_SPACE that the use can recover from. Perhaps > OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows > future PMD implementations to provide recover-ability even in STATELESS > mode if they so wish. In this model STATELESS or STATEFUL would be a > hint for the PMD implementation to make optimizations for each case, but > it does not force the PMD implementation to limit functionality if it > can provide recover-ability. [Fiona] So you're suggesting the following: OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produc= ed can be used and next op in stream should continue on from op.consumed+1= . OUT_OF_SPACE_TERMINATED - returned only on stateless operation.=20 Error condition, no recovery possible. consumed=3Dproduced=3D0. Application must resubmit all input data with a bigger output buffer. OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recov= ery possible. - consumed =3D 0, produced > 0. Application must resubmit all input da= ta with a bigger output buffer. However in decompression case, data up to p= roduced=20 in dst buffer may be inspected/searched. Never happens in compressi= on=20 case as output data would be meaningless. - consumed > 0, produced > 0. PMD has stored relevant state and histor= y and so can convert to stateful, using op.produced and continuing from cons= umed+1.=20 I don't expect our PMDs to use this last case, but maybe this works for oth= ers? I'm not convinced it's not just adding complexity. It sounds like a version= of stateful=20 without a stream, and maybe less efficient? If so should it respect the FLUSH flag? Which would have been FULL or FINAL= in the op. Or treat it as FLUSH_NONE or SYNC? I don't know why an application would no= t simply have submitted a STATEFUL request if this is the behaviour it wants? =20 > > > >>>>> D.2 Compression API Stateful operation > >>>>> ---------------------------------------------------------- > >>>>> A Stateful operation in DPDK compression means application invokes > >>>> enqueue burst() multiple times to process related chunk of data eith= er > >>>> because > >>>>> - Application broke data into several ops, and/or > >>>>> - PMD ran into out_of_space situation during input processing > >>>>> > >>>>> In case of either one or all of the above conditions, PMD is requir= ed to > >>>> maintain state of op across enque_burst() calls and > >>>>> ops are setup with op_type RTE_COMP_OP_STATEFUL, and begin with > >>>> flush value =3D RTE_COMP_NO/SYNC_FLUSH and end at flush value > >>>> RTE_COMP_FULL/FINAL_FLUSH. > >>>>> D.2.1 Stateful operation state maintenance > >>>>> --------------------------------------------------------------- > >>>>> It is always an ideal expectation from application that it should p= arse > >>>> through all related chunk of source data making its mbuf-chain and > >> enqueue > >>>> it for stateless processing. > >>>>> However, if it need to break it into several enqueue_burst() calls,= then > >> an > >>>> expected call flow would be something like: > >>>>> enqueue_burst( |op.no_flush |) > >>>> [Ahmed] The work is now in flight to the PMD.The user will call dequ= eue > >>>> burst in a loop until all ops are received. Is this correct? > >>>> > >>>>> deque_burst(op) // should dequeue before we enqueue next > >>> [Shally] Yes. Ideally every submitted op need to be dequeued. However > >> this illustration is specifically in > >>> context of stateful op processing to reflect if a stream is broken in= to > >> chunks, then each chunk should be > >>> submitted as one op at-a-time with type =3D STATEFUL and need to be > >> dequeued first before next chunk is > >>> enqueued. > >>> > >>>>> enqueue_burst( |op.no_flush |) > >>>>> deque_burst(op) // should dequeue before we enqueue next > >>>>> enqueue_burst( |op.full_flush |) > >>>> [Ahmed] Why now allow multiple work items in flight? I understand th= at > >>>> occasionaly there will be OUT_OF_SPACE exception. Can we just > >> distinguish > >>>> the response in exception cases? > >>> [Shally] Multiples ops are allowed in flight, however condition is ea= ch op in > >> such case is independent of > >>> each other i.e. belong to different streams altogether. > >>> Earlier (as part of RFC v1 doc) we did consider the proposal to proce= ss all > >> related chunks of data in single > >>> burst by passing them as ops array but later found that as not-so-use= ful for > >> PMD handling for various > >>> reasons. You may please refer to RFC v1 doc review comments for same. > >> [Fiona] Agree with Shally. In summary, as only one op can be processed= at a > >> time, since each needs the > >> state of the previous, to allow more than 1 op to be in-flight at a ti= me would > >> force PMDs to implement internal queueing and exception handling for > >> OUT_OF_SPACE conditions you mention. > [Ahmed] But we are putting the ops on qps which would make them > sequential. Handling OUT_OF_SPACE conditions would be a little bit more > complex but doable.=20 [Fiona] In my opinion this is not doable, could be very inefficient. There may be many streams. The PMD would have to have an internal queue per stream so it could adjust the next src offset and length in the OUT_OF_SPACE case. And this may ripple back though all subsequent ops in the stream as each source len is increased and its dst buffer is not big enough. > The question is this mode of use useful for real > life applications or would we be just adding complexity? The technical > advantage of this is that processing of Stateful ops is interdependent > and PMDs can take advantage of caching and other optimizations to make > processing related ops much faster than switching on every op. PMDs have > maintain state of more than 32 KB for DEFLATE for every stream. > >> If the application has all the data, it can put it into chained mbufs = in a single > >> op rather than > >> multiple ops, which avoids pushing all that complexity down to the PMD= s. > [Ahmed] I think that your suggested scheme of putting all related mbufs > into one op may be the best solution without the extra complexity of > handling OUT_OF_SPACE cases, while still allowing the enqueuer extra > time If we have a way of marking mbufs as ready for consumption. The > enqueuer may not have all the data at hand but can enqueue the op with a > couple of empty mbus marked as not ready for consumption. The enqueuer > will then update the rest of the mbufs to ready for consumption once the > data is added. This introduces a race condition. A second flag for each > mbuf can be updated by the PMD to indicate that it processed it or not. > This way in cases where the PMD beat the application to the op, the > application will just update the op to point to the first unprocessed > mbuf and resend it to the PMD. [Fiona] This doesn't sound safe. You want to add data to a stream after you= 've enqueued the op. You would have to write to op.src.length at a time when th= e PMD might be reading it. Sounds like a lock would be necessary. Once the op has been enqueued, my understanding is its ownership is handed over to the PMD and the application should not touch it until it has been d= equeued. I don't think it's a good idea to change this model. Can't the application just collect a stream of data in chained mbufs until = it has enough to send an op, then construct the op and while waiting for that op t= o complete, accumulate the next batch of chained mbufs? Only construct the ne= xt op after the previous one is complete, based on the result of the previous one= . =20 > >>>> [Ahmed] Related to OUT_OF_SPACE. What status does the user recieve > >> in a > >>>> decompression case when the end block is encountered before the end > >> of > >>>> the input? Does the PMD continue decomp? Does it stop there and > >> return > >>>> the stop index? > >>>> > >>> [Shally] Before I could answer this, please help me understand your u= se > >> case . When you say "when the > >>> end block is encountered before the end of the input?" Do you mean - > >>> "Decompressor process a final block (i.e. has BFINAL=3D1 in its heade= r) and > >> there's some footer data after > >>> that?" Or > >>> you mean "decompressor process one block and has more to process till= its > >> final block?" > >>> What is "end block" and "end of input" reference here? > [Ahmed] I meant BFINAL=3D1 by end block. The end of input is the end of > the input length. > e.g. > | input > length--------------------------------------------------------------| > |--data----data----data------data-------BFINAL-footer-unrelated data| > >>> [Fiona] I propose if BFINAL bit is detected before end of input the decompression should stop. In this case consumed will be < src.length. produced will be < dst buffer size. Do we need an extra STATUS response? STATUS_BFINAL_DETECTED ? Only thing I don't like this is it can impact on performance, as normally=20 we can just look for STATUS =3D=3D SUCCESS. Anything else should be an exce= ption. Now the application would have to check for SUCCESS || BFINAL_DETECTED ever= y time. Do you have a suggestion on how we should handle this? =20