From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Verma, Shally"
To: "Trahe, Fiona", Ahmed Mansour, "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry
Thread-Topic: [RFC v2] doc compression API for DPDK
Date: Thu, 1 Feb 2018 05:40:48 +0000
References: <348A99DA5F5B7549AA880327E580B435892F589D@IRSMSX101.ger.corp.intel.com> <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com>
In-Reply-To: <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK
List-Id: DPDK patches and discussions
X-List-Received-Date: Thu, 01 Feb 2018 05:40:52 -0000

>-----Original Message-----
>From: Trahe, Fiona [mailto:fiona.trahe@intel.com]
>Sent: 01 February 2018 00:33
>To: Ahmed Mansour; Verma, Shally; dev@dpdk.org
>Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry; Trahe, Fiona
>Subject: RE: [RFC v2] doc compression API for DPDK
>
>Hi Ahmed, Shally,
>
>///snip///
>> >>>>> D.1.1 Stateless and OUT_OF_SPACE
>> >>>>> ------------------------------------------------
>> >>>>> OUT_OF_SPACE is a condition when output buffer runs out of space and
>> >>>>> where PMD still has more data to produce. If PMD runs into such
>> >>>>> condition, then it's an error condition in stateless processing.
>> >>>>> In such case, PMD resets itself and returns with status
>> >>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0 i.e.
>> >>>>> no input read, no output written.
>> >>>>> Application can resubmit a full input with larger output buffer size.
>> >>>> [Ahmed] Can we add an option to allow the user to read the data that
>> >>>> was produced while still reporting OUT_OF_SPACE? This is mainly useful
>> >>>> for decompression applications doing search.
>> >>> [Shally] It is there but applicable for stateful operation type (please
>> >>> refer to handling out_of_space under "Stateful Section").
>> >>> By definition, "stateless" here means that application (such as IPCOMP)
>> >>> knows maximum output size guaranteedly and ensures that uncompressed
>> >>> data size cannot grow more than provided output buffer.
>> >>> Such apps can submit an op with type = STATELESS and provide full input,
>> >>> then PMD assumes it has sufficient input and output and thus doesn't
>> >>> need to maintain any contexts after op is processed.
>> >>> If application doesn't know about max output size, then it should
>> >>> process it as stateful op i.e. setup op with type = STATEFUL and attach
>> >>> a stream so that PMD can maintain relevant context to handle such
>> >>> condition.
>> >> [Fiona] There may be an alternative that's useful for Ahmed, while still
>> >> respecting the stateless concept.
>> >> In Stateless case where a PMD reports OUT_OF_SPACE in decompression case
>> >> it could also return consumed=0, produced = x, where x>0. X indicates
>> >> the amount of valid data which has been written to the output buffer.
>> >> It is not complete, but if an application wants to search it, it may be
>> >> sufficient.
>> >> If the application still wants the data it must resubmit the whole input
>> >> with a bigger output buffer, and decompression will be repeated from the
>> >> start, it cannot expect to continue on as the PMD has not maintained
>> >> state, history or data.
>> >> I don't think there would be any need to indicate this in capabilities,
>> >> PMDs which cannot provide this functionality would always return
>> >> produced=consumed=0, while PMDs which can could set produced > 0.
>> >> If this works for you both, we could consider a similar case for
>> >> compression.
>> >>
>> > [Shally] Sounds fine to me. Though then in that case, consumed should
>> > also be updated to actual consumed by PMD.
>> > Setting consumed = 0 with produced > 0 doesn't correlate.
>> [Ahmed] I like Fiona's suggestion, but I also do not like the implication
>> of returning consumed = 0. At the same time returning consumed = y
>> implies to the user that it can proceed from the middle. I prefer the
>> consumed = 0 implementation, but I think a different return is needed to
>> distinguish it from OUT_OF_SPACE that the user can recover from. Perhaps
>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
>> future PMD implementations to provide recover-ability even in STATELESS
>> mode if they so wish. In this model STATELESS or STATEFUL would be a
>> hint for the PMD implementation to make optimizations for each case, but
>> it does not force the PMD implementation to limit functionality if it
>> can provide recover-ability.
>[Fiona] So you're suggesting the following:
>OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
>    can be used and next op in stream should continue on from op.consumed+1.
>OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>    Error condition, no recovery possible.
>    consumed=produced=0.
>    Application must resubmit all input data with
>    a bigger output buffer.
>OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
>    - consumed = 0, produced > 0. Application must resubmit all input data with
>      a bigger output buffer. However in decompression case, data up to produced
>      in dst buffer may be inspected/searched. Never happens in compression
>      case as output data would be meaningless.
>    - consumed > 0, produced > 0. PMD has stored relevant state and history and so
>      can convert to stateful, using op.produced and continuing from consumed+1.
>I don't expect our PMDs to use this last case, but maybe this works for others?
>I'm not convinced it's not just adding complexity. It sounds like a version of stateful
>without a stream, and maybe less efficient?
>If so should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
>Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
>simply have submitted a STATEFUL request if this is the behaviour it wants?
>
>
>> >
>> >>>>> D.2 Compression API Stateful operation
>> >>>>> ----------------------------------------------------------
>> >>>>> A Stateful operation in DPDK compression means application invokes
>> >>>>> enqueue burst() multiple times to process related chunk of data either
>> >>>>> because
>> >>>>> - Application broke data into several ops, and/or
>> >>>>> - PMD ran into out_of_space situation during input processing
>> >>>>>
>> >>>>> In case of either one or all of the above conditions, PMD is required to
>> >>>>> maintain state of op across enque_burst() calls and
>> >>>>> ops are setup with op_type RTE_COMP_OP_STATEFUL, and begin with
>> >>>>> flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>> >>>>> RTE_COMP_FULL/FINAL_FLUSH.
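
The flush ordering described in D.2 can be sketched as a small check. This is only an illustration: the enum and the function name are hypothetical stand-ins, not DPDK identifiers, and it assumes that a stream consisting of a single op may carry FULL or FINAL directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the flush values named in the RFC text. */
enum sk_flush { SK_FLUSH_NONE, SK_FLUSH_SYNC, SK_FLUSH_FULL, SK_FLUSH_FINAL };

/*
 * Returns true if the flush values of the ops of one stateful stream follow
 * the rule from D.2: every op except the last uses NONE or SYNC, and the
 * last op uses FULL or FINAL.
 */
bool sk_stream_flush_order_ok(const enum sk_flush *flush, size_t n_ops)
{
    if (flush == NULL || n_ops == 0)
        return false;

    /* All ops before the last one must keep the stream open. */
    for (size_t i = 0; i + 1 < n_ops; i++) {
        if (flush[i] != SK_FLUSH_NONE && flush[i] != SK_FLUSH_SYNC)
            return false;
    }

    /* The last op must close the stream. */
    return flush[n_ops - 1] == SK_FLUSH_FULL ||
           flush[n_ops - 1] == SK_FLUSH_FINAL;
}
```

An application could run such a check over its planned ops before submitting the first one.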
>> >>>>> D.2.1 Stateful operation state maintenance
>> >>>>> ---------------------------------------------------------------
>> >>>>> It is always an ideal expectation from application that it should parse
>> >>>>> through all related chunk of source data making its mbuf-chain and
>> >>>>> enqueue it for stateless processing.
>> >>>>> However, if it needs to break it into several enqueue_burst() calls,
>> >>>>> then an expected call flow would be something like:
>> >>>>> enqueue_burst( |op.no_flush |)
>> >>>> [Ahmed] The work is now in flight to the PMD. The user will call dequeue
>> >>>> burst in a loop until all ops are received. Is this correct?
>> >>>>
>> >>>>> deque_burst(op) // should dequeue before we enqueue next
>> >>> [Shally] Yes. Ideally every submitted op needs to be dequeued. However
>> >>> this illustration is specifically in context of stateful op processing
>> >>> to reflect if a stream is broken into chunks, then each chunk should be
>> >>> submitted as one op at-a-time with type = STATEFUL and needs to be
>> >>> dequeued first before next chunk is enqueued.
>> >>>
>> >>>>> enqueue_burst( |op.no_flush |)
>> >>>>> deque_burst(op) // should dequeue before we enqueue next
>> >>>>> enqueue_burst( |op.full_flush |)
>> >>>> [Ahmed] Why now allow multiple work items in flight? I understand that
>> >>>> occasionally there will be OUT_OF_SPACE exception. Can we just
>> >>>> distinguish the response in exception cases?
>> >>> [Shally] Multiple ops are allowed in flight, however condition is each
>> >>> op in such case is independent of each other i.e. belong to different
>> >>> streams altogether.
>> >>> Earlier (as part of RFC v1 doc) we did consider the proposal to process
>> >>> all related chunks of data in single burst by passing them as ops array
>> >>> but later found that as not-so-useful for PMD handling for various
>> >>> reasons. You may please refer to RFC v1 doc review comments for same.
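
The one-op-at-a-time rule for a stateful stream described above can be modelled as follows. This is a toy sketch, not DPDK code: `sk_stream`, `sk_enqueue`, `sk_dequeue` and `sk_process_chunks` are hypothetical names, and the "PMD" here completes every op immediately, which a real device would not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream bookkeeping; not a DPDK structure. */
struct sk_stream {
    bool op_in_flight;    /* an op of this stream is enqueued, not yet dequeued */
    bool final_in_flight; /* the op in flight carries the full/final flush */
    size_t chunks_done;   /* number of ops dequeued so far */
    bool finished;        /* the closing op has been dequeued */
};

/* Refuses a second op while the previous op of the same stream is in flight. */
bool sk_enqueue(struct sk_stream *s, bool is_last_chunk)
{
    if (s->op_in_flight || s->finished)
        return false;
    s->op_in_flight = true;
    s->final_in_flight = is_last_chunk;
    return true;
}

/* Toy completion: the op in flight, if any, is returned at once. */
bool sk_dequeue(struct sk_stream *s)
{
    if (!s->op_in_flight)
        return false;
    s->op_in_flight = false;
    s->chunks_done++;
    if (s->final_in_flight)
        s->finished = true;
    return true;
}

/* Submits n_chunks ops one at a time, dequeuing each before the next. */
size_t sk_process_chunks(struct sk_stream *s, size_t n_chunks)
{
    size_t done = 0;

    for (size_t i = 0; i < n_chunks; i++) {
        if (!sk_enqueue(s, i + 1 == n_chunks))
            break;
        while (!sk_dequeue(s))
            ; /* a real application would poll the queue pair here */
        done++;
    }
    return done;
}
```

Ops belonging to other streams are independent and could be in flight at the same time; the model tracks a single stream only.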
>> >> [Fiona] Agree with Shally. In summary, as only one op can be processed
>> >> at a time, since each needs the state of the previous, to allow more
>> >> than 1 op to be in-flight at a time would force PMDs to implement
>> >> internal queueing and exception handling for OUT_OF_SPACE conditions
>> >> you mention.
>> [Ahmed] But we are putting the ops on qps which would make them
>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>> complex but doable.
>[Fiona] In my opinion this is not doable, could be very inefficient.
>There may be many streams.
>The PMD would have to have an internal queue per stream so
>it could adjust the next src offset and length in the OUT_OF_SPACE case.
>And this may ripple back through all subsequent ops in the stream as each
>source len is increased and its dst buffer is not big enough.
>
>> The question is this mode of use useful for real
>> life applications or would we be just adding complexity? The technical
>> advantage of this is that processing of Stateful ops is interdependent
>> and PMDs can take advantage of caching and other optimizations to make
>> processing related ops much faster than switching on every op. PMDs have
>> to maintain state of more than 32 KB for DEFLATE for every stream.
>> >> If the application has all the data, it can put it into chained mbufs
>> >> in a single op rather than multiple ops, which avoids pushing all that
>> >> complexity down to the PMDs.
>> [Ahmed] I think that your suggested scheme of putting all related mbufs
>> into one op may be the best solution without the extra complexity of
>> handling OUT_OF_SPACE cases, while still allowing the enqueuer extra
>> time if we have a way of marking mbufs as ready for consumption. The
>> enqueuer may not have all the data at hand but can enqueue the op with a
>> couple of empty mbufs marked as not ready for consumption.
>> The enqueuer
>> will then update the rest of the mbufs to ready for consumption once the
>> data is added. This introduces a race condition. A second flag for each
>> mbuf can be updated by the PMD to indicate that it processed it or not.
>> This way in cases where the PMD beat the application to the op, the
>> application will just update the op to point to the first unprocessed
>> mbuf and resend it to the PMD.
>[Fiona] This doesn't sound safe. You want to add data to a stream after you've
>enqueued the op. You would have to write to op.src.length at a time when the PMD
>might be reading it. Sounds like a lock would be necessary.
>Once the op has been enqueued, my understanding is its ownership is handed
>over to the PMD and the application should not touch it until it has been dequeued.
>I don't think it's a good idea to change this model.
>Can't the application just collect a stream of data in chained mbufs until it has
>enough to send an op, then construct the op and while waiting for that op to
>complete, accumulate the next batch of chained mbufs? Only construct the next op
>after the previous one is complete, based on the result of the previous one.
>
>
>> >>>> [Ahmed] Related to OUT_OF_SPACE. What status does the user receive
>> >>>> in a decompression case when the end block is encountered before the
>> >>>> end of the input? Does the PMD continue decomp? Does it stop there and
>> >>>> return the stop index?
>> >>>>
>> >>> [Shally] Before I could answer this, please help me understand your use
>> >>> case. When you say "when the end block is encountered before the end of
>> >>> the input?" Do you mean -
>> >>> "Decompressor process a final block (i.e. has BFINAL=1 in its header) and
>> >>> there's some footer data after that?" Or
>> >>> you mean "decompressor process one block and has more to process till its
>> >>> final block?"
>> >>> What is "end block" and "end of input" reference here?
>> [Ahmed] I meant BFINAL=1 by end block. The end of input is the end of
>> the input length.
>> e.g.
>> | input length--------------------------------------------------------------|
>> |--data----data----data------data-------BFINAL-footer-unrelated data|
>> >>>
>[Fiona] I propose if BFINAL bit is detected before end of input
>the decompression should stop. In this case consumed will be < src.length.
>produced will be < dst buffer size. Do we need an extra STATUS response?
>STATUS_BFINAL_DETECTED ?

[Shally] @Fiona, I assume you mean the decompressor stops after processing the
final block, right? If so, then depending on whether it processes that final
block successfully or not, the status could simply be SUCCESS or FAILED.
I don't see a need for a specific return code for this use case. Just to share:
in the past we have run into such cases in practice with the boost lib, and the
decompressor simply worked this way.

>Only thing I don't like this is it can impact on performance, as normally
>we can just look for STATUS == SUCCESS. Anything else should be an exception.
>Now the application would have to check for SUCCESS || BFINAL_DETECTED every time.
>Do you have a suggestion on how we should handle this?
>
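
As a minimal sketch of the SUCCESS/FAILED approach, an application could keep the single `status == SUCCESS` check on its fast path and detect an early BFINAL only by comparing consumed with the source length. The types and names below (`sk_status`, `sk_result`, `sk_trailing_bytes`) are hypothetical and only mirror the fields discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical status values and result fields; not DPDK definitions. */
enum sk_status { SK_STATUS_SUCCESS, SK_STATUS_FAILED };

struct sk_result {
    enum sk_status status;
    size_t consumed; /* bytes of source read by the decompressor */
    size_t produced; /* bytes written to the destination buffer */
};

/*
 * Returns the number of source bytes left after the decompressor stopped.
 * A non-zero value on a successful op means the final block (BFINAL=1)
 * ended before the end of the input, so footer or unrelated data follows.
 * Returns 0 for a failed op or when the whole input was consumed.
 */
size_t sk_trailing_bytes(const struct sk_result *r, size_t src_len)
{
    if (r->status != SK_STATUS_SUCCESS || r->consumed >= src_len)
        return 0;
    return src_len - r->consumed;
}
```

With this shape no extra status code is needed; only applications that care about trailing data pay for the second comparison.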