From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Verma, Shally" <Shally.Verma@cavium.com>
To: Ahmed Mansour, "Trahe, Fiona", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry
Thread-Topic: [RFC v2] doc compression API for DPDK
Date: Wed, 14 Feb 2018 07:41:09 +0000
References: <348A99DA5F5B7549AA880327E580B435892F589D@IRSMSX101.ger.corp.intel.com> <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK
List-Id: DPDK patches and discussions <dev.dpdk.org>

Hi Ahmed,

>-----Original Message-----
>From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>Sent: 02 February 2018 01:53
>To: Trahe, Fiona; Verma, Shally; dev@dpdk.org
>Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
>Subject: Re: [RFC v2] doc compression API for DPDK
>
>On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
>> Hi Ahmed, Shally,
>>
>> ///snip///
>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>>>
------------------------------------------------
>>>>>>>> OUT_OF_SPACE is a condition when the output buffer runs out of space and where the PMD still has more data to produce. If the PMD runs into such a condition, then it's an error condition in stateless processing.
>>>>>>>> In such a case, the PMD resets itself and returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0, i.e. no input read, no output written.
>>>>>>>> The application can resubmit the full input with a larger output buffer size.
>>>>>>> [Ahmed] Can we add an option to allow the user to read the data that was produced while still reporting OUT_OF_SPACE? This is mainly useful for decompression applications doing search.
>>>>>> [Shally] It is there, but applicable for the stateful operation type (please refer to handling out_of_space under "Stateful Section").
>>>>>> By definition, "stateless" here means that the application (such as IPCOMP) knows the maximum output size with certainty and ensures that the uncompressed data size cannot grow larger than the provided output buffer.
>>>>>> Such apps can submit an op with type = STATELESS and provide the full input; the PMD then assumes it has sufficient input and output and thus doesn't need to maintain any context after the op is processed.
>>>>>> If the application doesn't know the max output size, then it should process it as a stateful op, i.e. set up the op with type = STATEFUL and attach a stream so that the PMD can maintain the relevant context to handle such a condition.
>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while still respecting the stateless concept.
>>>>> In the stateless case where a PMD reports OUT_OF_SPACE in the decompression case, it could also return consumed=0, produced = x, where x>0. X indicates the amount of valid data which has been written to the output buffer.
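[A minimal mocked sketch of the pattern Fiona describes above. All type and function names here (comp_op, mock_process, etc.) are hypothetical stand-ins, not the actual DPDK rte_comp API, which is still an RFC: on stateless OUT_OF_SPACE the app may search the partial output (produced > 0), but must resubmit the whole input with a bigger buffer to get the complete data.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum comp_status { COMP_SUCCESS, COMP_OUT_OF_SPACE };

struct comp_op {
    const char *src;     /* full input (already-compressed data) */
    char *dst;           /* output buffer */
    size_t dst_len;      /* output buffer capacity */
    size_t consumed;     /* bytes of input read (0 on stateless OUT_OF_SPACE) */
    size_t produced;     /* valid bytes written to dst */
    enum comp_status status;
};

/* Mock "PMD": pretend the decompressed output is 16 bytes; fill what fits. */
static void mock_process(struct comp_op *op)
{
    const char full_output[] = "ABCDEFGHIJKLMNOP"; /* 16 bytes */
    size_t need = strlen(full_output);

    if (op->dst_len >= need) {
        memcpy(op->dst, full_output, need);
        op->consumed = strlen(op->src);
        op->produced = need;
        op->status = COMP_SUCCESS;
    } else {
        /* OUT_OF_SPACE: consumed = 0 (no resumable state kept), but
         * produced > 0 so the app may search the partial output. */
        memcpy(op->dst, full_output, op->dst_len);
        op->consumed = 0;
        op->produced = op->dst_len;
        op->status = COMP_OUT_OF_SPACE;
    }
}

/* App-side pattern: search the partial output; if not found, resubmit the
 * WHOLE input with a bigger buffer (no continuation in stateless mode). */
static int search_then_resubmit(const char *src, char needle)
{
    char small[8], big[32];
    struct comp_op op = { src, small, sizeof(small), 0, 0, COMP_SUCCESS };

    mock_process(&op);
    if (op.status == COMP_OUT_OF_SPACE) {
        /* data up to op.produced is valid and searchable */
        if (memchr(op.dst, needle, op.produced))
            return 1;                   /* found without the full output */
        op.dst = big;
        op.dst_len = sizeof(big);
        mock_process(&op);              /* full restart, bigger buffer */
    }
    return memchr(op.dst, needle, op.produced) != NULL;
}
```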
>>>>> It is not complete, but if an application wants to search it, it may be sufficient.
>>>>> If the application still wants the data it must resubmit the whole input with a bigger output buffer, and decompression will be repeated from the start; it cannot expect to continue on, as the PMD has not maintained state, history or data.
>>>>> I don't think there would be any need to indicate this in capabilities; PMDs which cannot provide this functionality would always return produced=consumed=0, while PMDs which can could set produced > 0.
>>>>> If this works for you both, we could consider a similar case for compression.
>>>>>
>>>> [Shally] Sounds fine to me. Though in that case, consumed should also be updated to what was actually consumed by the PMD.
>>>> Setting consumed = 0 with produced > 0 doesn't correlate.
>>> [Ahmed] I like Fiona's suggestion, but I also do not like the implication of returning consumed = 0. At the same time, returning consumed = y implies to the user that it can proceed from the middle. I prefer the consumed = 0 implementation, but I think a different return is needed to distinguish it from an OUT_OF_SPACE that the user can recover from. Perhaps OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows future PMD implementations to provide recoverability even in STATELESS mode if they so wish. In this model STATELESS or STATEFUL would be a hint for the PMD implementation to make optimizations for each case, but it does not force the PMD implementation to limit functionality if it can provide recoverability.
>> [Fiona] So you're suggesting the following:
>> OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced can be used and the next op in the stream should continue on from op.consumed+1.
>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>> Error condition, no recovery possible.
>> consumed=produced=0. Application must resubmit all input data with a bigger output buffer.
>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
>> - consumed = 0, produced > 0. Application must resubmit all input data with a bigger output buffer. However, in the decompression case, data up to produced in the dst buffer may be inspected/searched. Never happens in the compression case, as the output data would be meaningless.
>> - consumed > 0, produced > 0. PMD has stored relevant state and history and so can convert to stateful, using op.produced and continuing from consumed+1.
>> I don't expect our PMDs to use this last case, but maybe this works for others?
>> I'm not convinced it's not just adding complexity. It sounds like a version of stateful without a stream, and maybe less efficient?
>> If so, should it respect the FLUSH flag? Which would have been FULL or FINAL in the op. Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not simply have submitted a STATEFUL request if this is the behaviour it wants.
>[Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely and replacing it with:
>OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>    Error condition, no recovery possible.
>    - consumed=0, produced=amount of data produced. Application must resubmit all input data with a bigger output buffer to process all of the op.
>OUT_OF_SPACE_RECOVERABLE - normally returned on stateful operation. Not an error. Op.produced can be used and the next op in the stream should continue on from op.consumed+1.
>    - consumed > 0, produced > 0. PMD has stored relevant state and history and so can continue, using op.produced and continuing from consumed+1.
>
>We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our implementation either.
>
>Regardless of speculative future PMDs.
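[On the application side, the split proposed above could look like the following sketch. Only the two status names come from the proposal; everything else is a hypothetical mock, since this API was still an RFC at the time: the return status alone decides whether "consumed" is meaningful.]

```c
#include <assert.h>
#include <stddef.h>

enum op_status {
    OP_SUCCESS,
    OP_OUT_OF_SPACE_TERMINATED,   /* stateless: no recovery possible */
    OP_OUT_OF_SPACE_RECOVERABLE,  /* stateful: resume from consumed+1 */
};

/* At which source offset should the next submission start reading? */
static size_t next_src_offset(enum op_status status, size_t consumed)
{
    switch (status) {
    case OP_SUCCESS:
    case OP_OUT_OF_SPACE_RECOVERABLE:
        return consumed;   /* consumed is meaningful: continue from here */
    case OP_OUT_OF_SPACE_TERMINATED:
    default:
        return 0;          /* consumed is meaningless: restart all input */
    }
}
```

This removes the ambiguity of a single OUT_OF_SPACE code driving two different workflows: the dispatch is purely on status, never on the field values.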
>The more important aspect of this for today is that the return status clearly determines the meaning of "consumed". If it is RECOVERABLE then consumed is meaningful. If it is TERMINATED then consumed is meaningless.
>This way we take away the ambiguity of having OUT_OF_SPACE mean two different user workflows.
>
>A speculative future PMD may be designed to return RECOVERABLE for stateless ops that are attached to streams.
>A future PMD may look to see if an op has a stream attached, write out the state there, and go into recoverable mode.
>In essence this leaves the choice up to the implementation and allows the PMD to take advantage of stateless optimizations so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump context as soon as it fully processes an op. It will only write context out in cases where the op chokes.
>This futuristic PMD should ignore the FLUSH, since STATELESS mode was indicated by the user, and optimize.

[Shally] IMO, it looks okay to have two separate return codes TERMINATED and RECOVERABLE with definitions as you mentioned, and it seems doable. So then it means all the following conditions:
a. stateless with flush = full/final, no stream pointer provided: PMD can return TERMINATED, i.e. the user has to start all over again; it's a failure (as in the current definition)
b. stateless with flush = full/final, stream pointer provided: here it's up to the PMD to return either TERMINATED or RECOVERABLE depending upon its ability (note: if RECOVERABLE, then the PMD will maintain state in the stream pointer)
c. stateful with flush = full / NO_SYNC, stream pointer always there: PMD will return TERMINATED/RECOVERABLE depending on whether the STATEFUL_COMPRESSION/DECOMPRESSION feature flag is enabled or not
and one more exception case is:
d. stateless with flush = full, no stream pointer provided: PMD can return RECOVERABLE, i.e.
the PMD internally maintained that state somehow, and consumed & produced > 0, so the user can start from consumed+1; but there's a restriction on the user not to alter or change the op until it is fully processed?!

The API currently takes care of cases a and c, and case b can be supported if the specification accepts another proposal which mentions optional usage of a stream with stateless. Until then the API makes no difference between cases b and c, i.e. we can have an op such as:
- type = stateful with flush = full/final, stream pointer provided: PMD can return TERMINATED/RECOVERABLE according to its ability
Case d is something exceptional; if there's a requirement in PMDs to support it, then I believe it will be doable with the concept of a different return code.

>>>>>>>> D.2 Compression API Stateful operation
>>>>>>>> ----------------------------------------------------------
>>>>>>>> A Stateful operation in DPDK compression means the application invokes enqueue_burst() multiple times to process related chunks of data, either because
>>>>>>>> - Application broke data into several ops, and/or
>>>>>>>> - PMD ran into an out_of_space situation during input processing
>>>>>>>>
>>>>>>>> In case of either one or all of the above conditions, the PMD is required to maintain the state of the op across enqueue_burst() calls, and
>>>>>>>> ops are set up with op_type RTE_COMP_OP_STATEFUL, and begin with flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>> ---------------------------------------------------------------
>>>>>>>> It is always an ideal expectation from the application that it should parse through all related chunks of source data, making its mbuf-chain, and enqueue it for stateless processing.
>>>>>>>> However, if it needs to break it into several enqueue_burst() calls, then an expected call flow would be something like:
>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call dequeue_burst in a loop until all ops are received. Is this correct?
>>>>>>>
>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued. However, this illustration is specifically in the context of stateful op processing, to reflect that if a stream is broken into chunks, then each chunk should be submitted as one op at a time with type = STATEFUL and needs to be dequeued first before the next chunk is enqueued.
>>>>>>
>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand that occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish the response in exception cases?
>>>>>> [Shally] Multiple ops are allowed in flight; however, the condition is that each op in such a case is independent of the others, i.e. they belong to different streams altogether.
>>>>>> Earlier (as part of the RFC v1 doc) we did consider the proposal to process all related chunks of data in a single burst by passing them as an ops array, but later found that not so useful for PMD handling for various reasons. You may please refer to the RFC v1 doc review comments for same.
>>>>> [Fiona] Agree with Shally. In summary, as only one op can be processed at a time, since each needs the state of the previous, to allow more than 1 op to be in-flight at a time would force PMDs to implement internal queueing and exception handling for the OUT_OF_SPACE conditions you mention.
>>> [Ahmed] But we are putting the ops on qps, which would make them sequential. Handling OUT_OF_SPACE conditions would be a little bit more complex, but doable.
>> [Fiona] In my opinion this is not doable, and could be very inefficient.
>> There may be many streams.
>> The PMD would have to have an internal queue per stream so it could adjust the next src offset and length in the OUT_OF_SPACE case.
>> And this may ripple back through all subsequent ops in the stream as each source len is increased and its dst buffer is not big enough.
>[Ahmed] Regarding multi-op OUT_OF_SPACE handling:
>The caller would still need to adjust the src length/output buffer as you say. The PMD cannot handle OUT_OF_SPACE internally.
>After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream until it gets explicit confirmation from the caller to continue working on this stream. Any ops received by the PMD should be returned to the caller with status STREAM_PAUSED, since the caller did not explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
>These semantics can be enabled by adding a new function to the API, perhaps stream_resume().
>This allows the caller to indicate that it acknowledges that it has seen the issue, and this op should be used to resolve the issue. Implementations that do not support this mode of use can push back immediately after one op is in flight. Implementations that support this use mode can allow many ops from the same session.
>
[Shally] Is it still in the context of having a single burst where all ops belong to one stream? If yes, I would still say it would add an overhead to PMDs, especially if it is expected to work close to HW (which I think is the case with DPDK PMDs).
Though your approach is doable, why can't this all be in a layer above the PMD? I.e. a layer above the PMD can either pass one op at a time with burst size = 1, OR can make a chained mbuf of input and output and pass that as one op.
Is it just to ease applications of the chained-mbuf burden, or do you see any performance/use-case impacting aspect also?

If it is in the context where each op belongs to a different stream in a burst, then why do we need stream_pause and resume? It is an expectation from the app to pass a larger output buffer with consumed + 1 from the next call onwards, as it has already seen OUT_OF_SPACE.

>Regarding the ordering of ops:
>We do force serialization of ops belonging to a stream in STATEFUL operation. Related ops do not go out of order and are given to available PMDs one at a time.
>
>>> The question is: is this mode of use useful for real-life applications, or would we be just adding complexity? The technical advantage of this is that processing of stateful ops is interdependent, and PMDs can take advantage of caching and other optimizations to make processing related ops much faster than switching on every op. PMDs have to maintain state of more than 32 KB for DEFLATE for every stream.
>>>>> If the application has all the data, it can put it into chained mbufs in a single op rather than multiple ops, which avoids pushing all that complexity down to the PMDs.
>>> [Ahmed] I think that your suggested scheme of putting all related mbufs into one op may be the best solution without the extra complexity of handling OUT_OF_SPACE cases, while still allowing the enqueuer extra time, if we have a way of marking mbufs as ready for consumption. The enqueuer may not have all the data at hand but can enqueue the op with a couple of empty mbufs marked as not ready for consumption. The enqueuer will then update the rest of the mbufs to ready for consumption once the data is added. This introduces a race condition. A second flag for each mbuf can be updated by the PMD to indicate whether it processed it or not.
>>> This way, in cases where the PMD beat the application to the op, the application will just update the op to point to the first unprocessed mbuf and resend it to the PMD.
>> [Fiona] This doesn't sound safe. You want to add data to a stream after you've enqueued the op. You would have to write to op.src.length at a time when the PMD might be reading it. Sounds like a lock would be necessary.
>> Once the op has been enqueued, my understanding is its ownership is handed over to the PMD and the application should not touch it until it has been dequeued. I don't think it's a good idea to change this model.
>> Can't the application just collect a stream of data in chained mbufs until it has enough to send an op, then construct the op and, while waiting for that op to complete, accumulate the next batch of chained mbufs? Only construct the next op after the previous one is complete, based on the result of the previous one.
>>
>[Ahmed] Fair enough. I agree with you. I imagined it in a different way, in which each mbuf would have its own length.
>The advantage to gain is in applications where there is one PMD user; the down time between ops can be significant, and setting up a single producer-consumer pair significantly reduces the CPU cycles and PMD down time.
>
>////snip////
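[For reference, the accumulate-then-submit pattern Fiona suggests could be sketched like this. The types are mocked (real DPDK would use rte_mbuf chains); the point is that ownership of a chain passes at submit time, so the app never touches data already handed to the PMD and builds the next chain while waiting.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf { char data[64]; size_t len; struct buf *next; };

struct chain { struct buf *head, *tail; size_t total; };

/* Append incoming data to the chain the app is still building. */
static void chain_append(struct chain *c, const char *data, size_t len)
{
    struct buf *b = calloc(1, sizeof(*b));
    memcpy(b->data, data, len);
    b->len = len;
    if (c->tail)
        c->tail->next = b;
    else
        c->head = b;
    c->tail = b;
    c->total += len;
}

/* Submit only when the chain holds a full op's worth of data; the caller
 * then starts a fresh chain for the next batch while the op is in flight.
 * Returns the number of bytes handed over, 0 if still accumulating. */
static size_t submit_if_ready(struct chain *c, size_t threshold)
{
    if (c->total < threshold)
        return 0;                  /* keep accumulating */
    size_t submitted = c->total;   /* ownership moves to the "PMD" here */
    c->head = c->tail = NULL;      /* app must not touch submitted bufs */
    c->total = 0;                  /* app begins the next batch */
    return submitted;
}
```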