From: "Trahe, Fiona"
To: "Verma, Shally", Ahmed Mansour, "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry, "Trahe, Fiona"
Date: Thu, 11 Jan 2018 18:53:47 +0000
Thread-Topic: [RFC v2] doc compression API for DPDK
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

Hi Shally, Ahmed,

> -----Original Message-----
> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
> Sent: Wednesday, January 10, 2018 12:55 PM
> To: Ahmed Mansour; Trahe, Fiona; dev@dpdk.org
> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo;
> Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
> Subject: RE: [RFC v2] doc compression API for DPDK
>
> Hi Ahmed
>
> > -----Original Message-----
> > From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
> > Sent: 10 January 2018 00:38
> > To: Verma, Shally; Trahe, Fiona; dev@dpdk.org
> > Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila; De Lara Guarch, Pablo;
> > Challa, Mahipal; Jain, Deepak K; Hemant Agrawal; Roy Pledge; Youri Querry
> > Subject: Re: [RFC v2] doc compression API for DPDK
> >
> > Hi Shally,
> >
> > Thanks for the summary. It is very helpful. Please see comments below.
> >
> >
> > On 1/4/2018 6:45 AM, Verma, Shally wrote:
> > > This is an RFC v2 document to summarize the understanding of, and requirements for,
> > > the compression API proposal in DPDK. It is based on "[RFC v3] Compression API
> > > in DPDK http://dpdk.org/dev/patchwork/patch/32331/".
> > > The intention of this document is to align on the concepts built into the compression
> > > API and its usage, and to identify further requirements.
> > >
> > > Going further, it could be a base for the Compression Module Programmer Guide.
> > >
> > > Current scope is limited to:
> > > - definition of the terminology which makes up the foundation of the compression API
> > > - the typical API flow expected to be used by applications
> > > - Stateless and Stateful operation definition and usage after the RFC v1 doc review
> > >   http://dev.dpdk.narkive.com/CHS5l01B/dpdk-dev-rfc-v1-doc-compression-api-for-dpdk
> > >
> > > 1. Overview
> > > ~~~~~~~~~~~
> > >
> > > A. Compression Methodologies in compression API
> > > ===============================================
> > > DPDK compression supports two types of compression methodologies:
> > > - Stateless - each data object is compressed individually without any reference to previous data,
> > > - Stateful - each data object is compressed with reference to the previous data object, i.e. the history of data is needed for compression / decompression.
> > > For more explanation, please refer to RFC 1951: https://www.ietf.org/rfc/rfc1951.txt
> > >
> > > To support both methodologies, DPDK compression introduces two key concepts: Session and Stream.
> > >
> > > B. Notion of a session in compression API
> > > =========================================
> > > A Session in DPDK compression is a logical entity which is set up one time with immutable parameters, i.e. parameters that don't change across operations and devices.
> > > A session can be shared across multiple devices and multiple operations simultaneously.
> > > Typical Session parameters include info such as:
> > > - compress / decompress
> > > - compression algorithm and associated configuration parameters
> > >
> > > An application can create different sessions on a device initialized with the same or different xforms. Once a session is initialized with one xform it cannot be re-initialized.
> > >
> > > C. Notion of stream in compression API
> > > ======================================
> > > Unlike a session, which carries a common set of information across operations, a stream in DPDK compression is a logical entity which identifies a related set of operations and carries operation-specific information as needed by the device during its processing.
> > > It is a device-specific data structure which is opaque to the application, set up and maintained by the device.
> > >
> > > A stream can be used with *only* one op at a time, i.e. no two operations can share the same stream simultaneously.
> > > A stream is a *must* for stateful op processing and optional for stateless (please see the respective sections for more details).
> > >
> > > This enables sharing of a session by multiple threads handling different data sets, as each op carries its own context (internal states, history buffers, etc.) in its attached stream.
> > > The application should call rte_comp_stream_create() and attach the stream to an op before beginning operation processing, and free it via rte_comp_stream_free() after processing is complete.
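The stream rules above (one op per stream at a time, while a session may be shared) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `toy_stream`, `toy_op`, `toy_attach_stream()` and `toy_complete()` are hypothetical names, not part of the proposed rte_comp API, and the `in_flight` flag is just one way an application might track the rule; a real stream is opaque and owned by the PMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the opaque, device-specific stream. */
struct toy_stream {
    bool in_flight;     /* true while an op using this stream is enqueued */
    size_t history_len; /* example of per-stream context kept by the device */
};

/* Hypothetical stand-in for an op: the session may be shared, the stream not. */
struct toy_op {
    const void *session;       /* may be shared by many ops */
    struct toy_stream *stream; /* must not be shared by in-flight ops */
};

/* Attach a stream to an op; refuse if another op is already using it. */
static bool toy_attach_stream(struct toy_op *op, struct toy_stream *s)
{
    assert(op != NULL && s != NULL);
    if (s->in_flight)
        return false; /* would violate "one op per stream at a time" */
    s->in_flight = true;
    op->stream = s;
    return true;
}

/* Called when the op is dequeued: the stream becomes usable again. */
static void toy_complete(struct toy_op *op)
{
    if (op->stream != NULL) {
        op->stream->in_flight = false;
        op->stream = NULL;
    }
}
```

Two ops can point at the same session, but the second attach to a busy stream is refused until the first op completes.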
> > >
> > > C. Notion of burst operations in compression API
> > > ================================================
> > > A burst in DPDK compression is an array of operations where each op carries an independent set of data, i.e. a burst can look like:
> > >
> > >   ---------------------------------------------------------------------------------------------
> > >   enque_burst (|op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |)
> > >   ---------------------------------------------------------------------------------------------
> > >
> > > Where op1 .. op5 are all independent of each other and carry entirely different sets of data.
> > > Each op can be attached to the same or a different session but *must* be attached to a different stream.
> > >
> > > Each op (struct rte_comp_op) carries compression/decompression operational parameters and is both an input and an output parameter.
> > > The PMD gets source, destination and checksum information at input and updates it with bytes consumed, bytes produced and checksum at output.
> > >
> > > Since each operation in a burst is independent and thus can complete out of order, applications which need ordering should set up a per-op user data area with reordering information so that they can determine the enqueue order at dequeue.
> > >
> > > Also, if multiple threads call enqueue_burst() on the same queue pair, it is the application's onus to use a proper locking mechanism to ensure exclusive enqueuing of operations.
> > >
> > > D. Stateless Vs Stateful
> > > ========================
> > > The compression API provides the RTE_COMP_FF_STATEFUL feature flag for a PMD to reflect its support for Stateful operation. Each op carries an op type indicating whether it is to be processed stateful or stateless.
> > >
> > > D.1 Compression API Stateless operation
> > > ------------------------------------------------------
> > > An op is processed stateless if it has
> > > - flush value set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on compression side),
> > > - op_type set to RTE_COMP_OP_STATELESS,
> > > - all of the required input and a sufficiently large output buffer to store the output, i.e. OUT_OF_SPACE can never occur.
> > >
> > > When all of the above conditions are met, the PMD initiates stateless processing and releases acquired resources after processing of the current operation is complete, i.e. full input consumed and full output written.

[Fiona] I think the 3rd condition conflicts with D.1.1 below and anyway cannot be a precondition, i.e. the PMD must initiate stateless processing based on RTE_COMP_OP_STATELESS.
It can't always know if the output buffer is big enough before processing; it must process the input data, and only when it has consumed it all can it know whether all the output data fits in the output buffer.
I'd suggest rewording as follows:

An op is processed statelessly if op_type is set to RTE_COMP_OP_STATELESS.
In this case:
- The flush value must be set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on compression side),
- All of the input data must be in the src buffer,
- The dst buffer should be large enough to hold the expected output.
The PMD acquires the necessary resources to process the op. After processing of the current operation is complete, whether successful or not, it releases the acquired resources, and no state, history or data is held in the PMD or carried over to subsequent ops.
In the SUCCESS case, the full input is consumed, the full output is written and the status is set to RTE_COMP_OP_STATUS_SUCCESS.
OUT_OF_SPACE is handled as in D.1.1 below.

> > > The application can optionally attach a stream to such ops. In that case, the application must attach a different stream to each op.
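The stateless contract described above (all input in one op, no state kept after completion, OUT_OF_SPACE handled by resubmitting from the start) can be sketched with a toy decoder standing in for a PMD. Everything here is hypothetical: the `(count, byte)` run-length format replaces DEFLATE purely to keep the example short, and the `report_partial` switch models Fiona's suggested variant in which produced > 0 is reported alongside OUT_OF_SPACE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status codes mirroring the RFC's SUCCESS / OUT_OF_SPACE. */
enum toy_status { TOY_SUCCESS, TOY_OUT_OF_SPACE };

struct toy_result {
    enum toy_status status;
    size_t consumed; /* input bytes read */
    size_t produced; /* valid output bytes written */
};

/*
 * Toy stateless "PMD": expands (count, byte) pairs, e.g. {3,'a'} -> "aaa".
 * src_len is assumed to be even. Nothing is kept between calls. On
 * OUT_OF_SPACE, consumed is 0 and produced is 0, or the number of valid
 * bytes already written when report_partial is set.
 */
static struct toy_result toy_stateless_decode(const uint8_t *src, size_t src_len,
                                              uint8_t *dst, size_t dst_len,
                                              bool report_partial)
{
    struct toy_result r = { TOY_SUCCESS, 0, 0 };
    size_t out = 0;

    for (size_t i = 0; i + 1 < src_len; i += 2) {
        for (unsigned k = 0; k < src[i]; k++) {
            if (out == dst_len) {
                r.status = TOY_OUT_OF_SPACE;
                r.produced = report_partial ? out : 0;
                return r; /* consumed stays 0: caller must resubmit everything */
            }
            dst[out++] = src[i + 1];
        }
    }
    r.consumed = src_len;
    r.produced = out;
    return r;
}

/* Application side: on OUT_OF_SPACE, resubmit the whole input with a
 * buffer twice as large. Returns a malloc'd buffer or NULL. */
static uint8_t *decode_with_retry(const uint8_t *src, size_t src_len,
                                  size_t dst_len, size_t *out_len)
{
    assert(out_len != NULL);
    if (dst_len == 0)
        dst_len = 1;
    for (;;) {
        uint8_t *dst = malloc(dst_len);
        if (dst == NULL)
            return NULL;
        struct toy_result r = toy_stateless_decode(src, src_len, dst, dst_len, false);
        if (r.status == TOY_SUCCESS) {
            *out_len = r.produced;
            return dst;
        }
        free(dst); /* no state survives, so start again from the first byte */
        dst_len *= 2;
    }
}
```

Because nothing is carried over, each retry decodes from the first byte again; doubling is just one possible policy for choosing the larger buffer.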
> > >
> > > An application can enqueue a stateless burst by making consecutive enque_burst() calls, i.e. the following is a relevant usage:
> > >
> > >   enqueued = rte_comp_enque_burst(dev_id, qp_id, ops1, nb_ops);
> > >   enqueued = rte_comp_enque_burst(dev_id, qp_id, ops2, nb_ops);
> > >
> > > *Note - Every call has a different ops array, i.e. the same rte_comp_op array *cannot be re-enqueued* to process the next batch of data until the previous ones are completely processed.
> > >
> > > D.1.1 Stateless and OUT_OF_SPACE
> > > ------------------------------------------------
> > > OUT_OF_SPACE is a condition where the output buffer runs out of space while the PMD still has more data to produce. If the PMD runs into such a condition, it is an error condition in stateless processing.
> > > In such a case, the PMD resets itself and returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced = consumed = 0, i.e. no input read, no output written.
> > > The application can resubmit the full input with a larger output buffer.
> >
> > [Ahmed] Can we add an option to allow the user to read the data that was produced while still reporting OUT_OF_SPACE? This is mainly useful for decompression applications doing search.
>
> [Shally] It is there but applicable for the stateful operation type (please refer to handling out_of_space under the "Stateful" section).
> By definition, "stateless" here means that the application (such as IPCOMP) knows the maximum output size with certainty and ensures that the uncompressed data size cannot grow beyond the provided output buffer.
> Such apps can submit an op with type = STATELESS and provide full input; the PMD then assumes it has sufficient input and output and thus doesn't need to maintain any context after the op is processed.
> If the application doesn't know the max output size, then it should process it as a stateful op, i.e. set up the op with type = STATEFUL and attach a stream so that the PMD can maintain the relevant context to handle such a condition.

[Fiona] There may be an alternative that's useful for Ahmed, while still respecting the stateless concept.
In the stateless case where a PMD reports OUT_OF_SPACE during decompression, it could also return consumed = 0, produced = x, where x > 0. x indicates the amount of valid data which has been written to the output buffer. It is not complete, but if an application wants to search it, it may be sufficient.
If the application still wants the data, it must resubmit the whole input with a bigger output buffer, and decompression will be repeated from the start; it cannot expect to continue on, as the PMD has not maintained state, history or data.
I don't think there would be any need to indicate this in capabilities: PMDs which cannot provide this functionality would always return produced = consumed = 0, while PMDs which can could set produced > 0.
If this works for you both, we could consider a similar case for compression.

>
> >
> > > D.2 Compression API Stateful operation
> > > ----------------------------------------------------------
> > > A Stateful operation in DPDK compression means the application invokes enqueue_burst() multiple times to process related chunks of data, either because
> > > - the application broke the data into several ops, and/or
> > > - the PMD ran into an out_of_space situation during input processing
> > >
> > > In case of either or both of the above conditions, the PMD is required to maintain the state of the op across enque_burst() calls, and
> > > ops are set up with op_type RTE_COMP_OP_STATEFUL, beginning with flush value = RTE_COMP_NO/SYNC_FLUSH and ending with flush value RTE_COMP_FULL/FINAL_FLUSH.
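By contrast, the stateful case keeps context in the stream so that work can continue across calls, whether the application chunked the input or the PMD hit OUT_OF_SPACE. The sketch below reuses a toy `(count, byte)` run-length decoder as a hypothetical stand-in for a PMD; `toy_stream` plays the role of the opaque stream, and resubmission after OUT_OF_SPACE follows D.2.2 (the next op starts right after the consumed bytes). Flush values are omitted because this toy only decompresses and the RFC requires them only on the compression side; the chunk and output-window sizes are arbitrary illustration parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_status { TOY_SUCCESS, TOY_OUT_OF_SPACE };

struct toy_result {
    enum toy_status status;
    size_t consumed; /* input bytes read by this op */
    size_t produced; /* output bytes written by this op */
};

/* Hypothetical stream: context the "PMD" carries between ops. */
struct toy_stream {
    uint8_t run_left; /* bytes of the current run still to emit */
    uint8_t run_byte; /* value of the current run */
    bool have_count;  /* a count byte was read, its value byte not yet */
    uint8_t count;    /* that pending count byte */
};

/* Toy stateful decoder for (count, byte) pairs. Input may be split
 * anywhere; a partial pair or an unfinished run stays in the stream. */
static struct toy_result toy_stateful_decode(struct toy_stream *s,
                                             const uint8_t *src, size_t src_len,
                                             uint8_t *dst, size_t dst_len)
{
    struct toy_result r = { TOY_SUCCESS, 0, 0 };
    for (;;) {
        while (s->run_left > 0) { /* finish any run held in the stream */
            if (r.produced == dst_len) {
                r.status = TOY_OUT_OF_SPACE; /* not an error: state is kept */
                return r;
            }
            dst[r.produced++] = s->run_byte;
            s->run_left--;
        }
        if (r.consumed == src_len)
            return r; /* all input of this op used */
        uint8_t b = src[r.consumed++];
        if (!s->have_count) {
            s->count = b;
            s->have_count = true;
        } else {
            s->run_left = s->count;
            s->run_byte = b;
            s->have_count = false;
        }
    }
}

/* Application side: one op in flight on the stream at a time. Input is fed
 * in chunks; after OUT_OF_SPACE the next op starts right after the consumed
 * bytes with a fresh output window. Returns the total bytes produced. */
static size_t decode_stateful(const uint8_t *src, size_t src_len, size_t chunk,
                              uint8_t *out, size_t out_cap, size_t window)
{
    struct toy_stream s = {0};
    size_t total = 0;

    assert(chunk > 0 && window > 0);
    for (size_t off = 0; off < src_len; off += chunk) {
        const uint8_t *p = src + off;
        size_t len = src_len - off < chunk ? src_len - off : chunk;
        for (;;) {
            size_t win = out_cap - total < window ? out_cap - total : window;
            struct toy_result r = toy_stateful_decode(&s, p, len, out + total, win);
            total += r.produced;
            p += r.consumed;
            len -= r.consumed;
            if (r.status == TOY_SUCCESS)
                break; /* chunk fully consumed: submit the next one */
            if (total == out_cap)
                return total; /* caller's buffer is full: stop here */
        }
    }
    return total;
}
```

Here a chunk boundary can split a pair, and a run can straddle two output windows; in both cases the leftover lives in the stream, which is why every op of the sequence must carry the same stream.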
> > >
> > > D.2.1 Stateful operation state maintenance
> > > ---------------------------------------------------------------
> > > Ideally, the application should parse through all related chunks of source data, build them into an mbuf chain and enqueue it for stateless processing.
> > > However, if it needs to break the data into several enqueue_burst() calls, then the expected call flow would be something like:
> > >
> > > enqueue_burst( |op.no_flush |)
> >
> > [Ahmed] The work is now in flight to the PMD. The user will call dequeue burst in a loop until all ops are received. Is this correct?
> >
> > > deque_burst(op) // should dequeue before we enqueue next
>
> [Shally] Yes. Ideally every submitted op needs to be dequeued. However, this illustration is specifically in the context of stateful op processing, to reflect that if a stream is broken into chunks, then each chunk should be submitted as one op at a time with type = STATEFUL and needs to be dequeued before the next chunk is enqueued.
>
> > > enqueue_burst( |op.no_flush |)
> > > deque_burst(op) // should dequeue before we enqueue next
> > > enqueue_burst( |op.full_flush |)
> >
> > [Ahmed] Why not allow multiple work items in flight? I understand that occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish the response in exception cases?
>
> [Shally] Multiple ops are allowed in flight; however, the condition is that each op in such a case is independent of the others, i.e. they belong to different streams altogether.
> Earlier (as part of the RFC v1 doc) we did consider the proposal to process all related chunks of data in a single burst by passing them as an ops array, but later found it not so useful for PMD handling, for various reasons. Please refer to the RFC v1 doc review comments for the same.

[Fiona] Agree with Shally.
In summary, as only one op can be processed at a time, since each needs the state of the previous one, allowing more than one op to be in flight at a time would force PMDs to implement internal queueing and exception handling for the OUT_OF_SPACE conditions you mention.
If the application has all the data, it can put it into chained mbufs in a single op rather than multiple ops, which avoids pushing all that complexity down to the PMDs.

>
> > >
> > > Here an op *must* be attached to a stream and every subsequent enqueue_burst() call should carry the *same* stream. Since the PMD maintains op state in the stream, it is mandatory for the application to attach a stream to such ops.

[Fiona] I think you're referring only to a single stream above, but as there may be many different streams, maybe add the following?
Above is simplified to show just a single stream. However, there may be many streams, and each enqueue_burst() may contain ops from different streams, as long as there is only one op in flight from any stream at a given time.

> > >
> > > D.2.2 Stateful and Out_of_Space
> > > --------------------------------------------
> > > If the PMD supports stateful and runs into an OUT_OF_SPACE situation, it is not an error condition for the PMD. In such a case, the PMD returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE with consumed = number of input bytes read and produced = length of the complete output buffer.

[Fiona] - produced would be <= output buffer len (typically =, but could be a few bytes less).

> > > The application should enqueue the op with the source starting at byte consumed+1 (i.e. at offset consumed from the previous source start) and an output buffer with available space.
> >
> > [Ahmed] Related to OUT_OF_SPACE: what status does the user receive in a decompression case when the end block is encountered before the end of the input? Does the PMD continue decomp? Does it stop there and return the stop index?
> >
>
> [Shally] Before I could answer this, please help me understand your use case.
> When you say "when the end block is encountered before the end of the input?", do you mean
> "the decompressor processes a final block (i.e. has BFINAL=1 in its header) and there's some footer data after that?" Or
> do you mean "the decompressor processes one block and has more to process till its final block?"
> What do "end block" and "end of input" refer to here?
>
> > >
> > > D.2.3 Sliding Window Size
> > > ------------------------------------
> > > Every PMD will reflect in its algorithm capability structure the maximum length of the Sliding Window in bytes, which indicates the maximum history buffer length used by the algorithm.
> > >
> > > 2. Example API illustration
> > > ~~~~~~~~~~~~~~~~~~~~~~~
> > >

[Fiona] I think it would be useful to show an example of both a STATELESS flow and a STATEFUL flow.

> > > Following is an illustration of API usage (this is just one flow; other variants are also possible):
> > > 1. rte_comp_session *sess = rte_compressdev_session_create(rte_mempool *pool);
> > > 2. rte_compressdev_session_init(int dev_id, rte_comp_session *sess, rte_comp_xform *xform, rte_mempool *sess_pool);
> > > 3. rte_comp_op_pool_create(rte_mempool ..)
> > > 4. rte_comp_op_bulk_alloc(struct rte_mempool *mempool, struct rte_comp_op **ops, uint16_t nb_ops);
> > > 5. for every rte_comp_op in ops[],
> > >    5.1 rte_comp_op_attach_session(rte_comp_op *op, rte_comp_session *sess);
> > >    5.2 op.op_type = RTE_COMP_OP_STATELESS
> > >    5.3 op.flush = RTE_FLUSH_FINAL
> > > 6. [Optional] for every rte_comp_op in ops[],
> > >    6.1 rte_comp_stream_create(int dev_id, rte_comp_session *sess, void **stream);
> > >    6.2 rte_comp_op_attach_stream(rte_comp_op *op, void *stream);
> >
> > [Ahmed] What is the semantic effect of attaching a stream to every op?
> > Will this application benefit from it, given that it is set up with op_type STATELESS?
>
> [Shally] By role, a stream is a data structure that holds all the information the PMD needs to maintain for an op's processing, and thus it's marked device specific. It is required for stateful processing but optional for stateless, as, unlike stateful, the PMD doesn't need to maintain context once the op is processed.
> It may be advantageous for some PMDs to use a stream for stateless too. They can be designed to do one-time per-op setup (such as mapping session params) during stream_create() in the control path rather than in the data path.
>

[Fiona] Yes, we agreed that stream_create() should be called for every session, and if it returns non-NULL the PMD needs it, so op_attach_stream() must be called.
However, I've just realised we don't have a STATEFUL/STATELESS param on the xform, just on the op.
So we could either:
- add a stateful/stateless param to stream_create(), OR
- add a stateful/stateless param to the xform so it would be in the session?
However, Shally, can you reconsider whether you really need it for STATELESS, or whether the data you want to store there could be stored in the session? Or, if it's needed per op, does it really need to be visible on the API as a stream, or could it be hidden within the PMD?

>
> >
> > > 7. for every rte_comp_op in ops[],
> > >    7.1 set up with src/dst buffer
> > > 8. enq = rte_compressdev_enqueue_burst(dev_id, qp_id, ops, nb_ops);
> > > 9. dqu = 0; do while (dqu < enq) // Wait till all enqueued ops are dequeued
> > >    9.1 dqu += rte_compressdev_dequeue_burst(dev_id, qp_id, &ops[dqu], enq - dqu);
> >
> > [Ahmed] I am assuming that waiting for all enqueued to be dequeued is not strictly necessary, but is just the chosen example in this case.
> >
>
> [Shally] Yes. By design, for burst_size > 1 each op is independent of the others. So the app may proceed as soon as it dequeues any.
>
> > > 10. Repeat 7-9 for the next batch of data
> > > 11. for every op in ops[]
> > >     11.1 rte_comp_stream_free(op->stream);
> > > 12. rte_comp_session_clear(sess);
> > > 13. rte_comp_session_terminate(rte_comp_session *session)
> > >
> > > Thanks
> > > Shally
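Steps 8-9 of the illustration above can be sketched with a toy queue pair. `toy_qp`, `toy_enqueue_burst()` and `toy_dequeue_burst()` are hypothetical stand-ins for a PMD queue pair, not the proposed rte_compressdev functions; the point is that a dequeue call may return fewer ops than requested, so the dequeued count must be accumulated across calls and each call must write after the ops already returned.

```c
#include <assert.h>
#include <stdint.h>

#define QP_DEPTH 16 /* hypothetical ring size; divides 65536 so indices wrap cleanly */

struct toy_op { int id; int done; };

/* Hypothetical queue pair that completes at most per_call ops per dequeue. */
struct toy_qp {
    struct toy_op *ring[QP_DEPTH];
    uint16_t head, tail; /* head: next to dequeue, tail: next free slot */
    uint16_t per_call;
};

static uint16_t toy_enqueue_burst(struct toy_qp *qp, struct toy_op **ops, uint16_t nb)
{
    uint16_t n = 0;
    while (n < nb && (uint16_t)(qp->tail - qp->head) < QP_DEPTH) {
        qp->ring[qp->tail % QP_DEPTH] = ops[n++];
        qp->tail++;
    }
    return n;
}

static uint16_t toy_dequeue_burst(struct toy_qp *qp, struct toy_op **ops, uint16_t nb)
{
    uint16_t n = 0;
    while (n < nb && n < qp->per_call && qp->head != qp->tail) {
        struct toy_op *op = qp->ring[qp->head % QP_DEPTH];
        op->done = 1; /* pretend the device processed it */
        ops[n++] = op;
        qp->head++;
    }
    return n;
}

/* Steps 8-9: enqueue a batch, then dequeue until every enqueued op is back. */
static uint16_t run_batch(struct toy_qp *qp, struct toy_op **ops, uint16_t nb_ops)
{
    assert(qp->per_call > 0); /* otherwise the loop below never finishes */
    uint16_t enq = toy_enqueue_burst(qp, ops, nb_ops);
    uint16_t dqu = 0;
    while (dqu < enq)
        dqu += toy_dequeue_burst(qp, &ops[dqu], (uint16_t)(enq - dqu));
    return dqu;
}
```

With `per_call = 2` and five ops, the loop needs three dequeue calls (2 + 2 + 1); overwriting `dqu` instead of accumulating it would lose track of the ops already returned.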