From: "Verma, Shally"
To: Ahmed Mansour, "Trahe, Fiona", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila",
 "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K",
 Hemant Agrawal, Roy Pledge, Youri Querry
Date: Wed, 10 Jan 2018 12:55:12 +0000
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

Hi Ahmed

> -----Original Message-----
> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
> Sent: 10 January 2018 00:38
> To: Verma, Shally; Trahe, Fiona; dev@dpdk.org
> Cc: Athreya, Narayana Prasad; Gupta, Ashish; Sahu, Sunila;
> De Lara Guarch, Pablo; Challa, Mahipal; Jain, Deepak K;
> Hemant Agrawal; Roy Pledge; Youri Querry
> Subject: Re: [RFC v2] doc compression API for DPDK
>
> Hi Shally,
>
> Thanks for the summary. It is very helpful.
> Please see comments below
>
>
> On 1/4/2018 6:45 AM, Verma, Shally wrote:
> > This is an RFC v2 document to give a brief understanding of, and the
> > requirements on, the compression API proposal in DPDK. It is based on
> > "[RFC v3] Compression API in DPDK"
> > http://dpdk.org/dev/patchwork/patch/32331/
> > The intention of this document is to align on the concepts built into
> > the compression API and its usage, and to identify further requirements.
> >
> > Going further, it could be a base for a Compression Module Programmer
> > Guide.
> >
> > Current scope is limited to
> > - definition of the terminology which makes up the foundation of the
> >   compression API
> > - typical API flow expected to be used by applications
> > - Stateless and Stateful operation definition and usage after RFC v1 doc
> >   review
> >   http://dev.dpdk.narkive.com/CHS5l01B/dpdk-dev-rfc-v1-doc-compression-api-for-dpdk
> >
> > 1. Overview
> > ~~~~~~~~~~~
> >
> > A. Compression Methodologies in compression API
> > ===============================================
> > DPDK compression supports two types of compression methodologies:
> > - Stateless - each data object is compressed individually without any
> >   reference to previous data,
> > - Stateful - each data object is compressed with reference to the
> >   previous data object, i.e. history of data is needed for compression /
> >   decompression
> > For more explanation, please refer to the RFC
> > https://www.ietf.org/rfc/rfc1951.txt
> >
> > To support both methodologies, DPDK compression introduces two key
> > concepts: Session and Stream.
> >
> > B. Notion of a session in compression API
> > =========================================
> > A Session in DPDK compression is a logical entity which is set up one
> > time with immutable parameters, i.e. parameters that don't change across
> > operations and devices.
> > A session can be shared across multiple devices and multiple operations
> > simultaneously.
> > Typical Session parameters include info such as:
> > - compress / decompress
> > - compression algorithm and associated configuration parameters
> >
> > Application can create different sessions on a device, initialized with
> > the same or different xforms. Once a session is initialized with one
> > xform it cannot be re-initialized.
> >
> > C. Notion of stream in compression API
> > ======================================
> > Unlike a session, which carries a common set of information across
> > operations, a stream in DPDK compression is a logical entity which
> > identifies a related set of operations and carries operation-specific
> > information as needed by the device during its processing.
> > It is a device-specific data structure which is opaque to the
> > application, set up and maintained by the device.
> >
> > A stream can be used with *only* one op at a time, i.e. no two
> > operations can share the same stream simultaneously.
> > A stream is a *must* for stateful ops processing and optional for
> > stateless (please see respective sections for more details).
> >
> > This enables sharing of a session by multiple threads handling different
> > data sets, as each op carries its own context (internal states, history
> > buffers etc.) in its attached stream.
> > Application should call rte_comp_stream_create() and attach the stream
> > to the op before beginning of operation processing, and free it via
> > rte_comp_stream_free() after it is complete.
> >
> > C. Notion of burst operations in compression API
> > ================================================
> > A burst in DPDK compression is an array of operations where each op
> > carries an independent set of data, i.e. a burst can look like:
> >
> >              ------------------------------------------------------------------------------------
> > enque_burst (| op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |)
> >              ------------------------------------------------------------------------------------
> >
> > Where op1 .. op5 are all independent of each other and carry entirely
> > different sets of data.
> > Each op can be attached to the same or a different session but *must* be
> > attached to a different stream.
> >
> > Each op (struct rte_comp_op) carries compression/decompression
> > operational parameters and is both an input and an output parameter.
> > PMD gets source, destination and checksum information at input and
> > updates it with bytes consumed and produced and checksum at output.
> >
> > Since each operation in a burst is independent and thus can complete
> > out-of-order, applications which need ordering should set up a per-op
> > user data area with reordering information so that they can determine
> > the enqueue order at dequeue.
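
The reordering idea in the paragraph above can be sketched in self-contained
C. This is only an illustration under stated assumptions: struct demo_op, its
user_seq field and both helper functions are hypothetical stand-ins, not part
of the proposed rte_comp API. The application tags each op with its enqueue
position and uses that tag to put dequeued ops back in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_comp_op: only what the sketch needs. */
struct demo_op {
    uint32_t user_seq;  /* enqueue position, written by the application */
    uint32_t produced;  /* placeholder for per-op results */
};

/* Tag ops with their enqueue position before submitting the burst. */
static void tag_enqueue_order(struct demo_op *ops, size_t n, uint32_t first_seq)
{
    for (size_t i = 0; i < n; i++)
        ops[i].user_seq = first_seq + (uint32_t)i;
}

/*
 * Place dequeued ops (arbitrary completion order) back into enqueue order.
 * Returns 0 on success, -1 if a tag is out of range or seen twice.
 */
static int restore_enqueue_order(struct demo_op *const *dequeued, size_t n,
                                 uint32_t first_seq, struct demo_op **ordered)
{
    assert(dequeued != NULL && ordered != NULL);

    for (size_t i = 0; i < n; i++)
        ordered[i] = NULL;

    for (size_t i = 0; i < n; i++) {
        /* Unsigned subtraction: a tag below first_seq wraps and is rejected. */
        uint32_t slot = dequeued[i]->user_seq - first_seq;

        if (slot >= n || ordered[slot] != NULL)
            return -1;
        ordered[slot] = dequeued[i];
    }
    return 0;
}
```

Keeping the tag in the op itself means no side table is needed at dequeue
time; the cost is one pass over the dequeued array.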
> >
> > Also, if multiple threads call enqueue_burst() on the same queue pair,
> > then it is the application's onus to use a proper locking mechanism to
> > ensure exclusive enqueuing of operations.
> >
> > D. Stateless Vs Stateful
> > ========================
> > Compression API provides the RTE_COMP_FF_STATEFUL feature flag for a PMD
> > to reflect its support for Stateful operation. Each op carries an op
> > type indicating whether it is to be processed stateful or stateless.
> >
> > D.1 Compression API Stateless operation
> > ---------------------------------------
> > An op is processed stateless if it has
> > - flush value set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on
> >   compression side),
> > - op_type set to RTE_COMP_OP_STATELESS
> > - all of the required input and a sufficiently large output buffer to
> >   store the output, i.e. OUT_OF_SPACE can never occur.
> >
> > When all of the above conditions are met, PMD initiates stateless
> > processing and releases acquired resources after processing of the
> > current operation is complete, i.e. full input consumed and full output
> > written.
> > Application can optionally attach a stream to such ops. In such case,
> > application must attach a different stream to each op.
> >
> > Application can enqueue stateless bursts by making consecutive
> > enque_burst() calls, i.e. the following is relevant usage:
> >
> > enqueued = rte_comp_enque_burst(dev_id, qp_id, ops1, nb_ops);
> > enqueued = rte_comp_enque_burst(dev_id, qp_id, ops2, nb_ops);
> >
> > *Note - Every call has a different ops array, i.e. the same rte_comp_op
> > array *cannot be re-enqueued* to process the next batch of data until
> > the previous ones are completely processed.
> >
> > D.1.1 Stateless and OUT_OF_SPACE
> > --------------------------------
> > OUT_OF_SPACE is a condition when the output buffer runs out of space and
> > the PMD still has more data to produce. If the PMD runs into such a
> > condition, then it is an error condition in stateless processing.
> > In such case, PMD resets itself and returns with status
> > RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0, i.e. no input
> > read, no output written.
> > Application can resubmit the full input with a larger output buffer size.
>
> [Ahmed] Can we add an option to allow the user to read the data that was
> produced while still reporting OUT_OF_SPACE? This is mainly useful for
> decompression applications doing search.

[Shally] It is there, but applicable for the stateful operation type (please
refer to handling out_of_space under "Stateful Section").
By definition, "stateless" here means that the application (such as IPCOMP)
knows the maximum output size for certain and ensures that the uncompressed
data size cannot grow beyond the provided output buffer.
Such apps can submit an op with type = STATELESS and provide full input; the
PMD then assumes it has sufficient input and output and thus doesn't need to
maintain any context after the op is processed.
If the application doesn't know the max output size, then it should process
it as a stateful op, i.e. set up the op with type = STATEFUL and attach a
stream so that the PMD can maintain the relevant context to handle such a
condition.

>
> > D.2 Compression API Stateful operation
> > --------------------------------------
> > A Stateful operation in DPDK compression means the application invokes
> > enqueue burst() multiple times to process related chunks of data, either
> > because
> > - Application broke data into several ops, and/or
> > - PMD ran into out_of_space situation during input processing
> >
> > In case of either one or all of the above conditions, PMD is required to
> > maintain the state of the op across enque_burst() calls, and
> > ops are set up with op_type RTE_COMP_OP_STATEFUL, begin with flush value
> > = RTE_COMP_NO/SYNC_FLUSH and end at flush value RTE_COMP_FULL/FINAL_FLUSH.
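
The flush rules stated so far (a stateless op carries FULL/FINAL flush; a
stateful chain begins with NO/SYNC flush and ends at FULL/FINAL) can be
written down as a small self-contained C check. All names below are
hypothetical stand-ins for the RTE_* values quoted above. Treating the first
FULL/FINAL flush as the end of a chain, and accepting a one-op chain, are
assumptions of this sketch and not something the RFC text spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the flush values and op types named above. */
enum demo_flush {
    DEMO_NO_FLUSH,
    DEMO_SYNC_FLUSH,
    DEMO_FULL_FLUSH,
    DEMO_FINAL_FLUSH
};

enum demo_op_type { DEMO_OP_STATELESS, DEMO_OP_STATEFUL };

/* True for the flush values that end a related set of data. */
static bool demo_flush_ends_data(enum demo_flush f)
{
    return f == DEMO_FULL_FLUSH || f == DEMO_FINAL_FLUSH;
}

/*
 * Compression side only: a stateless op carries the whole input,
 * so its flush value must already be FULL or FINAL.
 */
static bool demo_stateless_op_ok(enum demo_op_type type, enum demo_flush f)
{
    return type == DEMO_OP_STATELESS && demo_flush_ends_data(f);
}

/*
 * Flush values of the ops submitted one at a time on a single stream.
 * Assumption of this sketch: every op before the last uses NO/SYNC flush
 * and only the last op uses FULL/FINAL.
 */
static bool demo_stateful_chain_ok(const enum demo_flush *flushes, size_t n)
{
    assert(flushes != NULL || n == 0);

    if (n == 0)
        return false;
    for (size_t i = 0; i + 1 < n; i++) {
        if (demo_flush_ends_data(flushes[i]))
            return false;
    }
    return demo_flush_ends_data(flushes[n - 1]);
}
```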
> >
> > D.2.1 Stateful operation state maintenance
> > ------------------------------------------
> > It is always an ideal expectation from the application that it should
> > parse through all related chunks of source data, making its mbuf-chain,
> > and enqueue it for stateless processing.
> > However, if it needs to break it into several enqueue_burst() calls,
> > then an expected call flow would be something like:
> >
> > enqueue_burst( |op.no_flush |)
>
> [Ahmed] The work is now in flight to the PMD. The user will call dequeue
> burst in a loop until all ops are received. Is this correct?
>
> > deque_burst(op) // should dequeue before we enqueue next

[Shally] Yes. Ideally every submitted op needs to be dequeued. However, this
illustration is specifically in the context of stateful op processing, to
reflect that if a stream is broken into chunks, then each chunk should be
submitted as one op at a time with type = STATEFUL and needs to be dequeued
before the next chunk is enqueued.

> > enqueue_burst( |op.no_flush |)
> > deque_burst(op) // should dequeue before we enqueue next
> > enqueue_burst( |op.full_flush |)
>
> [Ahmed] Why not allow multiple work items in flight? I understand that
> occasionally there will be an OUT_OF_SPACE exception. Can we just
> distinguish the response in exception cases?

[Shally] Multiple ops are allowed in flight; however, the condition is that
each op in such a case is independent of the others, i.e. they belong to
different streams altogether.
Earlier (as part of the RFC v1 doc) we did consider the proposal to process
all related chunks of data in a single burst by passing them as an ops array,
but later found that not so useful for PMD handling for various reasons.
Please refer to the RFC v1 doc review comments for the same.

> >
> > Here an op *must* be attached to a stream and every subsequent
> > enqueue_burst() call should carry the *same* stream. Since the PMD
> > maintains ops state in the stream, it is mandatory for the application
> > to attach a stream to such ops.
> >
> > D.2.2 Stateful and Out_of_Space
> > -------------------------------
> > If a PMD supports stateful and runs into an OUT_OF_SPACE situation, then
> > it is not an error condition for the PMD. In such case, PMD returns with
> > status RTE_COMP_OP_STATUS_OUT_OF_SPACE with consumed = number of input
> > bytes read and produced = length of complete output buffer.
> > Application should enqueue the op with source starting at consumed+1 and
> > an output buffer with available space.
>
> [Ahmed] Related to OUT_OF_SPACE. What status does the user receive in a
> decompression case when the end block is encountered before the end of
> the input? Does the PMD continue decomp? Does it stop there and return
> the stop index?
>

[Shally] Before I can answer this, please help me understand your use case.
When you say "when the end block is encountered before the end of the
input?", do you mean
"Decompressor processes a final block (i.e. has BFINAL=1 in its header) and
there's some footer data after that?"
Or do you mean
"Decompressor processes one block and has more to process till its final
block?"
What do "end block" and "end of input" refer to here?

> >
> > D.2.3 Sliding Window Size
> > -------------------------
> > Every PMD will reflect in its algorithm capability structure the maximum
> > length of the Sliding Window in bytes, which would indicate the maximum
> > history buffer length used by the algo.
> >
> > 2. Example API illustration
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Following is an illustration of API usage (this is just one flow, other
> > variants are also possible):
> > 1. rte_comp_session *sess = rte_compressdev_session_create (rte_mempool *pool);
> > 2. rte_compressdev_session_init (int dev_id, rte_comp_session *sess,
> >    rte_comp_xform *xform, rte_mempool *sess_pool);
> > 3. rte_comp_op_pool_create(rte_mempool ..)
> > 4. rte_comp_op_bulk_alloc (struct rte_mempool *mempool,
> >    struct rte_comp_op **ops, uint16_t nb_ops);
> > 5. for every rte_comp_op in ops[],
> >    5.1 rte_comp_op_attach_session (rte_comp_op *op, rte_comp_session *sess);
> >    5.2 op.op_type = RTE_COMP_OP_STATELESS
> >    5.3 op.flush = RTE_FLUSH_FINAL
> > 6. [Optional] for every rte_comp_op in ops[],
> >    6.1 rte_comp_stream_create(int dev_id, rte_comp_session *sess, void **stream);
> >    6.2 rte_comp_op_attach_stream(rte_comp_op *op, rte_comp_session *stream);
>
> [Ahmed] What is the semantic effect of attaching a stream to every op? Will
> this application benefit from this given that it is set up with op_type
> STATELESS?

[Shally] By role, a stream is a data structure that holds all the information
a PMD needs to maintain for processing an op, and thus it is marked device
specific. It is required for stateful processing but optional for stateless,
as the PMD doesn't need to maintain context once the op is processed, unlike
stateful.
It may be of advantage to some PMDs to use a stream for stateless. They can
be designed to do one-time per-op setup (such as mapping session params)
during stream_create() in the control path rather than in the data path.

>
> > 7. for every rte_comp_op in ops[],
> >    7.1 set up with src/dst buffer
> > 8. enq = rte_compressdev_enqueue_burst (dev_id, qp_id, &ops, nb_ops);
> > 9. do while (dqu < enq) // Wait till all of enqueued are dequeued
> >    9.1 dqu += rte_compressdev_dequeue_burst (dev_id, qp_id, &ops, enq);
>
> [Ahmed] I am assuming that waiting for all enqueued to be dequeued is not
> strictly necessary, but is just the chosen example in this case
>

[Shally] Yes. By design, for burst_size>1 each op is independent of the
others. So the app may proceed as soon as it dequeues any.

> > 10. Repeat 7 for next batch of data
> > 11. for every ops in ops[]
> >    11.1 rte_comp_stream_free(op->stream);
> > 12. rte_comp_session_clear (sess) ;
> > 13. rte_comp_session_terminate(ret_comp_sess *session)
> >
> > Thanks
> > Shally
> >
> >
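
The stateful OUT_OF_SPACE handling described in D.2.2 can be illustrated with
a toy model in self-contained C. Nothing here is the rte_comp API: the
demo_* names are hypothetical, and the "PMD" is a trivial run-length expander
chosen only because it needs context (a partly emitted run) between calls.
Note that with zero-based offsets the next unread input byte sits at offset
`consumed`, which is what the RFC's one-based "consumed+1" refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_status { DEMO_STATUS_SUCCESS, DEMO_STATUS_OUT_OF_SPACE };

/* Hypothetical stream: context the toy "PMD" keeps between calls. */
struct demo_stream {
    uint8_t pending_count;  /* copies of pending_byte still to emit */
    uint8_t pending_byte;
};

/*
 * Toy stateful "decompressor": input is (count, byte) pairs, output is
 * `count` copies of `byte`. When dst fills up while data is still pending,
 * it returns OUT_OF_SPACE with consumed = input bytes read and
 * produced = dst_cap, keeping the unfinished run in the stream.
 */
static enum demo_status demo_process(struct demo_stream *s,
                                     const uint8_t *src, size_t src_len,
                                     uint8_t *dst, size_t dst_cap,
                                     size_t *consumed, size_t *produced)
{
    size_t in = 0, out = 0;

    for (;;) {
        while (s->pending_count > 0) {
            if (out == dst_cap) {
                *consumed = in;
                *produced = out;
                return DEMO_STATUS_OUT_OF_SPACE;
            }
            dst[out++] = s->pending_byte;
            s->pending_count--;
        }
        if (in + 2 > src_len)
            break;
        s->pending_count = src[in];
        s->pending_byte = src[in + 1];
        in += 2;
    }
    *consumed = in;
    *produced = out;
    return DEMO_STATUS_SUCCESS;
}

/*
 * Application side: keep resubmitting on the same stream, advancing the
 * source by `consumed` and handing over fresh output space each time.
 * Returns total bytes written, or (size_t)-1 if `out` runs out of room.
 * out_cap and chunk_cap are expected to be non-zero.
 */
static size_t demo_run_stateful(const uint8_t *src, size_t src_len,
                                uint8_t *out, size_t out_cap, size_t chunk_cap)
{
    struct demo_stream stream = {0, 0};
    size_t src_off = 0, total = 0;
    enum demo_status st;

    do {
        size_t consumed, produced;
        size_t room = out_cap - total;
        size_t cap = room < chunk_cap ? room : chunk_cap;

        if (cap == 0)
            return (size_t)-1;
        st = demo_process(&stream, src + src_off, src_len - src_off,
                          out + total, cap, &consumed, &produced);
        src_off += consumed;
        total += produced;
    } while (st == DEMO_STATUS_OUT_OF_SPACE);

    return total;
}
```

Because the unfinished run lives in the stream and not in the op, the
application only has to track two numbers between submissions: how much
source was consumed and how much output was produced.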