From: Ahmed Mansour
To: "Trahe, Fiona", "Verma, Shally", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry
Date: Wed, 21 Feb 2018 19:35:35 +0000
Thread-Topic: [RFC v2] doc compression API for DPDK
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

On 2/21/2018 9:35 AM, Trahe, Fiona wrote:
> Hi Ahmed, Shally,
>
>> -----Original Message-----
>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>> Sent: Tuesday, February 20, 2018 7:56 PM
>> To: Verma, Shally ; Trahe, Fiona ; dev@dpdk.org
>> Cc: Athreya, Narayana Prasad ; Gupta, Ashish ;
>> Sahu, Sunila ; De Lara Guarch, Pablo ; Challa, Mahipal ;
>> Jain, Deepak K ; Hemant Agrawal ; Roy Pledge ;
>> Youri Querry
>> Subject: Re: [RFC v2] doc compression API for DPDK
>>
>> /// snip ///
>>>>>>>>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>>>>> It is always an ideal expectation from the application that it should
>>>>>>>>>>>>>>>>> parse through all related chunks of source data, making its mbuf-chain,
>>>>>>>>>>>>>>>>> and enqueue it for stateless processing.
>>>>>>>>>>>>>>>>> However, if it needs to break it into several enqueue_burst() calls,
>>>>>>>>>>>>>>>>> then an expected call flow would be something like:
>>>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call dequeue
>>>>>>>>>>>>>>>> burst in a loop until all ops are received. Is this correct?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued. However
>>>>>>>>>>>>>>> this illustration is specifically in the context of stateful op
>>>>>>>>>>>>>>> processing to reflect that if a stream is broken into chunks, then each
>>>>>>>>>>>>>>> chunk should be submitted as one op at a time with type = STATEFUL and
>>>>>>>>>>>>>>> needs to be dequeued first before the next chunk is enqueued.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>>>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand that
>>>>>>>>>>>>>>>> occasionally there will be an OUT_OF_SPACE exception.
>>>>>>>>>>>>>>>> Can we just distinguish the response in exception cases?
>>>>>>>>>>>>>>> [Shally] Multiple ops are allowed in flight, however the condition is
>>>>>>>>>>>>>>> that each op in such a case is independent of the others, i.e. they
>>>>>>>>>>>>>>> belong to different streams altogether.
>>>>>>>>>>>>>>> Earlier (as part of the RFC v1 doc) we did consider the proposal to
>>>>>>>>>>>>>>> process all related chunks of data in a single burst by passing them as
>>>>>>>>>>>>>>> an ops array, but later found that not so useful for PMD handling for
>>>>>>>>>>>>>>> various reasons. You may please refer to the RFC v1 doc review comments
>>>>>>>>>>>>>>> for the same.
>>>>>>>>>>>>>> [Fiona] Agree with Shally. In summary, as only one op can be processed at
>>>>>>>>>>>>>> a time, since each needs the state of the previous, allowing more than 1
>>>>>>>>>>>>>> op to be in flight at a time would force PMDs to implement internal
>>>>>>>>>>>>>> queueing and exception handling for the OUT_OF_SPACE conditions you
>>>>>>>>>>>>>> mention.
>>>>>>>>>>>> [Ahmed] But we are putting the ops on qps which would make them
>>>>>>>>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>>>>>>>>>>>> complex but doable.
>>>>>>>>>>> [Fiona] In my opinion this is not doable, and could be very inefficient.
>>>>>>>>>>> There may be many streams.
>>>>>>>>>>> The PMD would have to have an internal queue per stream so
>>>>>>>>>>> it could adjust the next src offset and length in the OUT_OF_SPACE case.
>>>>>>>>>>> And this may ripple back through all subsequent ops in the stream as each
>>>>>>>>>>> source len is increased and its dst buffer is not big enough.
>>>>>>>>>> [Ahmed] Regarding multi-op OUT_OF_SPACE handling:
>>>>>>>>>> the caller would still need to adjust the src length/output buffer as you say.
>>>>>>>>>> The PMD cannot handle OUT_OF_SPACE internally.
>>>>>>>>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
>>>>>>>>>> until it gets explicit confirmation from the caller to continue working on
>>>>>>>>>> this stream. Any ops received by the PMD should be returned to the caller
>>>>>>>>>> with status STREAM_PAUSED, since the caller did not explicitly acknowledge
>>>>>>>>>> that it has solved the OUT_OF_SPACE issue.
>>>>>>>>>> These semantics can be enabled by adding a new function to the API,
>>>>>>>>>> perhaps stream_resume().
>>>>>>>>>> This allows the caller to indicate that it acknowledges that it has seen
>>>>>>>>>> the issue, and this op should be used to resolve the issue. Implementations
>>>>>>>>>> that do not support this mode of use can push back immediately after one op
>>>>>>>>>> is in flight. Implementations that support this use mode can allow many ops
>>>>>>>>>> from the same session.
>>>>>>>>>>
>>>>>>>>> [Shally] Is it still in the context of having a single burst where all ops
>>>>>>>>> belong to one stream? If yes, I would still say it would add an overhead to
>>>>>>>>> PMDs, especially if it is expected to work closer to HW (which I think is
>>>>>>>>> the case with DPDK PMDs).
>>>>>>>>> Though your approach is doable, why can this all not be in a layer above the PMD? i.e.
>>>>>>>>> a layer above the PMD can either pass one op at a time with burst size = 1,
>>>>>>>>> OR can make a chained mbuf of input and output and pass that as one op.
>>>>>>>>> Is it just to ease the application's chained-mbuf burden, or do you see any
>>>>>>>>> performance/use-case impacting aspect also?
>>>>>>>>>
>>>>>>>>> If it is in the context where each op belongs to a different stream in a
>>>>>>>>> burst, then why do we need stream_pause and resume? It is an expectation
>>>>>>>>> from the app to pass more output buffer with consumed + 1 from the next
>>>>>>>>> call onwards, as it has already seen OUT_OF_SPACE.
>>>>>>> [Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
>>>>>>> implementation rejects all ops that belong to a stream that has entered
>>>>>>> "RECOVERABLE" state for one reason or another. The caller must
>>>>>>> acknowledge explicitly that it has received news of the problem before
>>>>>>> the PMD allows this stream to exit the "RECOVERABLE" state. I agree with you
>>>>>>> that implementing this functionality in the software layer above the PMD
>>>>>>> is a bad idea, since the latency reductions are lost.
>>>>>> [Shally] Just reiterating, I rather meant the other way around, i.e. I see
>>>>>> it easier to put all such complexity in a layer above the PMD.
>>>>>>
>>>>>>> This setup is useful in latency-sensitive applications where the latency
>>>>>>> of buffering multiple ops into one op is significant. We found latency
>>>>>>> makes a significant difference in search applications where the PMD
>>>>>>> competes with software decompression.
>>>>> [Fiona] I see, so when all goes well, you get best-case latency, but when
>>>>> out-of-space occurs latency will probably be worse.
>>>> [Ahmed] This is exactly right. This use mode assumes out-of-space is a
>>>> rare occurrence.
>>>> Recovering from it should take similar time to
>>>> synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE in
>>>> both sync and async use. The caller can fix up the op and send it back
>>>> to the PMD to continue work, just as would be done in sync. Nonetheless,
>>>> the added complexity is not justifiable if out-of-space is very common,
>>>> since the recoverable state will be the limiting factor that forces
>>>> synchronicity.
>>>>>>>> [Fiona] I still have concerns with this and would not want to support it in our PMD.
>>>>>>>> To make sure I understand, you want to send a burst of ops, with several
>>>>>>>> from the same stream.
>>>>>>>> If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any
>>>>>>>> subsequent ops in that stream.
>>>>>>>> Should it return them in a dequeue_burst() with status still NOT_PROCESSED?
>>>>>>>> Or somehow drop them? How?
>>>>>>>> While still processing ops from other streams.
>>>>>>> [Ahmed] This is exactly correct. It should return them with
>>>>>>> NOT_PROCESSED. Yes, the PMD should continue processing other streams.
>>>>>>>> As we want to offload each op to hardware with as little CPU processing as
>>>>>>>> possible, we would not want to open up each op to see which stream it's
>>>>>>>> attached to and make decisions to do per-stream storage, or drop it, or
>>>>>>>> bypass hw and dequeue without processing.
>>>>>>> [Ahmed] I think I might have missed your point here, but I will try to
>>>>>>> answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
>>>>>>> to the PMD and the PMD should reject until stream_continue() is called.
>>>>>>> The next op to be sent by the user will have a special marker in it to
>>>>>>> inform the PMD to continue working on this stream.
>>>>>>> Alternatively the
>>>>>>> DPDK layer can be made "smarter" to fail during the enqueue by checking
>>>>>>> the stream and its state, but like you say this adds additional CPU
>>>>>>> overhead during the enqueue.
>>>>>>> I am curious. In a simple synchronous use case, how do we prevent users
>>>>>>> from putting multiple ops in flight that belong to a single stream? Do
>>>>>>> we just currently say it is undefined behavior? Otherwise we would have
>>>>>>> to check the stream and incur the CPU overhead.
>>>>> [Fiona] We don't do anything to prevent it. It's undefined. IMO on the data
>>>>> path in the DPDK model we expect good behaviour and don't have to error
>>>>> check for things like this.
>>>> [Ahmed] This makes sense. We also assume good behavior.
>>>>> In our PMD, if we got a burst of 20 ops, we would allocate 20 spaces on the
>>>>> hw q, then build and send those messages. If we found an op from a stream
>>>>> which already had one in flight, we'd have to hold that back, store it in a
>>>>> sw stream-specific holding queue, and only send 19 to hw. We cannot send
>>>>> multiple ops from the same stream to the hw, as it fans them out and does
>>>>> them in parallel.
>>>>> Once the enqueue_burst() returns, there is no processing
>>>>> context which would spot that the first has completed
>>>>> and send the next op to the hw. On a dequeue_burst() we would spot this,
>>>>> and in that context could process the next op in the stream.
>>>>> On out of space, instead of processing the next op we would have to transfer
>>>>> all unprocessed ops from the stream to the dequeue result.
>>>>> Some parts of this are doable, but it seems likely to add a lot more
>>>>> latency; we'd need to add extra threads and timers to move ops from the sw
>>>>> queue to the hw q to get any benefit, and these constructs would add
>>>>> context switching and CPU cycles. So we prefer to push this responsibility
>>>>> to above the API, and it can achieve something similar.
>>>> [Ahmed] I see what you mean. Our workflow is almost exactly the same
>>>> with our hardware, but the fanning out is done by the hardware based on
>>>> the stream, and ops that belong to the same stream are never allowed to
>>>> go out of order. Otherwise the data would be corrupted. Likewise the
>>>> hardware is responsible for checking the state of the stream and
>>>> returning frames as NOT_PROCESSED to the software.
>>>>>>>> Maybe we could add a capability if this behaviour is important for you?
>>>>>>>> e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
>>>>>>>> Our PMD would set this to 0, and expect no more than one op from a
>>>>>>>> stateful stream to be in flight at any time.
>>>>>>> [Ahmed] That makes sense. This way the different DPDK implementations do
>>>>>>> not have to add extra checking for unsupported cases.
>>>>>> [Shally] @ahmed, if I summarise your use-case, is this how you want the PMD
>>>>>> to support it?
>>>>>> - a burst *carries only one stream* and all ops are then assumed to belong
>>>>>> to that stream? (please note, here a burst is not carrying more than one
>>>>>> stream)
>>>> [Ahmed] No. In this use case the caller sets up an op and enqueues a
>>>> single op. Then before the response comes back from the PMD the caller
>>>> enqueues a second op on the same stream.
>>>>>> - PMD will submit one op at a time to HW?
>>>> [Ahmed] I misunderstood what PMD means. I used it throughout to mean the
>>>> HW. I used DPDK to mean the software implementation that talks to the
>>>> hardware.
>>>> The software will submit all ops immediately. The hardware has to figure
>>>> out what to do with the ops depending on what stream they belong to.
>>>>>> - if processed successfully, push it back to the completion queue with
>>>>>> status = SUCCESS.
>>>>>> If it failed or ran into OUT_OF_SPACE, then push it to the completion queue
>>>>>> with status = FAILURE / OUT_OF_SPACE_RECOVERABLE, and the rest with status =
>>>>>> NOT_PROCESSED, and return with enqueue count = total # of ops submitted
>>>>>> originally with the burst?
>>>> [Ahmed] This is exactly what I had in mind. All ops will be submitted to
>>>> the HW. The HW will put all of them on the completion queue with the
>>>> correct status exactly as you say.
>>>>>> - app assumes all have been enqueued, so it goes and dequeues all ops
>>>>>> - on seeing an op with OUT_OF_SPACE_RECOVERABLE, app resubmits a burst of
>>>>>> ops with a call to the stream_continue/resume API starting from the op
>>>>>> which encountered OUT_OF_SPACE, and the others as NOT_PROCESSED with
>>>>>> updated input and output buffers?
>>>> [Ahmed] Correct, this is what we do today in our proprietary API.
>>>>>> - repeat until *all* are dequeued with status = SUCCESS or *any* with
>>>>>> status = FAILURE? If at any time a failure is seen, does the app then start
>>>>>> the whole processing all over again, or just drop this burst?!
>>>> [Ahmed] The app has the choice on how to proceed. If the issue is
>>>> recoverable then the application can continue this stream from where it
>>>> stopped. If the failure is unrecoverable then the application should
>>>> first fix the problem and start from the beginning of the stream.
>>>>>> If all of the above is true, then I think we should add another API such as
>>>>>> rte_comp_enque_single_stream(), which will be functional under Feature Flag
>>>>>> = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better name is
>>>>>> SUPPORT_ENQUEUE_SINGLE_STREAM?!
>>>> [Ahmed] The main advantage in async use is lost if we force all related
>>>> ops to be in the same burst. If we do that, then we might as well merge
>>>> all the ops into one op.
>>>> That would reduce the overhead.
>>>> The use mode I am proposing is only useful in cases where the data
>>>> becomes available after the first enqueue occurred. I want to allow the
>>>> caller to enqueue the second set of data as soon as it is available,
>>>> regardless of whether or not the HW has already started working on the
>>>> first op in flight.
>>> [Shally] @ahmed, OK... seems I missed a point here. So, confirm the following
>>> for me:
>>>
>>> As per the current description in the doc, the expected stateful usage is:
>>> enqueue(op1) --> dequeue(op1) --> enqueue(op2)
>>>
>>> but you're suggesting to allow an option to change it to
>>>
>>> enqueue(op1) --> enqueue(op2)
>>>
>>> i.e. multiple ops from the same stream can be put in flight via subsequent
>>> enqueue_burst() calls without waiting to dequeue the previous ones, if the
>>> PMD supports it. So, no change to the current definition of a burst. It will
>>> still carry multiple streams, where each op belongs to a different stream?!
>> [Ahmed] Correct. I guess a user could put two ops on the same burst that
>> belong to the same stream. In that case it would be more efficient to
>> merge the ops using scatter-gather. Nonetheless, I would not add checks
>> in my implementation to limit that use. The hardware does not perceive a
>> difference between ops that came in one burst and ops that came in two
>> different bursts. To the hardware they are all ops. What matters is
>> which stream each op belongs to.
>>> If yes, then it seems your HW can be set up for multiple streams, so it is
>>> efficient for your case to support it in the DPDK PMD layer, but our hw
>>> doesn't by default and needs SW to back it.
>>> Given that, I also suggest to enable it under some feature flag.
>>> However it looks like an add-on, and if it doesn't change the current
>>> definition of a burst and the minimum expectation set on stateful processing
>>> described in this document, then IMO you can propose this feature as an
>>> incremental patch on the baseline version, in the absence of which
>>> the application will exercise stateful processing as described here
>>> (enq->deq->enq). Thoughts?
>> [Ahmed] Makes sense. I was worried that there might be fundamental
>> limitations to this mode of use in the API design. That is why I wanted
>> to share this use mode with you guys and see if it can be accommodated
>> using an incremental patch in the future.
>>>>> [Fiona] Am curious about Ahmed's response to this. I didn't get that a
>>>>> burst should carry only one stream, or how this makes a difference, as
>>>>> there can be many enqueue_burst() calls done before a dequeue_burst().
>>>>> Maybe you're thinking the enqueue_burst() would be a blocking call that
>>>>> would not return until all the ops had been processed? This would turn it
>>>>> into a synchronous call, which isn't the intent.
>>>> [Ahmed] Agreed, a blocking or even a buffering software layer that
>>>> babysits the hardware does not fundamentally change the parameters of the
>>>> system as a whole. It just moves workflow management complexity down
>>>> into the DPDK software layer.
>>>> Rather, there are real latency and
>>>> throughput advantages (because of caching) that I want to expose.
>>>>
> [Fiona] OK, so I think we've agreed that this can be an option, as long as it's
> not required of PMDs and is enabled under an explicit capability - named
> something like ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS.
> @Ahmed, we'll leave it up to you to define the details.
> What's necessary is API text to describe the expected behaviour on any error
> conditions, the pause/resume API, whether an API is expected to clean up if
> resume doesn't happen and if there's any time limit on this, etc.
> But I wouldn't expect any changes to the existing burst APIs, and all PMDs and
> applications must be able to handle the default behaviour, i.e. with this
> capability disabled.
> Specifically, even if a PMD has this capability, if an application ignores it
> and only sends one op at a time, and the PMD returns OUT_OF_SPACE_RECOVERABLE,
> the stream should not be in a paused state and the PMD should not wait for a
> resume() to handle the next op sent for that stream.
> Does that make sense?
[Ahmed] That makes sense. When this mode is enabled, then additional
functions must be called to resume the work, even if only one op was in
flight. When this mode is not enabled, then the PMD assumes that the
caller will never enqueue a stateful op before receiving a response to
the one that precedes it in a stream.
>
>>>> /// snip ///