From: "Verma, Shally"
To: Ahmed Mansour, "Trahe, Fiona", "dev@dpdk.org"
CC: "Athreya, Narayana Prasad", "Gupta, Ashish", "Sahu, Sunila", "De Lara Guarch, Pablo", "Challa, Mahipal", "Jain, Deepak K", Hemant Agrawal, Roy Pledge, Youri Querry
Thread-Topic: [RFC v2] doc compression API for DPDK
Date: Fri, 16 Feb 2018 07:16:58 +0000
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

Hi Fiona, Ahmed

>-----Original Message-----
>From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>Sent: 16 February 2018 02:40
>Subject: Re: [RFC v2] doc compression API for DPDK
>
>On 2/15/2018 1:47 PM, Trahe, Fiona wrote:
>> Hi Shally, Ahmed,
>> Sorry for the delay in replying,
>> Comments below
>>
>>> -----Original Message-----
>>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>>> Sent: Wednesday, February 14, 2018 7:41 AM
>>> Subject: RE: [RFC v2] doc compression API for DPDK
>>>
>>> Hi Ahmed,
>>>
>>>> -----Original Message-----
>>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>>> Sent: 02 February 2018 01:53
>>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>>
>>>> On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
>>>>> Hi Ahmed, Shally,
>>>>>
>>>>> ///snip///
>>>>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>> OUT_OF_SPACE is the condition where the output buffer runs out of
>>>>>>>>>>> space and the PMD still has more data to produce. If a PMD runs
>>>>>>>>>>> into such a condition, it is an error condition in stateless
>>>>>>>>>>> processing.
>>>>>>>>>>> In such a case, the PMD resets itself and returns with status
>>>>>>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0,
>>>>>>>>>>> i.e. no input read, no output written.
>>>>>>>>>>> The application can resubmit the full input with a larger output
>>>>>>>>>>> buffer size.
>>>>>>>>>> [Ahmed] Can we add an option to allow the user to read the data
>>>>>>>>>> that was produced while still reporting OUT_OF_SPACE? This is
>>>>>>>>>> mainly useful for decompression applications doing search.
>>>>>>>>> [Shally] It is there, but applicable to the stateful operation type
>>>>>>>>> (please refer to the handling of out_of_space under "Stateful
>>>>>>>>> Section").
>>>>>>>>> By definition, "stateless" here means that the application (such as
>>>>>>>>> IPCOMP) knows the maximum output size with certainty and ensures
>>>>>>>>> that the uncompressed data size cannot grow beyond the provided
>>>>>>>>> output buffer.
>>>>>>>>> Such apps can submit an op with type = STATELESS and provide the
>>>>>>>>> full input; the PMD then assumes it has sufficient input and output
>>>>>>>>> and thus doesn't need to maintain any context after the op is
>>>>>>>>> processed.
>>>>>>>>> If the application doesn't know the max output size, then it should
>>>>>>>>> process it as a stateful op, i.e. set up the op with type = STATEFUL
>>>>>>>>> and attach a stream so that the PMD can maintain the relevant
>>>>>>>>> context to handle such a condition.
>>>>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while
>>>>>>>> still respecting the stateless concept.
>>>>>>>> In the stateless case where a PMD reports OUT_OF_SPACE in
>>>>>>>> decompression, it could also return consumed=0, produced=x, where
>>>>>>>> x>0. x indicates the amount of valid data which has been written to
>>>>>>>> the output buffer. It is not complete, but if an application wants to
>>>>>>>> search it, that may be sufficient.
>>>>>>>> If the application still wants the data, it must resubmit the whole
>>>>>>>> input with a bigger output buffer, and decompression will be repeated
>>>>>>>> from the start; it cannot expect to continue on, as the PMD has not
>>>>>>>> maintained state, history or data.
>>>>>>>> I don't think there would be any need to indicate this in
>>>>>>>> capabilities; PMDs which cannot provide this functionality would
>>>>>>>> always return produced=consumed=0, while PMDs which can could set
>>>>>>>> produced > 0.
>>>>>>>> If this works for you both, we could consider a similar case for
>>>>>>>> compression.
>>>>>>>>
>>>>>>> [Shally] Sounds fine to me. Though then in that case, consumed should
>>>>>>> also be updated to the actual amount consumed by the PMD.
>>>>>>> Setting consumed = 0 with produced > 0 doesn't correlate.
>>>>>> [Ahmed] I like Fiona's suggestion, but I also do not like the
>>>>>> implication of returning consumed = 0. At the same time, returning
>>>>>> consumed = y implies to the user that it can proceed from the middle. I
>>>>>> prefer the consumed = 0 implementation, but I think a different return
>>>>>> is needed to distinguish it from an OUT_OF_SPACE that the user can
>>>>>> recover from. Perhaps OUT_OF_SPACE_RECOVERABLE and
>>>>>> OUT_OF_SPACE_TERMINATED. This also allows future PMD implementations
>>>>>> to provide recoverability even in STATELESS mode if they so wish. In
>>>>>> this model STATELESS or STATEFUL would be a hint for the PMD
>>>>>> implementation to make optimizations for each case, but it does not
>>>>>> force the PMD implementation to limit functionality if it can provide
>>>>>> recoverability.
>>>>> [Fiona] So you're suggesting the following:
>>>>> OUT_OF_SPACE - returned only on stateful operation. Not an error.
>>>>>     Op.produced can be used and the next op in the stream should
>>>>>     continue on from op.consumed+1.
>>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>>>     Error condition, no recovery possible.
>>>>>     consumed=produced=0. Application must resubmit all input data with
>>>>>     a bigger output buffer.
>>>>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some
>>>>>     recovery possible.
>>>>>     - consumed = 0, produced > 0. Application must resubmit all input
>>>>>       data with a bigger output buffer. However in the decompression
>>>>>       case, data up to produced in the dst buffer may be
>>>>>       inspected/searched. Never happens in the compression case, as the
>>>>>       output data would be meaningless.
>>>>>     - consumed > 0, produced > 0. PMD has stored relevant state and
>>>>>       history and so can convert to stateful, using op.produced and
>>>>>       continuing from consumed+1.
>>>>> I don't expect our PMDs to use this last case, but maybe this works for
>>>>> others? I'm not convinced it's not just adding complexity. It sounds
>>>>> like a version of stateful without a stream, and maybe less efficient?
>>>>> If so, should it respect the FLUSH flag? Which would have been FULL or
>>>>> FINAL in the op. Or treat it as FLUSH_NONE or SYNC? I don't know why an
>>>>> application would not simply have submitted a STATEFUL request if this
>>>>> is the behaviour it wants?
>>>> [Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
>>>> and replacing it with
>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>>     Error condition, no recovery possible.
>>>>     - consumed=0, produced=amount of data produced. Application must
>>>>       resubmit all input data with a bigger output buffer to process all
>>>>       of the op.
>>>> OUT_OF_SPACE_RECOVERABLE - normally returned on stateful operation. Not
>>>>     an error. Op.produced can be used and the next op in the stream
>>>>     should continue on from op.consumed+1.
>>>>     - consumed > 0, produced > 0.
>>>> PMD has stored relevant state and history and so can continue, using
>>>> op.produced and continuing from consumed+1.
>>>>
>>>> We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
>>>> implementation either.
>>>>
>>>> Regardless of speculative future PMDs, the more important aspect of this
>>>> for today is that the return status clearly determines the meaning of
>>>> "consumed". If it is RECOVERABLE then consumed is meaningful; if it is
>>>> TERMINATED then consumed is meaningless.
>>>> This way we take away the ambiguity of having OUT_OF_SPACE mean two
>>>> different user work flows.
>>>>
>>>> A speculative future PMD may be designed to return RECOVERABLE for
>>>> stateless ops that are attached to streams.
>>>> A future PMD may look to see if an op has a stream attached, write out
>>>> the state there and go into recoverable mode.
>>>> In essence this leaves the choice up to the implementation and allows
>>>> the PMD to take advantage of stateless optimizations so long as a
>>>> "RECOVERABLE" scenario is rarely hit. The PMD will dump context as soon
>>>> as it fully processes an op. It will only write context out in cases
>>>> where the op chokes.
>>>> This futuristic PMD should ignore the FLUSH flag, since STATELESS mode
>>>> was indicated by the user, and optimize.
>>> [Shally] IMO, it looks okay to have two separate return codes, TERMINATED
>>> and RECOVERABLE, with definitions as you mentioned, and it seems doable.
>>> So then it means all the following conditions:
>>> a. stateless with flush = full/final, no stream pointer provided: PMD can
>>> return TERMINATED, i.e. the user has to start all over again, it's a
>>> failure (as in the current definition)
>>> b. stateless with flush = full/final, stream pointer provided: here it's
>>> up to the PMD to return either TERMINATED or RECOVERABLE depending upon
>>> its ability (note if RECOVERABLE, then the PMD will maintain state in the
>>> stream pointer)
>>> c. stateful with flush = full / NO_SYNC, stream pointer always there: PMD
>>> will return TERMINATED/RECOVERABLE depending on whether the
>>> STATEFUL_COMPRESSION/DECOMPRESSION feature flag is enabled or not
>> [Fiona] I don't think the flush flag is relevant - it could be out of
>> space on any flush flag, and if out of space it should ignore the flush
>> flag.
>> Is there a need for TERMINATED? - I didn't think it would ever need to be
>> returned in the stateful case.
>> Why the ref to the feature flag? If a PMD doesn't support a feature I
>> think it should fail the op - not with out-of-space, but unsupported or
>> similar. Or it would fail on stream creation.
>[Ahmed] Agreed with Fiona. The flush flag only matters on success. By
>definition the PMD should return OUT_OF_SPACE_RECOVERABLE in stateful
>mode when it runs out of space.
>@Shally If the user did not provide a stream, then the PMD should
>probably return TERMINATED every time. I am not sure we should make a
>"really smart" PMD which returns RECOVERABLE even if no stream pointer
>was given. In that case the PMD must give some ID back to the caller
>that the caller can use to "recover" the op. I am not sure how it would
>be implemented in the PMD, or when the PMD decides to retire streams
>belonging to dead ops that the caller decided not to "recover".
>>
>>> and one more exception case is:
>>> d. stateless with flush = full, no stream pointer provided: PMD can
>>> return RECOVERABLE, i.e.
>>> the PMD internally maintained that state somehow, and consumed &
>>> produced > 0, so the user can start from consumed+1, but there's a
>>> restriction on the user not to alter or change the op until it is fully
>>> processed?!
>> [Fiona] Why the need for this case?
>> There's always a restriction on the user not to alter or change the op
>> until it is fully processed.
>> If a PMD can do this - why doesn't it create a stream when that API is
>> called - and then it's the same as b?
>[Ahmed] Agreed. The user should not touch an op once enqueued until they
>receive it in dequeue. We ignore the flush in stateless mode. We assume
>it to be final every time.
[Shally] Agreed, and I am not in favour of supporting such an implementation
either. Just listed out the different possibilities up here to better
visualise Ahmed's requirements/applicability of TERMINATED and RECOVERABLE.
>>
>>> The API currently takes care of cases a and c, and case b can be
>>> supported if the specification accepts another proposal which mentions
>>> optional usage of a stream with stateless.
>> [Fiona] The API has this, but as we agreed, it is not optional to call
>> create_stream() with an op_type parameter (stateful/stateless). The PMD
>> can return NULL or provide a stream; if the latter, then that stream must
>> be attached to ops.
>>
>>> Until then, the API makes no difference between case b and c, i.e. we
>>> can have an op such as:
>>> - type = stateful with flush = full/final, stream pointer provided: PMD
>>> can return TERMINATED/RECOVERABLE according to its ability
>>>
>>> Case d is something exceptional; if there's a requirement in PMDs to
>>> support it, then I believe it will be doable with the concept of a
>>> different return code.
>>
>> [Fiona] That's not quite how I understood it. Can it be simpler, with only
>> the following cases?
>> a. stateless with flush = full/final, no stream pointer provided: PMD can
>>    return TERMINATED, i.e. the user has to start all over again, it's a
>>    failure (as in the current definition).
>>    consumed = 0, produced = amount of data produced. This is usually 0,
>>    but in the decompression case a PMD may return > 0 and the application
>>    may find it useful to inspect that data.
>> b. stateless with flush = full/final, stream pointer provided: here it's
>>    up to the PMD to return either TERMINATED or RECOVERABLE depending
>>    upon its ability (note if RECOVERABLE, then the PMD will maintain
>>    state in the stream pointer)
>> c. stateful with flush = any, stream pointer always there: PMD will
>>    return RECOVERABLE.
>>    op.produced can be used and the next op in the stream should continue
>>    on from op.consumed+1.
>>    consumed=0, produced=0 is an unusual but allowed case. I'm not sure if
>>    it could ever happen, but there is no need to change the status to
>>    TERMINATED in this case. There may be useful state/history stored in
>>    the PMD, even though no output has been produced yet.
>[Ahmed] Agreed
[Shally] Sounds good.
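
To make the distinction concrete, below is a minimal application-side sketch
of how the two proposed codes could be told apart on dequeue. Note this is
only an illustration of the proposal: the OUT_OF_SPACE_* status names are
still under discussion and the compressdev API itself is an RFC, so every
identifier here is provisional.

#include <rte_compressdev.h>

static void
handle_dequeued_op(struct rte_comp_op *op)
{
	switch (op->status) {
	case RTE_COMP_OP_STATUS_SUCCESS:
		/* all input consumed; op->produced bytes of valid output */
		break;
	case RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED:
		/*
		 * Stateless failure (cases a/b above): consumed = 0. In
		 * decompression a PMD may still report produced > 0, and
		 * those bytes in the dst buffer can be inspected/searched,
		 * but to get the full output the application must resubmit
		 * ALL input with a bigger dst buffer.
		 */
		break;
	case RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE:
		/*
		 * Stateful (case c, or case b with a stream): not an error.
		 * Output up to op->produced is valid; the next op in the
		 * stream continues from input offset op->consumed+1.
		 */
		break;
	default:
		break;
	}
}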
>>
>>>>>>>>>>> D.2 Compression API Stateful operation
>>>>>>>>>>> ----------------------------------------------------------
>>>>>>>>>>> A stateful operation in DPDK compression means the application
>>>>>>>>>>> invokes enqueue_burst() multiple times to process related chunks
>>>>>>>>>>> of data, either because
>>>>>>>>>>> - the application broke the data into several ops, and/or
>>>>>>>>>>> - the PMD ran into an out_of_space situation during input
>>>>>>>>>>>   processing
>>>>>>>>>>>
>>>>>>>>>>> In case of either one or all of the above conditions, the PMD is
>>>>>>>>>>> required to maintain the state of the op across enqueue_burst()
>>>>>>>>>>> calls; ops are set up with op_type RTE_COMP_OP_STATEFUL, begin
>>>>>>>>>>> with flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>>>>>>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> It is always the ideal expectation from the application that it
>>>>>>>>>>> should parse through all related chunks of source data, make its
>>>>>>>>>>> mbuf-chain and enqueue it for stateless processing.
>>>>>>>>>>> However, if it needs to break it into several enqueue_burst()
>>>>>>>>>>> calls, then an expected call flow would be something like:
>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call
>>>>>>>>>> dequeue burst in a loop until all ops are received. Is this
>>>>>>>>>> correct?
>>>>>>>>>>
>>>>>>>>>>> dequeue_burst(op) // should dequeue before we enqueue next
>>>>>>>>> [Shally] Yes. Ideally every submitted op needs to be dequeued.
>>>>>>>>> However, this illustration is specifically in the context of
>>>>>>>>> stateful op processing, to reflect that if a stream is broken into
>>>>>>>>> chunks, then each chunk should be submitted as one op at a time
>>>>>>>>> with type = STATEFUL and needs to be dequeued before the next chunk
>>>>>>>>> is enqueued.
>>>>>>>>>
>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>> dequeue_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand
>>>>>>>>>> that occasionally there will be an OUT_OF_SPACE exception. Can we
>>>>>>>>>> just distinguish the response in exception cases?
>>>>>>>>> [Shally] Multiple ops are allowed in flight; however, the condition
>>>>>>>>> is that each op in such a case is independent of the others, i.e.
>>>>>>>>> they belong to different streams altogether.
>>>>>>>>> Earlier (as part of the RFC v1 doc) we did consider the proposal to
>>>>>>>>> process all related chunks of data in a single burst by passing
>>>>>>>>> them as an ops array, but later found it not so useful for PMD
>>>>>>>>> handling for various reasons. You may please refer to the RFC v1
>>>>>>>>> doc review comments for the same.
>>>>>>>> [Fiona] Agree with Shally. In summary, as only one op can be
>>>>>>>> processed at a time, since each needs the state of the previous one,
>>>>>>>> to allow more than 1 op to be in flight at a time would force PMDs
>>>>>>>> to implement internal queueing and exception handling for the
>>>>>>>> OUT_OF_SPACE conditions you mention.
>>>>>> [Ahmed] But we are putting the ops on qps, which would make them
>>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit
>>>>>> more complex but doable.
>>>>> [Fiona] In my opinion this is not doable, and could be very
>>>>> inefficient.
>>>>> There may be many streams.
>>>>> The PMD would have to have an internal queue per stream so it could
>>>>> adjust the next src offset and length in the OUT_OF_SPACE case.
>>>>> And this may ripple back through all subsequent ops in the stream, as
>>>>> each source len is increased and its dst buffer is not big enough.
>>>> [Ahmed] Regarding multi-op OUT_OF_SPACE handling.
>>>> The caller would still need to adjust the src length/output buffer as
>>>> you say. The PMD cannot handle OUT_OF_SPACE internally.
>>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
>>>> until it gets explicit confirmation from the caller to continue working
>>>> on this stream. Any ops received by the PMD should be returned to the
>>>> caller with status STREAM_PAUSED, since the caller did not explicitly
>>>> acknowledge that it has solved the OUT_OF_SPACE issue.
>>>> These semantics can be enabled by adding a new function to the API,
>>>> perhaps stream_resume().
>>>> This allows the caller to indicate that it acknowledges that it has
>>>> seen the issue, and this op should be used to resolve the issue.
>>>> Implementations that do not support this mode of use can push back
>>>> immediately after one op is in flight. Implementations that support
>>>> this use mode can allow many ops from the same session.
>>>>
>>> [Shally] Is it still in the context of having a single burst where all
>>> ops belong to one stream? If yes, I would still say it would add an
>>> overhead to PMDs, especially if it is expected to work close to HW
>>> (which I think is the case with DPDK PMDs).
>>> Though your approach is doable, why can't this all be in a layer above
>>> the PMD? i.e. a layer above the PMD can either pass one op at a time
>>> with burst size = 1, OR can make a chained mbuf of input and output and
>>> pass that as one op.
>>> Is it just to ease applications of the chained-mbuf burden, or do you
>>> see any performance/use-case impacting aspect also?
>>>
>>> If it is in a context where each op belongs to a different stream in a
>>> burst, then why do we need stream_pause and resume? It is an expectation
>>> from the app to pass a larger output buffer with consumed + 1 from the
>>> next call onwards, as it has already seen OUT_OF_SPACE.
>[Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
>implementation rejects all ops that belong to a stream that has entered
>"RECOVERABLE" state for one reason or another. The caller must
>acknowledge explicitly that it has received news of the problem before
>the PMD allows this stream to exit "RECOVERABLE" state. I agree with you
>that implementing this functionality in the software layer above the PMD
>is a bad idea since the latency reductions are lost.
[Shally] Just reiterating, I rather meant it the other way around, i.e. I
see it easier to put all such complexity in a layer above the PMD.
>This setup is useful in latency sensitive applications where the latency
>of buffering multiple ops into one op is significant. We found latency
>makes a significant difference in search applications where the PMD
>competes with software decompression.
>> [Fiona] I still have concerns with this and would not want to support it
>> in our PMD.
>> To make sure I understand, you want to send a burst of ops, with several
>> from the same stream.
>> If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process
>> any subsequent ops in that stream.
>> Should it return them in a dequeue_burst() with status still
>> NOT_PROCESSED?
>> Or somehow drop them? How?
>> While still processing ops from other streams.
>[Ahmed] This is exactly correct. It should return them with
>NOT_PROCESSED. Yes, the PMD should continue processing other streams.
>> As we want to offload each op to hardware with as little CPU processing
>> as possible, we would not want to open up each op to see which stream
>> it's attached to and make decisions to do per-stream storage, or drop
>> it, or bypass hw and dequeue without processing.
>[Ahmed] I think I might have missed your point here, but I will try to
>answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
>to the PMD and the PMD should reject them until stream_continue() is
>called. The next op to be sent by the user will have a special marker in
>it to inform the PMD to continue working on this stream. Alternatively,
>the DPDK layer can be made "smarter" to fail during the enqueue by
>checking the stream and its state, but like you say this adds additional
>CPU overhead during the enqueue.
>I am curious. In a simple synchronous use case, how do we prevent users
>from putting multiple ops in flight that belong to a single stream? Do
>we just currently say it is undefined behavior? Otherwise we would have
>to check the stream and incur the CPU overhead.
>>
>> Maybe we could add a capability if this behaviour is important for you?
>> e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
>> Our PMD would set this to 0, and expect no more than one op from a
>> stateful stream to be in flight at any time.
>[Ahmed] That makes sense. This way the different DPDK implementations do
>not have to add extra checking for unsupported cases.
[Shally] @ahmed, if I summarise your use-case, is this how you want the PMD
to support it?
- a burst *carries only one stream* and all ops are then assumed to belong
to that stream? (please note, here a burst is not carrying more than one
stream)
- the PMD will submit one op at a time to HW?
- if processed successfully, push it back to the completion queue with
status = SUCCESS. If it failed or ran into OUT_OF_SPACE, then push it to
the completion queue with status = FAILURE/OUT_OF_SPACE_RECOVERABLE, the
rest with status = NOT_PROCESSED, and return with enqueue count = total
number of ops submitted originally with the burst?
- the app assumes all have been enqueued, so it goes and dequeues all ops
- on seeing an op with OUT_OF_SPACE_RECOVERABLE, the app resubmits a burst
of ops with a call to the stream_continue/resume API, starting from the op
which encountered OUT_OF_SPACE and the others as NOT_PROCESSED, with
updated input and output buffers?
- repeat until *all* are dequeued with status = SUCCESS, or *any* with
status = FAILURE? If at any time a failure is seen, does the app then start
the whole processing all over again, or just drop this burst?!
If all of the above is true, then I think we should add another API such as
rte_comp_enque_single_stream(), which will be functional under feature flag
= ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better name is
SUPPORT_ENQUEUE_SINGLE_STREAM?!
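
In rough C, the flow I have in mind looks like the below. To be clear, this
is only a sketch of the proposal: stream_resume(), the OUT_OF_SPACE_*
statuses and the one-stream-per-burst rule are all still under discussion,
not existing DPDK API, and grow_dst_buffer() stands in for
application-specific logic.

#include <rte_compressdev.h>

#define BURST_MAX 32

/* hypothetical helpers, named for illustration only */
void grow_dst_buffer(struct rte_comp_op *op);   /* app-specific resize */
int stream_resume(void *stream);                /* proposed in this thread */

static void
single_stream_burst(uint8_t dev_id, uint16_t qp_id, void *stream,
		    struct rte_comp_op **ops, uint16_t nb_ops)
{
	struct rte_comp_op *deq[BURST_MAX];
	uint16_t nb_deq = 0, i;

	/* the burst carries ops of one stream only, submitted in order */
	rte_compressdev_enqueue_burst(dev_id, qp_id, ops, nb_ops);

	/* the app assumes all were enqueued, so it dequeues all of them */
	while (nb_deq < nb_ops)
		nb_deq += rte_compressdev_dequeue_burst(dev_id, qp_id,
					deq + nb_deq, nb_ops - nb_deq);

	for (i = 0; i < nb_deq; i++) {
		if (deq[i]->status == RTE_COMP_OP_STATUS_SUCCESS)
			continue;
		if (deq[i]->status ==
		    RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE) {
			grow_dst_buffer(deq[i]); /* keep the input as-is */
			stream_resume(stream);   /* explicitly ack the error */
			/*
			 * Resubmit from the op that choked; the ops behind it
			 * came back NOT_PROCESSED and go back in unchanged.
			 * (The dequeue/retry loop is omitted for brevity.)
			 */
			rte_compressdev_enqueue_burst(dev_id, qp_id,
						      &deq[i], nb_deq - i);
			return;
		}
		/* FAILURE: restart the whole stream or drop the burst */
	}
}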
>>
>>
>>>> Regarding the ordering of ops:
>>>> We do force serialization of ops belonging to a stream in STATEFUL
>>>> operation. Related ops do not go out of order and are given to
>>>> available PMDs one at a time.
>>>>
>>>>>> The question is: is this mode of use useful for real life
>>>>>> applications, or would we be just adding complexity? The technical
>>>>>> advantage of this is that processing of stateful ops is
>>>>>> interdependent, and PMDs can take advantage of caching and other
>>>>>> optimizations to make processing related ops much faster than
>>>>>> switching on every op. PMDs have to maintain more than 32 KB of state
>>>>>> for DEFLATE for every stream.
>>>>>>>> If the application has all the data, it can put it into chained
>>>>>>>> mbufs in a single op rather than multiple ops, which avoids pushing
>>>>>>>> all that complexity down to the PMDs.
>>>>>> [Ahmed] I think that your suggested scheme of putting all related
>>>>>> mbufs into one op may be the best solution, without the extra
>>>>>> complexity of handling OUT_OF_SPACE cases, while still allowing the
>>>>>> enqueuer extra time, if we have a way of marking mbufs as ready for
>>>>>> consumption. The enqueuer may not have all the data at hand but can
>>>>>> enqueue the op with a couple of empty mbufs marked as not ready for
>>>>>> consumption. The enqueuer will then update the rest of the mbufs to
>>>>>> ready for consumption once the data is added. This introduces a race
>>>>>> condition. A second flag for each mbuf can be updated by the PMD to
>>>>>> indicate whether it processed it or not. This way, in cases where the
>>>>>> PMD beat the application to the op, the application will just update
>>>>>> the op to point to the first unprocessed mbuf and resend it to the
>>>>>> PMD.
>>>>> [Fiona] This doesn't sound safe. You want to add data to a stream
>>>>> after you've enqueued the op. You would have to write to op.src.length
>>>>> at a time when the PMD might be reading it. Sounds like a lock would
>>>>> be necessary.
>>>>> Once the op has been enqueued, my understanding is that its ownership
>>>>> is handed over to the PMD, and the application should not touch it
>>>>> until it has been dequeued.
>>>>> I don't think it's a good idea to change this model.
>>>>> Can't the application just collect a stream of data in chained mbufs
>>>>> until it has enough to send an op, then construct the op and, while
>>>>> waiting for that op to complete, accumulate the next batch of chained
>>>>> mbufs? Only construct the next op after the previous one is complete,
>>>>> based on the result of the previous one.
>>>> [Ahmed] Fair enough. I agree with you. I imagined it in a different
>>>> way, in which each mbuf would have its own length.
>>>> The advantage to gain is in applications where there is one PMD user:
>>>> the down time between ops can be significant, and setting up a single
>>>> producer/consumer pair significantly reduces the CPU cycles and PMD
>>>> down time.
>>>>
>>>> ////snip////
>
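
[Shally] For reference, Fiona's accumulate-then-submit model above would
look roughly like this from the application side. more_input(),
next_input_segment() and build_stateful_op() are made-up application
helpers; only rte_pktmbuf_chain() is existing DPDK mbuf API, so please
treat this purely as a sketch.

#include <rte_mbuf.h>
#include <rte_compressdev.h>

/* hypothetical application helpers, named for illustration only */
int more_input(void);
struct rte_mbuf *next_input_segment(void);
struct rte_comp_op *build_stateful_op(struct rte_mbuf *chain, void *stream);

static void
accumulate_and_submit(uint8_t dev_id, uint16_t qp_id, void *stream)
{
	struct rte_mbuf *batch = NULL;       /* chain being accumulated */
	struct rte_comp_op *inflight = NULL; /* at most one op outstanding */

	while (more_input()) {
		struct rte_mbuf *seg = next_input_segment();

		/* keep gathering input into one chained mbuf */
		if (batch == NULL)
			batch = seg;
		else
			rte_pktmbuf_chain(batch, seg);

		if (inflight != NULL &&
		    rte_compressdev_dequeue_burst(dev_id, qp_id,
						  &inflight, 1) == 0)
			continue; /* previous op still running: accumulate */

		/* previous op complete: check its status here (omitted),
		 * then build the next op from everything gathered so far */
		inflight = build_stateful_op(batch, stream);
		rte_compressdev_enqueue_burst(dev_id, qp_id, &inflight, 1);
		batch = NULL;
	}
}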