DPDK patches and discussions
From: Ahmed Mansour <ahmed.mansour@nxp.com>
To: "Trahe, Fiona" <fiona.trahe@intel.com>,
	"Verma, Shally" <Shally.Verma@cavium.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "Athreya, Narayana Prasad" <NarayanaPrasad.Athreya@cavium.com>,
	"Gupta, Ashish" <Ashish.Gupta@cavium.com>,
	"Sahu, Sunila" <Sunila.Sahu@cavium.com>,
	"De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>,
	"Challa, Mahipal" <Mahipal.Challa@cavium.com>,
	"Jain, Deepak K" <deepak.k.jain@intel.com>,
	Hemant Agrawal <hemant.agrawal@nxp.com>,
	Roy Pledge <roy.pledge@nxp.com>,
	Youri Querry <youri.querry_1@nxp.com>
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK
Date: Fri, 16 Feb 2018 21:21:58 +0000	[thread overview]
Message-ID: <AM0PR0402MB38420AA6F8DD32BB465B9A9CE1CB0@AM0PR0402MB3842.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <348A99DA5F5B7549AA880327E580B43589321277@IRSMSX101.ger.corp.intel.com>

>> -----Original Message-----
>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>> Sent: Friday, February 16, 2018 7:17 AM
>> To: Ahmed Mansour <ahmed.mansour@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
>> dev@dpdk.org
>> Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya@cavium.com>; Gupta, Ashish
>> <Ashish.Gupta@cavium.com>; Sahu, Sunila <Sunila.Sahu@cavium.com>; De Lara Guarch, Pablo
>> <pablo.de.lara.guarch@intel.com>; Challa, Mahipal <Mahipal.Challa@cavium.com>; Jain, Deepak K
>> <deepak.k.jain@intel.com>; Hemant Agrawal <hemant.agrawal@nxp.com>; Roy Pledge
>> <roy.pledge@nxp.com>; Youri Querry <youri.querry_1@nxp.com>
>> Subject: RE: [RFC v2] doc compression API for DPDK
>>
>> Hi Fiona, Ahmed
>>
>>> -----Original Message-----
>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>> Sent: 16 February 2018 02:40
>>> To: Trahe, Fiona <fiona.trahe@intel.com>; Verma, Shally <Shally.Verma@cavium.com>; dev@dpdk.org
>>> Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya@cavium.com>; Gupta, Ashish
>> <Ashish.Gupta@cavium.com>; Sahu, Sunila
>>> <Sunila.Sahu@cavium.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Challa,
>> Mahipal
>>> <Mahipal.Challa@cavium.com>; Jain, Deepak K <deepak.k.jain@intel.com>; Hemant Agrawal
>> <hemant.agrawal@nxp.com>; Roy
>>> Pledge <roy.pledge@nxp.com>; Youri Querry <youri.querry_1@nxp.com>
>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>
>>> On 2/15/2018 1:47 PM, Trahe, Fiona wrote:
>>>> Hi Shally, Ahmed,
>>>> Sorry for the delay in replying,
>>>> Comments below
>>>>
>>>>> -----Original Message-----
>>>>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>>>>> Sent: Wednesday, February 14, 2018 7:41 AM
>>>>> To: Ahmed Mansour <ahmed.mansour@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
>>>>> dev@dpdk.org
>>>>> Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya@cavium.com>; Gupta, Ashish
>>>>> <Ashish.Gupta@cavium.com>; Sahu, Sunila <Sunila.Sahu@cavium.com>; De Lara Guarch, Pablo
>>>>> <pablo.de.lara.guarch@intel.com>; Challa, Mahipal <Mahipal.Challa@cavium.com>; Jain, Deepak K
>>>>> <deepak.k.jain@intel.com>; Hemant Agrawal <hemant.agrawal@nxp.com>; Roy Pledge
>>>>> <roy.pledge@nxp.com>; Youri Querry <youri.querry_1@nxp.com>
>>>>> Subject: RE: [RFC v2] doc compression API for DPDK
>>>>>
>>>>> Hi Ahmed,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
>>>>>> Sent: 02 February 2018 01:53
>>>>>> To: Trahe, Fiona <fiona.trahe@intel.com>; Verma, Shally <Shally.Verma@cavium.com>;
>> dev@dpdk.org
>>>>>> Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya@cavium.com>; Gupta, Ashish
>>>>> <Ashish.Gupta@cavium.com>; Sahu, Sunila
>>>>>> <Sunila.Sahu@cavium.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Challa,
>>>>> Mahipal
>>>>>> <Mahipal.Challa@cavium.com>; Jain, Deepak K <deepak.k.jain@intel.com>; Hemant Agrawal
>>>>> <hemant.agrawal@nxp.com>; Roy
>>>>>> Pledge <roy.pledge@nxp.com>; Youri Querry <youri.querry_1@nxp.com>
>>>>>> Subject: Re: [RFC v2] doc compression API for DPDK
>>>>>>
>>>>>> On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
>>>>>>> Hi Ahmed, Shally,
>>>>>>>
>>>>>>> ///snip///
>>>>>>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>>> OUT_OF_SPACE is a condition when output buffer runs out of space
>>>>>>>>>> and
>>>>>>>>>>>> where PMD still has more data to produce. If PMD runs into such a
>>>>>>>>>> condition,
>>>>>>>>>>>> then it's an error condition in stateless processing.
>>>>>>>>>>>>> In such a case, PMD resets itself and returns with status
>>>>>>>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0
>>>>>>>>>> i.e.
>>>>>>>>>>>> no input read, no output written.
>>>>>>>>>>>>> Application can resubmit the full input with a larger output buffer size.
>>>>>>>>>>>> [Ahmed] Can we add an option to allow the user to read the data that
>>>>>>>>>> was
>>>>>>>>>>>> produced while still reporting OUT_OF_SPACE? this is mainly useful for
>>>>>>>>>>>> decompression applications doing search.
>>>>>>>>>>> [Shally] It is there but applicable for stateful operation type (please refer to
>>>>>>>>>> handling out_of_space under
>>>>>>>>>>> "Stateful Section").
>>>>>>>>>>> By definition, "stateless" here means that application (such as IPCOMP)
>>>>>>>>>> knows the maximum output size
>>>>>>>>>>> with certainty and ensures that the uncompressed data size cannot grow more
>>>>>>>>>> than the provided output buffer.
>>>>>>>>>>> Such apps can submit an op with type = STATELESS and provide full input,
>>>>>>>>>> then PMD assume it has
>>>>>>>>>>> sufficient input and output and thus doesn't need to maintain any contexts
>>>>>>>>>> after op is processed.
>>>>>>>>>>> If application doesn't know about max output size, then it should process it
>>>>>>>>>> as stateful op i.e. setup op
>>>>>>>>>>> with type = STATEFUL and attach a stream so that PMD can maintain
>>>>>>>>>> relevant context to handle such
>>>>>>>>>>> condition.
>>>>>>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while still
>>>>>>>>>> respecting the stateless concept.
>>>>>>>>>> In Stateless case where a PMD reports OUT_OF_SPACE in decompression
>>>>>>>>>> case
>>>>>>>>>> it could also return consumed=0, produced = x, where x>0. x indicates the
>>>>>>>>>> amount of valid data which has
>>>>>>>>>> been written to the output buffer. It is not complete, but if an application
>>>>>>>>>> wants to search it, that may be sufficient.
>>>>>>>>>> If the application still wants the data it must resubmit the whole input with a
>>>>>>>>>> bigger output buffer, and
>>>>>>>>>>  decompression will be repeated from the start, it
>>>>>>>>>>  cannot expect to continue on as the PMD has not maintained state, history
>>>>>>>>>> or data.
>>>>>>>>>> I don't think there would be any need to indicate this in capabilities, PMDs
>>>>>>>>>> which cannot provide this
>>>>>>>>>> functionality would always return produced=consumed=0, while PMDs which
>>>>>>>>>> can could set produced > 0.
>>>>>>>>>> If this works for you both, we could consider a similar case for compression.
>>>>>>>>>>
>>>>>>>>> [Shally] Sounds fine to me. Though then in that case, consumed should also be updated to the
>> actual
>>>>>>>> amount consumed by the PMD.
>>>>>>>>> Setting consumed = 0 with produced > 0 doesn't correlate.
>>>>>>>> [Ahmed]I like Fiona's suggestion, but I also do not like the implication
>>>>>>>> of returning consumed = 0. At the same time returning consumed = y
>>>>>>>> implies to the user that it can proceed from the middle. I prefer the
>>>>>>>> consumed = 0 implementation, but I think a different return is needed to
>>>>>>>> distinguish it from an OUT_OF_SPACE that the user can recover from. Perhaps
>>>>>>>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
>>>>>>>> future PMD implementations to provide recoverability even in STATELESS
>>>>>>>> mode if they so wish. In this model STATELESS or STATEFUL would be a
>>>>>>>> hint for the PMD implementation to make optimizations for each case, but
>>>>>>>> it does not force the PMD implementation to limit functionality if it
>>>>>>>> can provide recoverability.
>>>>>>> [Fiona] So you're suggesting the following:
>>>>>>> OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
>>>>>>>     can be used and next op in stream should continue on from op.consumed+1.
>>>>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>>>>>     Error condition, no recovery possible.
>>>>>>>     consumed=produced=0. Application must resubmit all input data with
>>>>>>>     a bigger output buffer.
>>>>>>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
>>>>>>>      - consumed = 0, produced > 0. Application must resubmit all input data with
>>>>>>>         a bigger output buffer. However in decompression case, data up to produced
>>>>>>>         in dst buffer may be inspected/searched. Never happens in compression
>>>>>>>         case as output data would be meaningless.
>>>>>>>      - consumed > 0, produced > 0. PMD has stored relevant state and history and so
>>>>>>>         can convert to stateful, using op.produced and continuing from consumed+1.
>>>>>>> I don't expect our PMDs to use this last case, but maybe this works for others?
>>>>>>> I'm not convinced it's not just adding complexity. It sounds like a version of stateful
>>>>>>> without a stream, and maybe less efficient?
>>>>>>> If so should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
>>>>>>> Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
>>>>>>> simply have submitted a STATEFUL request if this is the behaviour it wants?
>>>>>> [Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
>>>>>> and replacing it with
>>>>>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>>>>>        Error condition, no recovery possible.
>>>>>>        - consumed=0 produced=amount of data produced. Application must
>>>>>> resubmit all input data with
>>>>>>          a bigger output buffer to process all of the op
>>>>>> OUT_OF_SPACE_RECOVERABLE -  Normally returned on stateful operation. Not
>>>>>> an error. Op.produced
>>>>>>    can be used and next op in stream should continue on from op.consumed+1.
>>>>>>        -  consumed > 0, produced > 0. PMD has stored relevant state and
>>>>>> history and so
>>>>>>            can continue using op.produced and continuing from consumed+1.
>>>>>>
>>>>>> We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
>>>>>> implementation either.
>>>>>>
>>>>>> Regardless of speculative future PMDs. The more important aspect of this
>>>>>> for today is that the return status clearly determines
>>>>>> the meaning of "consumed". If it is RECOVERABLE then consumed is
>>>>>> meaningful. If it is TERMINATED then consumed is meaningless.
>>>>>> This way we take away the ambiguity of having OUT_OF_SPACE mean two
>>>>>> different user work flows.
>>>>>>
>>>>>> A speculative future PMD may be designed to return RECOVERABLE for
>>>>>> stateless ops that are attached to streams.
>>>>>> A future PMD may look to see if an op has a stream attached and write
>>>>>> out the state there and go into recoverable mode.
>>>>>> in essence this leaves the choice up to the implementation and allows
>>>>>> the PMD to take advantage of stateless optimizations
>>>>>> so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump
>>>>>> context as soon as it fully processes an op. It will only
>>>>>> write context out in cases where the op chokes.
>>>>>> This futuristic PMD should ignore the FLUSH flag, since STATELESS mode was
>>>>>> indicated by the user, and optimize accordingly.
>>>>> [Shally] IMO, it looks okay to have two separate return code TERMINATED and RECOVERABLE with
>>>>> definition as you mentioned and seem doable.
>>>>> So then it mean all following conditions:
>>>>> a. stateless with flush = full/final, no stream pointer provided , PMD can return TERMINATED i.e.
>> user
>>>>> has to start all over again, it's a failure (as in current definition)
>>>>> b. stateless with flush = full/final, stream pointer provided, here it's up to PMD to return either
>>>>> TERMINATED or RECOVERABLE depending upon its ability (note if Recoverable, then PMD will
>> maintain
>>>>> states in stream pointer)
>>>>> c. stateful with flush = full / NO_SYNC, stream pointer always there, PMD will return
>>>>> TERMINATED/RECOVERABLE depending on whether the STATEFUL_COMPRESSION/DECOMPRESSION feature
>> flag
>>>>> is enabled or not
>>>> [Fiona] I don't think the flush flag is relevant - it could be out of space on any flush flag, and if out of
>> space
>>>> should ignore the flush flag.
>>>> Is there a need for TERMINATED? - I didn't think it would ever need to be returned in stateful case.
>>>>  Why the ref to feature flag? If a PMD doesn't support a feature I think it should fail the op - not with
>>>>  out-of-space, but unsupported or similar. Or it would fail on stream creation.
>>> [Ahmed] Agreed with Fiona. The flush flag only matters on success. By
>>> definition the PMD should return OUT_OF_SPACE_RECOVERABLE in stateful
>>> mode when it runs out of space.
>>> @Shally If the user did not provide a stream, then the PMD should
>>> probably return TERMINATED every time. I am not sure we should make a
>>> "really smart" PMD which returns RECOVERABLE even if no stream pointer
>>> was given. In that case the PMD must give some ID back to the caller
>>> that the caller can use to "recover" the op. I am not sure how it would
>>> be implemented in the PMD and when does the PMD decide to retire streams
>>> belonging to dead ops that the caller decided not to "recover".
>>>>> and one more exception case is:
>>>>> d. stateless with flush = full, no stream pointer provided, PMD can return RECOVERABLE i.e. PMD
>>>>> internally maintained that state somehow and consumed & produced > 0, so user can start from
>> consumed+1
>>>>> but there's a restriction on user not to alter or change op until it is fully processed?!
>>>> [Fiona] Why the need for this case?
>>>> There's always a restriction on user not to alter or change op until it is fully processed.
>>>> If a PMD can do this - why doesn't it create a stream when that API is called - and then it's same as b?
>>> [Ahmed] Agreed. The user should not touch an op once enqueued until they
>>> receive it in dequeue. We ignore the flush in stateless mode. We assume
>>> it to be final every time.
>> [Shally] Agreed and am not in favour of supporting such implementation either. Just listed out different
>> possibilities up here to better visualise Ahmed requirements/applicability of TERMINATED and
>> RECOVERABLE.
>>
>>>>> API currently takes care of case a and c, and case b can be supported if specification accept another
>>>>> proposal which mention optional usage of stream with stateless.
>>>> [Fiona] API has this, but as we agreed, it is not optional to call create_stream() with an op_type
>>>> parameter (stateful/stateless). PMD can return NULL or provide a stream; if the latter, then that
>>>> stream must be attached to ops.
>>>>
>>>>  Until then the API makes no distinction between
>>>>> case b and c i.e. we can have op such as,
>>>>> - type= stateful with flush = full/final, stream pointer provided, PMD can return
>>>>> TERMINATED/RECOVERABLE according to its ability
>>>>>
>>>>> Case d , is something exceptional, if there's requirement in PMDs to support it, then believe it will be
>>>>> doable with concept of different return code.
>>>>>
>>>> [Fiona] That's not quite how I understood it. Can it be simpler and only following cases?
>>>> a. stateless with flush = full/final, no stream pointer provided , PMD can return TERMINATED i.e. user
>>>>     has to start all over again, it's a failure (as in current definition).
>>>>     consumed = 0, produced=amount of data produced. This is usually 0, but in decompression
>>>>     case a PMD may return > 0 and application may find it useful to inspect that data.
>>>> b. stateless with flush = full/final, stream pointer provided, here it's up to PMD to return either
>>>>     TERMINATED or RECOVERABLE depending upon its ability (note if Recoverable, then PMD will
>> maintain
>>>>     states in stream pointer)
>>>> c. stateful with flush = any, stream pointer always there, PMD will return RECOVERABLE.
>>>>     op.produced can be used and next op in stream should continue on from op.consumed+1.
>>>>     Consumed=0, produced=0 is an unusual but allowed case. I'm not sure if it could ever happen, but
>>>>     no need to change state to TERMINATED in this case. There may be useful state/history
>>>>     stored in the PMD, even though no output produced yet.
>>> [Ahmed] Agreed
>> [Shally] Sounds good.
>>
>>>>>>>>>>>>> D.2 Compression API Stateful operation
>>>>>>>>>>>>> ----------------------------------------------------------
>>>>>>>>>>>>>  A Stateful operation in DPDK compression means application invokes
>>>>>>>>>>>> enqueue_burst() multiple times to process related chunks of data either
>>>>>>>>>>>> because
>>>>>>>>>>>>> - Application broke data into several ops, and/or
>>>>>>>>>>>>> - PMD ran into out_of_space situation during input processing
>>>>>>>>>>>>>
>>>>>>>>>>>>> In case of either one or all of the above conditions, PMD is required to
>>>>>>>>>>>> maintain state of op across enque_burst() calls and
>>>>>>>>>>>>> ops are setup with op_type RTE_COMP_OP_STATEFUL, and begin with
>>>>>>>>>>>> flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>>>>>>>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> It is always an ideal expectation from the application that it should parse
>>>>>>>>>>>> through all related chunks of source data, making an mbuf-chain, and
>>>>>>>>>> enqueue
>>>>>>>>>>>> it for stateless processing.
>>>>>>>>>>>>> However, if it needs to break it into several enqueue_burst() calls, then
>>>>>>>>>> an
>>>>>>>>>>>> expected call flow would be something like:
>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>> [Ahmed] The work is now in flight to the PMD. The user will call dequeue
>>>>>>>>>>>> burst in a loop until all ops are received. Is this correct?
>>>>>>>>>>>>
>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>> [Shally] Yes. Ideally every submitted op need to be dequeued. However
>>>>>>>>>> this illustration is specifically in
>>>>>>>>>>> context of stateful op processing to reflect if a stream is broken into
>>>>>>>>>> chunks, then each chunk should be
>>>>>>>>>>> submitted as one op at-a-time with type = STATEFUL and need to be
>>>>>>>>>> dequeued first before next chunk is
>>>>>>>>>>> enqueued.
>>>>>>>>>>>
>>>>>>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>>>>>>> [Ahmed] Why not allow multiple work items in flight? I understand that
>>>>>>>>>>>> occasionally there will be an OUT_OF_SPACE exception. Can we just
>>>>>>>>>> distinguish
>>>>>>>>>>>> the response in exception cases?
>>>>>>>>>>> [Shally] Multiples ops are allowed in flight, however condition is each op in
>>>>>>>>>> such case is independent of
>>>>>>>>>>> each other i.e. belong to different streams altogether.
>>>>>>>>>>> Earlier (as part of RFC v1 doc) we did consider the proposal to process all
>>>>>>>>>> related chunks of data in single
>>>>>>>>>>> burst by passing them as ops array but later found that as not-so-useful for
>>>>>>>>>> PMD handling for various
>>>>>>>>>>> reasons. You may please refer to RFC v1 doc review comments for same.
>>>>>>>>>> [Fiona] Agree with Shally. In summary, as only one op can be processed at a
>>>>>>>>>> time, since each needs the
>>>>>>>>>> state of the previous, to allow more than 1 op to be in-flight at a time would
>>>>>>>>>> force PMDs to implement internal queueing and exception handling for
>>>>>>>>>> OUT_OF_SPACE conditions you mention.
>>>>>>>> [Ahmed] But we are putting the ops on qps which would make them
>>>>>>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>>>>>>>> complex but doable.
>>>>>>> [Fiona] In my opinion this is not doable, could be very inefficient.
>>>>>>> There may be many streams.
>>>>>>> The PMD would have to have an internal queue per stream so
>>>>>>> it could adjust the next src offset and length in the OUT_OF_SPACE case.
>>>>>>> And this may ripple back through all subsequent ops in the stream as each
>>>>>>> source len is increased and its dst buffer is not big enough.
>>>>>> [Ahmed] Regarding multi op OUT_OF_SPACE handling.
>>>>>> The caller would still need to adjust
>>>>>> the src length/output buffer as you say. The PMD cannot handle
>>>>>> OUT_OF_SPACE internally.
>>>>>> After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
>>>>>> until it gets explicit
>>>>>> confirmation from the caller to continue working on this stream. Any ops
>>>>>> received by
>>>>>> the PMD should be returned to the caller with status STREAM_PAUSED since
>>>>>> the caller did not
>>>>>> explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
>>>>>> These semantics can be enabled by adding a new function to the API
>>>>>> perhaps stream_resume().
>>>>>> This allows the caller to indicate that it acknowledges that it has seen
>>>>>> the issue and this op
>>>>>> should be used to resolve the issue. Implementations that do not support
>>>>>> this mode of use
>>>>>> can push back immediately after one op is in flight. Implementations
>>>>>> that support this use
>>>>>> mode can allow many ops from the same session.
>>>>>>
>>>>> [Shally] Is it still in context of having single burst where all op belongs to one stream? If yes, I would
>> still
>>>>> say it would add an overhead to PMDs especially if it is expected to work closer to HW (which I think
>> is
>>>>> the case with DPDK PMD).
>>>>> Though your approach is doable but why this all cannot be in a layer above PMD? i.e. a layer above
>> PMD
>>>>> can either pass one-op at a time with burst size = 1 OR can make chained mbuf of input and output
>> and
>>>>> pass than as one op.
>>>>> Is it just to ease applications of chained mbuf burden or do you see any performance /use-case
>>>>> impacting aspect also?
>>>>>
>>>>> if it is in context where each op belongs to a different stream in a burst, then why do we need
>>>>> stream_pause and resume? It is expected of the app to pass a bigger output buffer and continue from consumed
>> + 1
>>>>> from the next call onwards, as it has already
>>>>> seen OUT_OF_SPACE.
>>> [Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
>>> implementation rejects all ops that belong to a stream that has entered
>>> "RECOVERABLE" state for one reason or another. The caller must
>>> acknowledge explicitly that it has received news of the problem before
>>> the PMD allows this stream to exit "RECOVERABLE" state. I agree with you
>>> that implementing this functionality in the software layer above the PMD
>>> is a bad idea since the latency reductions are lost.
>> [Shally] Just reiterating, I rather meant other way around i.e. I see it easier to put all such complexity in a
>> layer above PMD.
>>
>>> This setup is useful in latency sensitive applications where the latency
>>> of buffering multiple ops into one op is significant. We found latency
>>> makes a significant difference in search applications where the PMD
>>> competes with software decompression.
> [Fiona] I see, so when all goes well, you get best-case latency, but when 
> out-of-space occurs latency will probably be worse.
[Ahmed] This is exactly right. This use mode assumes out-of-space is a
rare occurrence. Recovering from it should take similar time to
synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE in
both sync and async use. The caller can fix up the op and send it back
to the PMD to continue work just as would be done in sync. Nonetheless,
the added complexity is not justifiable if out-of-space is very common
since the recoverable state will be the limiting factor that forces
synchronicity.
>>>> [Fiona] I still have concerns with this and would not want to support in our PMD.
>>>> To make sure I understand, you want to send a burst of ops, with several from the same stream.
>>>> If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any
>>>> subsequent ops in that stream.
>>>> Should it return them in a dequeue_burst() with status still NOT_PROCESSED?
>>>> Or somehow drop them? How?
>>>> While still processing ops from other streams.
>>> [Ahmed] This is exactly correct. It should return them with
>>> NOT_PROCESSED. Yes, the PMD should continue processing other streams.
>>>> As we want to offload each op to hardware with as little CPU processing as possible we
>>>> would not want to open up each op to see which stream it's attached to and
>>>> make decisions to do per-stream storage, or drop it, or bypass hw and dequeue without processing.
>>> [Ahmed] I think I might have missed your point here, but I will try to
>>> answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
>>> to the PMD and the PMD should reject until stream_continue() is called.
>>> The next op to be sent by the user will have a special marker in it to
>>> inform the PMD to continue working on this stream. Alternatively the
>>> DPDK layer can be made "smarter" to fail during the enqueue by checking
>>> the stream and its state, but like you say this adds additional CPU
>>> overhead during the enqueue.
>>> I am curious. In a simple synchronous use case, how do we prevent users
>>> from putting multiple ops in flight that belong to a single stream? Do
>>> we just currently say it is undefined behavior? Otherwise we would have
>>> to check the stream and incur the CPU overhead.
> [Fiona] We don't do anything to prevent it. It's undefined. IMO on data path in
> DPDK model we expect good behaviour and don't have to error check for things like this.
[Ahmed] This makes sense. We also assume good behavior.
> In our PMD if we got a burst of 20 ops, we allocate 20 spaces on the hw q, then 
> build and send those messages. If we found an op from a stream which already
> had one inflight, we'd have to hold that back, store in a sw stream-specific holding queue,
> only send 19 to hw. We cannot send multiple ops from same stream to
> the hw as it fans them out and does them in parallel.
> Once the enqueue_burst() returns, there is no processing 
> context which would spot that the first has completed
> and send the next op to the hw. On a dequeue_burst() we would spot this, 
> and in that context could process the next op in the stream.
> On out of space, instead of processing the next op we would have to transfer
> all unprocessed ops from the stream to the dequeue result.
> Some parts of this are doable, but seems likely to add a lot more latency, 
> we'd need to add extra threads and timers to move ops from the sw
> queue to the hw q to get any benefit, and these constructs would add 
> context switching and CPU cycles. So we prefer to push this responsibility
> to above the API and it can achieve similar.
[Ahmed] I see what you mean. Our workflow is almost exactly the same
with our hardware, but the fanning out is done by the hardware based on
the stream and ops that belong to the same stream are never allowed to
go out of order. Otherwise the data would be corrupted. Likewise, the
hardware is responsible for checking the state of the stream and
returning frames as NOT_PROCESSED to the software.
>>>> Maybe we could add a capability if this behaviour is important for you?
>>>> e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
>>>> Our PMD would set this to 0. And expect no more than one op from a stateful stream
>>>> to be in flight at any time.
>>> [Ahmed] That makes sense. This way the different DPDK implementations do
>>> not have to add extra checking for unsupported cases.
>> [Shally] @ahmed, If I summarise your use-case, is this how you want the PMD to support it?
>> - a burst *carries only one stream* and all ops are then assumed to belong to that stream? (please note,
>> here burst is not carrying more than one stream)
[Ahmed] No. In this use case the caller sets up an op and enqueues a
single op. Then before the response comes back from the PMD the caller
enqueues a second op on the same stream.
>> -PMD will submit one op at a time to HW?
[Ahmed] I misunderstood what PMD means. I used it throughout to mean the
HW. I used DPDK to mean the software implementation that talks to the
hardware.
The software will submit all ops immediately. The hardware has to figure
out what to do with the ops depending on what stream they belong to.
>> -if processed successfully, push it back to completion queue with status = SUCCESS. If it failed or ran
>> into OUT_OF_SPACE, then push it to completion queue with status = FAILURE/
>> OUT_OF_SPACE_RECOVERABLE and rest with status = NOT_PROCESSED and return with enqueue count
>> = total # of ops submitted originally with burst?
[Ahmed] This is exactly what I had in mind. all ops will be submitted to
the HW. The HW will put all of them on the completion queue with the
correct status exactly as you say.
>> -app assumes all have been enqueued, so it goes and dequeues all ops
>> -on seeing an op with OUT_OF_SPACE_RECOVERABLE, app resubmits a burst of ops with a call to
>> stream_continue/resume API, starting from the op which encountered OUT_OF_SPACE and others as
>> NOT_PROCESSED, with updated input and output buffer?
[Ahmed] Correct this is what we do today in our proprietary API.
>> -repeat until *all* are dequeued with status = SUCCESS or *any* with status = FAILURE? If at any time a
>> failure is seen, then app starts the whole processing all over again, or just drops this burst?!
[Ahmed] The app has the choice on how to proceed. If the issue is
recoverable then the application can continue this stream from where it
stopped. If the failure is unrecoverable then the application should
first fix the problem and start from the beginning of the stream.
>> If all of above is true, then I think we should add another API such as rte_comp_enque_single_stream()
>> which will be functional under Feature Flag = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS or better
>> name is SUPPORT_ENQUEUE_SINGLE_STREAM?!
[Ahmed] The main advantage in async use is lost if we force all related
ops to be in the same burst. If we do that, then we might as well merge
all the ops into one op. That would reduce the overhead.
The use mode I am proposing is only useful in cases where the data
becomes available after the first enqueue occurred. I want to allow the
caller to enqueue the second set of data as soon as it is available
regardless of whether or not the HW has already started working on the
first op inflight.
> [Fiona] Am curious about Ahmed's response to this. I didn't get that a burst should carry only one stream
> Or get how this makes a difference? As there can be many enqueue_burst() calls done before a dequeue_burst() 
> Maybe you're thinking the enqueue_burst() would be a blocking call that would not return until all the ops
> had been processed? This would turn it into a synchronous call which isn't the intent.
[Ahmed] Agreed, a blocking or even a buffering software layer that babysits
the hardware does not fundamentally change the parameters of the
system as a whole. It just moves workflow management complexity down
into the DPDK software layer. Rather there are real latency and
throughput advantages (because of caching) that I want to expose.

/// snip ///



Thread overview: 30+ messages
2018-01-04 11:45 Verma, Shally
2018-01-09 19:07 ` Ahmed Mansour
2018-01-10 12:55   ` Verma, Shally
2018-01-11 18:53     ` Trahe, Fiona
2018-01-12 13:49       ` Verma, Shally
2018-01-25 18:19         ` Ahmed Mansour
2018-01-29 12:47           ` Verma, Shally
2018-01-31 19:03           ` Trahe, Fiona
2018-02-01  5:40             ` Verma, Shally
2018-02-01 11:54               ` Trahe, Fiona
2018-02-01 20:50                 ` Ahmed Mansour
2018-02-14  5:41                   ` Verma, Shally
2018-02-14 16:54                     ` Ahmed Mansour
2018-02-15  5:53                       ` Verma, Shally
2018-02-15 17:20                         ` Trahe, Fiona
2018-02-15 19:51                           ` Ahmed Mansour
2018-02-16 11:11                             ` Trahe, Fiona
2018-02-01 20:23             ` Ahmed Mansour
2018-02-14  7:41               ` Verma, Shally
2018-02-15 18:47                 ` Trahe, Fiona
2018-02-15 21:09                   ` Ahmed Mansour
2018-02-16  7:16                     ` Verma, Shally
2018-02-16 13:04                       ` Trahe, Fiona
2018-02-16 21:21                         ` Ahmed Mansour [this message]
2018-02-20  9:58                           ` Verma, Shally
2018-02-20 19:56                             ` Ahmed Mansour
2018-02-21 14:35                               ` Trahe, Fiona
2018-02-21 19:35                                 ` Ahmed Mansour
2018-02-22  4:47                                   ` Verma, Shally
2018-02-22 19:35                                     ` Ahmed Mansour
