From: Ahmed Mansour <ahmed.mansour@nxp.com>
To: "Trahe, Fiona" <fiona.trahe@intel.com>,
"Verma, Shally" <Shally.Verma@cavium.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "Athreya, Narayana Prasad" <NarayanaPrasad.Athreya@cavium.com>,
"Gupta, Ashish" <Ashish.Gupta@cavium.com>,
"Sahu, Sunila" <Sunila.Sahu@cavium.com>,
"De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>,
"Challa, Mahipal" <Mahipal.Challa@cavium.com>,
"Jain, Deepak K" <deepak.k.jain@intel.com>,
Hemant Agrawal <hemant.agrawal@nxp.com>,
Roy Pledge <roy.pledge@nxp.com>,
Youri Querry <youri.querry_1@nxp.com>
Subject: Re: [dpdk-dev] [RFC v2] doc compression API for DPDK
Date: Thu, 1 Feb 2018 20:23:25 +0000 [thread overview]
Message-ID: <DB3PR0402MB3852855639B5464D3AE1A441E1FA0@DB3PR0402MB3852.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <348A99DA5F5B7549AA880327E580B43589315232@IRSMSX101.ger.corp.intel.com>
On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
> Hi Ahmed, Shally,
>
> ///snip///
>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>> ------------------------------------------------
>>>>>>> OUT_OF_SPACE is a condition when output buffer runs out of space
>>>> and
>>>>>> where PMD still has more data to produce. If PMD run into such
>>>> condition,
>>>>>> then it's an error condition in stateless processing.
>>>>>>> In such case, PMD resets itself and return with status
>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0
>>>> i.e.
>>>>>> no input read, no output written.
>>>>>>> Application can resubmit an full input with larger output buffer size.
>>>>>> [Ahmed] Can we add an option to allow the user to read the data that
>>>> was
>>>>>> produced while still reporting OUT_OF_SPACE? this is mainly useful for
>>>>>> decompression applications doing search.
>>>>> [Shally] It is there but applicable for stateful operation type (please refer to
>>>> handling out_of_space under
>>>>> "Stateful Section").
>>>>> By definition, "stateless" here means that application (such as IPCOMP)
>>>> knows maximum output size
>>>>> guaranteedly and ensure that uncompressed data size cannot grow more
>>>> than provided output buffer.
>>>>> Such apps can submit an op with type = STATELESS and provide full input,
>>>> then PMD assume it has
>>>>> sufficient input and output and thus doesn't need to maintain any contexts
>>>> after op is processed.
>>>>> If application doesn't know about max output size, then it should process it
>>>> as stateful op i.e. setup op
>>>>> with type = STATEFUL and attach a stream so that PMD can maintain
>>>> relevant context to handle such
>>>>> condition.
>>>> [Fiona] There may be an alternative that's useful for Ahmed, while still
>>>> respecting the stateless concept.
>>>> In Stateless case where a PMD reports OUT_OF_SPACE in decompression
>>>> case
>>>> it could also return consumed=0, produced = x, where x>0. X indicates the
>>>> amount of valid data which has
>>>> been written to the output buffer. It is not complete, but if an application
>>>> wants to search it it may be sufficient.
>>>> If the application still wants the data it must resubmit the whole input with a
>>>> bigger output buffer, and
>>>> decompression will be repeated from the start, it
>>>> cannot expect to continue on as the PMD has not maintained state, history
>>>> or data.
>>>> I don't think there would be any need to indicate this in capabilities, PMDs
>>>> which cannot provide this
>>>> functionality would always return produced=consumed=0, while PMDs which
>>>> can could set produced > 0.
>>>> If this works for you both, we could consider a similar case for compression.
>>>>
>>> [Shally] Sounds Fine to me. Though then in that case, consume should also be updated to actual
>> consumed by PMD.
>>> Setting consumed = 0 with produced > 0 doesn't correlate.
>> [Ahmed]I like Fiona's suggestion, but I also do not like the implication
>> of returning consumed = 0. At the same time returning consumed = y
>> implies to the user that it can proceed from the middle. I prefer the
>> consumed = 0 implementation, but I think a different return is needed to
>> distinguish it from OUT_OF_SPACE that the use can recover from. Perhaps
>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
>> future PMD implementations to provide recover-ability even in STATELESS
>> mode if they so wish. In this model STATELESS or STATEFUL would be a
>> hint for the PMD implementation to make optimizations for each case, but
>> it does not force the PMD implementation to limit functionality if it
>> can provide recover-ability.
> [Fiona] So you're suggesting the following:
> OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
> can be used and next op in stream should continue on from op.consumed+1.
> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
> Error condition, no recovery possible.
> consumed=produced=0. Application must resubmit all input data with
> a bigger output buffer.
> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
> - consumed = 0, produced > 0. Application must resubmit all input data with
> a bigger output buffer. However in decompression case, data up to produced
> in dst buffer may be inspected/searched. Never happens in compression
> case as output data would be meaningless.
> - consumed > 0, produced > 0. PMD has stored relevant state and history and so
> can convert to stateful, using op.produced and continuing from consumed+1.
> I don't expect our PMDs to use this last case, but maybe this works for others?
> I'm not convinced it's not just adding complexity. It sounds like a version of stateful
> without a stream, and maybe less efficient?
> If so should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
> Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
> simply have submitted a STATEFUL request if this is the behaviour it wants?
[Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
and replacing it with
OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
Error condition, no recovery possible.
- consumed=0 produced=amount of data produced. Application must
resubmit all input data with
a bigger output buffer to process all of the op
OUT_OF_SPACE_RECOVERABLE - Normally returned on stateful operation. Not
an error. Op.produced
can be used and next op in stream should continue on from op.consumed+1.
- consumed > 0, produced > 0. PMD has stored relevant state and
history and so
can continue using op.produced and continuing from consumed+1.
We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
implementation either.
Regardless of speculative future PMDs. The more important aspect of this
for today is that the return status clearly determines
the meaning of "consumed". If it is RECOVERABLE then consumed is
meaningful. if it is TERMINATED then consumed in meaningless.
This way we take away the ambiguity of having OUT_OF_SPACE mean two
different user work flows.
A speculative future PMD may be designed to return RECOVERABLE for
stateless ops that are attached to streams.
A future PMD may look to see if an op has a stream is attached and write
out the state there and go into recoverable mode.
in essence this leaves the choice up to the implementation and allows
the PMD to take advantage of stateless optimizations
so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump
context as soon as it fully processes an op. It will only
write context out in cases where the op chokes.
This futuristic PMD should ignore the FLUSH since this STATELESS mode as
indicated by the user and optimize
>>>>>>> D.2 Compression API Stateful operation
>>>>>>> ----------------------------------------------------------
>>>>>>> A Stateful operation in DPDK compression means application invokes
>>>>>> enqueue burst() multiple times to process related chunk of data either
>>>>>> because
>>>>>>> - Application broke data into several ops, and/or
>>>>>>> - PMD ran into out_of_space situation during input processing
>>>>>>>
>>>>>>> In case of either one or all of the above conditions, PMD is required to
>>>>>> maintain state of op across enque_burst() calls and
>>>>>>> ops are setup with op_type RTE_COMP_OP_STATEFUL, and begin with
>>>>>> flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>> ---------------------------------------------------------------
>>>>>>> It is always an ideal expectation from application that it should parse
>>>>>> through all related chunk of source data making its mbuf-chain and
>>>> enqueue
>>>>>> it for stateless processing.
>>>>>>> However, if it need to break it into several enqueue_burst() calls, then
>>>> an
>>>>>> expected call flow would be something like:
>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>> [Ahmed] The work is now in flight to the PMD.The user will call dequeue
>>>>>> burst in a loop until all ops are received. Is this correct?
>>>>>>
>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>> [Shally] Yes. Ideally every submitted op need to be dequeued. However
>>>> this illustration is specifically in
>>>>> context of stateful op processing to reflect if a stream is broken into
>>>> chunks, then each chunk should be
>>>>> submitted as one op at-a-time with type = STATEFUL and need to be
>>>> dequeued first before next chunk is
>>>>> enqueued.
>>>>>
>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>> [Ahmed] Why now allow multiple work items in flight? I understand that
>>>>>> occasionaly there will be OUT_OF_SPACE exception. Can we just
>>>> distinguish
>>>>>> the response in exception cases?
>>>>> [Shally] Multiples ops are allowed in flight, however condition is each op in
>>>> such case is independent of
>>>>> each other i.e. belong to different streams altogether.
>>>>> Earlier (as part of RFC v1 doc) we did consider the proposal to process all
>>>> related chunks of data in single
>>>>> burst by passing them as ops array but later found that as not-so-useful for
>>>> PMD handling for various
>>>>> reasons. You may please refer to RFC v1 doc review comments for same.
>>>> [Fiona] Agree with Shally. In summary, as only one op can be processed at a
>>>> time, since each needs the
>>>> state of the previous, to allow more than 1 op to be in-flight at a time would
>>>> force PMDs to implement internal queueing and exception handling for
>>>> OUT_OF_SPACE conditions you mention.
>> [Ahmed] But we are putting the ops on qps which would make them
>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>> complex but doable.
> [Fiona] In my opinion this is not doable, could be very inefficient.
> There may be many streams.
> The PMD would have to have an internal queue per stream so
> it could adjust the next src offset and length in the OUT_OF_SPACE case.
> And this may ripple back though all subsequent ops in the stream as each
> source len is increased and its dst buffer is not big enough.
[Ahmed] Regarding multi op OUT_OF_SPACE handling.
The caller would still need to adjust
the src length/output buffer as you say. The PMD cannot handle
OUT_OF_SPACE internally.
After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
until it gets explicit
confirmation from the caller to continue working on this stream. Any ops
received by
the PMD should be returned to the caller with status STREAM_PAUSED since
the caller did not
explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
These semantics can be enabled by adding a new function to the API
perhaps stream_resume().
This allows the caller to indicate that it acknowledges that it has seen
the issue and this op
should be used to resolve the issue. Implementations that do not support
this mode of use
can push back immediately after one op is in flight. Implementations
that support this use
mode can allow many ops from the same session
Regarding the ordering of ops
We do force serialization of ops belonging to a stream in STATEFUL
operation. Related ops do
not go out of order and are given to available PMDs one at a time.
>> The question is this mode of use useful for real
>> life applications or would we be just adding complexity? The technical
>> advantage of this is that processing of Stateful ops is interdependent
>> and PMDs can take advantage of caching and other optimizations to make
>> processing related ops much faster than switching on every op. PMDs have
>> maintain state of more than 32 KB for DEFLATE for every stream.
>>>> If the application has all the data, it can put it into chained mbufs in a single
>>>> op rather than
>>>> multiple ops, which avoids pushing all that complexity down to the PMDs.
>> [Ahmed] I think that your suggested scheme of putting all related mbufs
>> into one op may be the best solution without the extra complexity of
>> handling OUT_OF_SPACE cases, while still allowing the enqueuer extra
>> time If we have a way of marking mbufs as ready for consumption. The
>> enqueuer may not have all the data at hand but can enqueue the op with a
>> couple of empty mbus marked as not ready for consumption. The enqueuer
>> will then update the rest of the mbufs to ready for consumption once the
>> data is added. This introduces a race condition. A second flag for each
>> mbuf can be updated by the PMD to indicate that it processed it or not.
>> This way in cases where the PMD beat the application to the op, the
>> application will just update the op to point to the first unprocessed
>> mbuf and resend it to the PMD.
> [Fiona] This doesn't sound safe. You want to add data to a stream after you've
> enqueued the op. You would have to write to op.src.length at a time when the PMD
> might be reading it. Sounds like a lock would be necessary.
> Once the op has been enqueued, my understanding is its ownership is handed
> over to the PMD and the application should not touch it until it has been dequeued.
> I don't think it's a good idea to change this model.
> Can't the application just collect a stream of data in chained mbufs until it has
> enough to send an op, then construct the op and while waiting for that op to
> complete, accumulate the next batch of chained mbufs? Only construct the next op
> after the previous one is complete, based on the result of the previous one.
>
[Ahmed] Fair enough. I agree with you. I imagined it in a different way
in which each mbuf would have its own length.
The advantage to gain is in applications where there is one PMD user,
the down time between ops can be significant and setting up a single
producer consumer pair significantly reduces the CPU cycles and PMD down
time.
////snip////
next prev parent reply other threads:[~2018-02-01 20:23 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-04 11:45 Verma, Shally
2018-01-09 19:07 ` Ahmed Mansour
2018-01-10 12:55 ` Verma, Shally
2018-01-11 18:53 ` Trahe, Fiona
2018-01-12 13:49 ` Verma, Shally
2018-01-25 18:19 ` Ahmed Mansour
2018-01-29 12:47 ` Verma, Shally
2018-01-31 19:03 ` Trahe, Fiona
2018-02-01 5:40 ` Verma, Shally
2018-02-01 11:54 ` Trahe, Fiona
2018-02-01 20:50 ` Ahmed Mansour
2018-02-14 5:41 ` Verma, Shally
2018-02-14 16:54 ` Ahmed Mansour
2018-02-15 5:53 ` Verma, Shally
2018-02-15 17:20 ` Trahe, Fiona
2018-02-15 19:51 ` Ahmed Mansour
2018-02-16 11:11 ` Trahe, Fiona
2018-02-01 20:23 ` Ahmed Mansour [this message]
2018-02-14 7:41 ` Verma, Shally
2018-02-15 18:47 ` Trahe, Fiona
2018-02-15 21:09 ` Ahmed Mansour
2018-02-16 7:16 ` Verma, Shally
2018-02-16 13:04 ` Trahe, Fiona
2018-02-16 21:21 ` Ahmed Mansour
2018-02-20 9:58 ` Verma, Shally
2018-02-20 19:56 ` Ahmed Mansour
2018-02-21 14:35 ` Trahe, Fiona
2018-02-21 19:35 ` Ahmed Mansour
2018-02-22 4:47 ` Verma, Shally
2018-02-22 19:35 ` Ahmed Mansour
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DB3PR0402MB3852855639B5464D3AE1A441E1FA0@DB3PR0402MB3852.eurprd04.prod.outlook.com \
--to=ahmed.mansour@nxp.com \
--cc=Ashish.Gupta@cavium.com \
--cc=Mahipal.Challa@cavium.com \
--cc=NarayanaPrasad.Athreya@cavium.com \
--cc=Shally.Verma@cavium.com \
--cc=Sunila.Sahu@cavium.com \
--cc=deepak.k.jain@intel.com \
--cc=dev@dpdk.org \
--cc=fiona.trahe@intel.com \
--cc=hemant.agrawal@nxp.com \
--cc=pablo.de.lara.guarch@intel.com \
--cc=roy.pledge@nxp.com \
--cc=youri.querry_1@nxp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).