DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Mattias Rönnblom" <hofors@lysator.liu.se>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"Jerin Jacob" <jerinjacobk@gmail.com>,
	"Peter Nilsson" <peter.j.nilsson@ericsson.com>,
	svante.jarvstrat@ericsson.com,
	"Harry van Haaren" <harry.van.haaren@intel.com>,
	"Abdullah Sevincer" <abdullah.sevincer@intel.com>,
	"Mattias Rönnblom" <mattias.ronnblom@ericsson.com>
Subject: Re: Eventdev dequeue-enqueue event correlation
Date: Wed, 25 Oct 2023 13:29:32 +0100	[thread overview]
Message-ID: <ZTkKLLeJPz47HYtf@bricha3-MOBL.ger.corp.intel.com> (raw)
In-Reply-To: <1b36dd59-d9d0-434c-922f-6e0df8cafe7d@lysator.liu.se>

On Wed, Oct 25, 2023 at 09:40:54AM +0200, Mattias Rönnblom wrote:
> On 2023-10-24 11:10, Bruce Richardson wrote:
> > On Tue, Oct 24, 2023 at 09:10:30AM +0100, Bruce Richardson wrote:
> > > On Mon, Oct 23, 2023 at 06:10:54PM +0200, Mattias Rönnblom wrote:
> > > > Hi.
> > > > 
> > > > Consider an Eventdev app using atomic-type scheduling doing something like:
> > > > 
> > > >      struct rte_event events[3];
> > > > 
> > > >      rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);
> > > > 
> > > >      /* Assume three events were dequeued, and the application decides
> > > >       * it's best off processing events 0 and 2 consecutively */
> > > > 
> > > >      process(&events[0]);
> > > >      process(&events[2]);
> > > > 
> > > >      events[0].queue_id++;
> > > >      events[0].op = RTE_EVENT_OP_FORWARD;
> > > >      events[2].queue_id++;
> > > >      events[2].op = RTE_EVENT_OP_FORWARD;
> > > > 
> > > >      rte_event_enqueue_burst(dev_id, port_id, &events[0], 1);
> > > >      rte_event_enqueue_burst(dev_id, port_id, &events[2], 1);
> > > > 
> > > >      process(&events[1]);
> > > >      events[1].queue_id++;
> > > >      events[1].op = RTE_EVENT_OP_FORWARD;
> > > > 
> > > >      rte_event_enqueue_burst(dev_id, port_id, &events[1], 1);
> > > > 
> > > > Reading just the Eventdev API spec, one might expect this to work
> > > > (especially since impl_opaque is hinted at as being potentially useful for
> > > > the purpose of identifying events).
> > > > 
> > > > However, on certain event devices, it doesn't (and maybe rightly so). If
> > > > events 0 and 2 belong to the same flow (queue id + flow id pair), and event
> > > > 1 belongs to some other flow, then that other flow would be "unlocked" at
> > > > the point of the second enqueue operation (and thus could be processed on
> > > > some other core, in parallel). The first flow would still be needlessly
> > > > "locked".
> > > > 
> > > > Such event devices require the order of the enqueued events to be the same
> > > > as the dequeued events, using RTE_EVENT_OP_RELEASE type events as "fillers"
> > > > for dropped events.
> > > > 
> > > > Am I missing something in the Eventdev API documentation?
> > > > 
> > > 
> > > Much more likely is that the documentation is missing something. We should
> > > explicitly clarify this behaviour, as it's required by a number of drivers.
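
To make that concrete for anyone reading this later, the pattern such drivers
expect is roughly the following (untested sketch, reusing your example above;
app_drops() is just a stand-in for whatever app logic decides to drop an
event):

	struct rte_event events[3];
	uint16_t i, n;

	n = rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);

	/* Process in whatever order suits the app... */
	process(&events[0]);
	process(&events[2]);
	process(&events[1]);

	/*
	 * ...but enqueue strictly in dequeue order, substituting
	 * RTE_EVENT_OP_RELEASE for any event the app drops, so the
	 * device's per-port history stays in sync.
	 */
	for (i = 0; i < n; i++) {
		if (app_drops(&events[i])) {
			events[i].op = RTE_EVENT_OP_RELEASE;
		} else {
			events[i].queue_id++; /* next pipeline stage */
			events[i].op = RTE_EVENT_OP_FORWARD;
		}
	}
	rte_event_enqueue_burst(dev_id, port_id, events, n);

(Retry handling for partial enqueues omitted for brevity.)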
> > > 
> > > > Could an event device use the impl_opaque field to track the identity of an
> > > > event (and thus relax ordering requirements) and still be compliant with
> > > > the API?
> > > > 
> > > 
> > > Possibly, but the documentation also doesn't state that the impl_opaque
> > > field must be preserved between dequeue and enqueue. When forwarding a
> > > packet, it's perfectly possible for an app to extract an mbuf from a dequeued
> > > event and create a new event for sending it back into the eventdev. For
> 
> Such a behavior would be in violation of a part of the Eventdev API contract
> actually specified. The rte_event struct documentation says about
> impl_opaque that "An implementation may use this field to hold
> implementation specific value to share between dequeue and enqueue
> operation. The application should not modify this field. "
> 
> I see no other way to read this than that "an implementation" here is
> referring to an event device PMD. The requirement that the application can't
> modify this field only makes sense in the context of "from dequeue to
> enqueue".
> 

Yep, you are completely correct. For some reason, I had this in my head the
other way round, that it was for internal use between the enqueue and
dequeue. My mistake! :-(

> > > example, if the first stage post-RX is doing classify, it's entirely
> > > possible for every single field in the event header to differ between the
> > > dequeued event and the returned one (flow_id recomputed, event type/source
> > > adjusted, target queue_id and priority updated, op type changed to forward
> > > from new, etc. etc.).
> > > 
> > > > What happens if a RTE_EVENT_OP_NEW event is inserted into the mix of
> > > > OP_FORWARD and OP_RELEASE type events being enqueued? Again I'm not clear on
> > > > what the API says, if anything.
> > > > 
> > > OP_NEW should have no effect on the "history-list" of events previously
> > > dequeued. Again, our docs should clarify that explicitly. Thanks for
> > > calling all this out.
> > > 
> > Looking at the docs we have, I would propose adding a new subsection "Event
> > Operations", as section 49.1.6 to [1]. There we could explain "New",
> > "Forward" and "Release" events - what they mean for the different queue
> > types and how to use them. That section could also cover the enqueue
> > ordering rules, as the use of event "history" is necessary to explain
> > releases and forwards.
> > 
> > Does this seem reasonable? If nobody else has already started updating the docs
> > for this, I'm happy enough to give it a stab.
> > 
> 
> Batch dequeues not only provide an opportunity to amortize per-interaction
> overhead with the event device, they also allow the application to reshuffle
> the order in which it decides to process the events.
> 
> Such reshuffling may have a very significant impact on performance. At a
> minimum, cache locality improves, and in case the app is able to do "vector
> processing" (e.g., something akin to what fd.io VPP does), the gains may be
> further increased.
> 
> One may argue the app/core should just "do what it's told" by the event
> device. After all, an event device is a work scheduler, and reshuffling
> items of work certainly counts as (micro-)scheduling work.
> 
> However, it is too much to hope for that a fairly generic function,
> especially one that comes in the form of hardware, with a design frozen
> years ago, will be able to arrange the work in whatever order is currently
> optimal for one particular application.
> 
> What such an app can do (or must do, if it has efficiency constraints) is to
> buffer the events on the output side, rearranging them in accordance with the
> yet-seemingly-undocumented Eventdev API contract. That's certainly possible,
> and not very difficult, but it seems to me that this really is the job of
> something in the platform (e.g., in Eventdev or the event device PMD).
> 
> One way out of this could be to add an "implicit release-*only*" mode of
> operation for eventdev.
> 
> In such a mode, the RTE_SCHED_TYPE_ATOMIC per-flow "lock" (and its ORDERED
> equivalent, if there is one) would be held until the next dequeue. The only
> difference between OP_FORWARD and OP_NEW events would then be the
> back-pressure watermark (new_event_threshold).
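
If I follow, under such a mode (hypothetical, it doesn't exist today) the
app-side code would reduce to something like the below, with BURST and
app_drops() being app-defined:

	n = rte_event_dequeue_burst(dev_id, port_id, events, BURST, 0);
	for (i = 0; i < n; i++) {
		if (app_drops(&events[i]))
			continue; /* no RELEASE filler needed: the "lock" is
				   * dropped implicitly at the next dequeue */
		process(&events[i]);
		events[i].queue_id++;
		events[i].op = RTE_EVENT_OP_FORWARD;
		rte_event_enqueue_burst(dev_id, port_id, &events[i], 1);
	}

i.e. no ordering constraints and no RELEASE bookkeeping at all, at the cost of
holding the flow "locks" until the following dequeue call.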
> 
> That pre-rte_event_enqueue_burst() buffering would prevent the event device
> from releasing "locks" that could otherwise be released, but the typical
> cost of event device interaction is so high that I have my doubts about how
> useful that feature is. If you are worried about "locks" held for a long
> time, one may need to use short bursts anyway (since worst-case critical
> section length is not reduced by such RELEASEs).
> 
> Another option would be to have the current RTE_EVENT_DEV_CAP_BURST_MODE
> capable PMDs start using the "impl_opaque" field for the purpose of matching
> in and out events. It would require applications to actually start adhering
> to the "don't touch impl_opaque" requirement of the Eventdev API.
> 
> Those "fixes" are not mutually exclusive.
> 
> A side note: it's unfortunate there are no bits in the rte_event struct that
> can be used for "event id"/"event SN"/"event dequeue idx" type information,
> if an app would like to work around this issue with current PMDs.
> 
Lots of good points here. We'll take a look and see what we can do in our
drivers, and consider any other ideas or suggestions.

/Bruce


Thread overview: 6+ messages
2023-10-23 16:10 Mattias Rönnblom
2023-10-24  8:10 ` Bruce Richardson
2023-10-24  9:10   ` Bruce Richardson
2023-10-25  7:40     ` Mattias Rönnblom
2023-10-25 12:29       ` Bruce Richardson [this message]
2024-01-16 14:58       ` Bruce Richardson
