From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 52AF0439A9;
	Tue, 23 Jan 2024 17:19:23 +0100 (CET)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 2E8E5402C5;
	Tue, 23 Jan 2024 17:19:23 +0100 (CET)
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
 by mails.dpdk.org (Postfix) with ESMTP id CE1D5402B0
 for <dev@dpdk.org>; Tue, 23 Jan 2024 17:19:21 +0100 (CET)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
 by mail.lysator.liu.se (Postfix) with ESMTP id 7A6FC1A737
 for <dev@dpdk.org>; Tue, 23 Jan 2024 17:19:21 +0100 (CET)
Received: by mail.lysator.liu.se (Postfix, from userid 1004)
 id 6EEF21A736; Tue, 23 Jan 2024 17:19:21 +0100 (CET)
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on
 hermod.lysator.liu.se
X-Spam-Level: 
X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED,AWL,
 T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0
X-Spam-Score: -1.4
Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se
 [62.63.215.114])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by mail.lysator.liu.se (Postfix) with ESMTPSA id 33A4F1A6BB;
 Tue, 23 Jan 2024 17:19:19 +0100 (CET)
Message-ID: <cabd7136-fdf4-4cc9-9672-2f025f609d1d@lysator.liu.se>
Date: Tue, 23 Jan 2024 17:19:18 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v2 10/11] eventdev: RFC clarify comments on scheduling
 types
To: Bruce Richardson <bruce.richardson@intel.com>, dev@dpdk.org
Cc: jerinj@marvell.com, mattias.ronnblom@ericsson.com,
 abdullah.sevincer@intel.com, sachin.saxena@oss.nxp.com,
 hemant.agrawal@nxp.com, pbhagavatula@marvell.com, pravin.pathak@intel.com
References: <20240118134557.73172-1-bruce.richardson@intel.com>
 <20240119174346.108905-1-bruce.richardson@intel.com>
 <20240119174346.108905-11-bruce.richardson@intel.com>
Content-Language: en-US
From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= <hofors@lysator.liu.se>
In-Reply-To: <20240119174346.108905-11-bruce.richardson@intel.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV using ClamSMTP
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On 2024-01-19 18:43, Bruce Richardson wrote:
> The description of ordered and atomic scheduling given in the eventdev
> doxygen documentation was not always clear. Try and simplify this so
> that it is clearer for the end-user of the application
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
> 
> NOTE TO REVIEWERS:
> I've updated this based on my understanding of what these scheduling
> types are meant to do. It matches my understanding of the support
> offered by our Intel DLB2 driver, as well as the SW eventdev, and I
> believe the DSW eventdev too. If it does not match the behaviour of
> other eventdevs, let's have a discussion to see if we can reach a good
> definition of the behaviour that is common.
> ---
>   lib/eventdev/rte_eventdev.h | 47 ++++++++++++++++++++-----------------
>   1 file changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> index 2c6576e921..cb13602ffb 100644
> --- a/lib/eventdev/rte_eventdev.h
> +++ b/lib/eventdev/rte_eventdev.h
> @@ -1313,26 +1313,24 @@ struct rte_event_vector {
>   #define RTE_SCHED_TYPE_ORDERED          0
>   /**< Ordered scheduling
>    *
> - * Events from an ordered flow of an event queue can be scheduled to multiple
> + * Events from an ordered event queue can be scheduled to multiple

What is the rationale for this change?

An implementation that impose a total order on all events on a 
particular ordered queue will still adhere to the current, more relaxed, 
per-flow ordering semantics.

An application wanting a total order would just set the flow id to 0 on 
all events destined that queue, and it would work on all event devices.

Why don't you just put a note in the DLB driver saying "btw it's total 
order", so any application where per-flow ordering is crucial for 
performance (i.e., where the potentially needless head-of-line blocking 
is an issue) can use multiple queues when running with the DLB.

In the API as-written, the app is free to express more relaxed ordering 
requirements (i.e., to have multiple flows) and it's up to the event 
device to figure out if it's in a position where it can translate this 
to lower latency.

>    * ports for concurrent processing while maintaining the original event order.

Maybe it's worth mentioning what is the original event order. "(i.e., 
the order in which the events were enqueued to the queue)". Especially 
since one like to specify what ordering guarantees one have of events 
enqueued to the same queue on different ports and by different lcores).

I don't know where that information should go though, since it's 
relevant for both atomic and ordered-type queues.

>    * This scheme enables the user to achieve high single flow throughput by
> - * avoiding SW synchronization for ordering between ports which bound to cores.
> - *
> - * The source flow ordering from an event queue is maintained when events are
> - * enqueued to their destination queue within the same ordered flow context.
> - * An event port holds the context until application call
> - * rte_event_dequeue_burst() from the same port, which implicitly releases
> - * the context.
> - * User may allow the scheduler to release the context earlier than that
> - * by invoking rte_event_enqueue_burst() with RTE_EVENT_OP_RELEASE operation.
> - *
> - * Events from the source queue appear in their original order when dequeued
> - * from a destination queue.
> - * Event ordering is based on the received event(s), but also other
> - * (newly allocated or stored) events are ordered when enqueued within the same
> - * ordered context. Events not enqueued (e.g. released or stored) within the
> - * context are  considered missing from reordering and are skipped at this time
> - * (but can be ordered again within another context).
> + * avoiding SW synchronization for ordering between ports which are polled by
> + * different cores.
> + *
> + * As events are scheduled to ports/cores, the original event order from the
> + * source event queue is recorded internally in the scheduler. As events are
> + * returned (via FORWARD type enqueue) to the scheduler, the original event
> + * order is restored before the events are enqueued into their new destination
> + * queue.

Delete the first sentence on implementation.

"As events are re-enqueued to the next queue (with the op field set to 
RTE_EVENT_OP_FORWARD), the event device restores the original event 
order before the events arrive on the destination queue."

> + *
> + * Any events not forwarded, ie. dropped explicitly via RELEASE or implicitly
> + * released by the next dequeue from a port, are skipped by the reordering
> + * stage and do not affect the reordering of returned events.
> + *
> + * The ordering behaviour of NEW events with respect to FORWARD events is
> + * undefined and implementation dependent.

For some reason I find this a little vague. "NEW and FORWARD events 
enqueued to a queue are not ordered in relation to each other (even if 
the flow id is the same)."

I think I agree that NEW shouldn't be ordered vis-a-vi FORWARD, but 
maybe one should say that an event device should avoid excessive 
reordering NEW and FORWARD events.

I think it would also be helpful to address port-to-port ordering 
guarantees (or a lack thereof).

"Events enqueued on one port are not ordered in relation to events 
enqueued on some other port."

Or are they? Not in DSW, at least, and I'm not sure I see a use case for 
such a guarantee, but it's a little counter-intuitive to have them 
potentially re-shuffled.

(This is also relevant for atomic queues.)

>    *
>    * @see rte_event_queue_setup(), rte_event_dequeue_burst(), RTE_EVENT_OP_RELEASE
>    */
> @@ -1340,18 +1338,23 @@ struct rte_event_vector {
>   #define RTE_SCHED_TYPE_ATOMIC           1
>   /**< Atomic scheduling
>    *
> - * Events from an atomic flow of an event queue can be scheduled only to a
> + * Events from an atomic flow, identified by @ref rte_event.flow_id,

A flow is identified by the combination of queue_id and flow_id, so if 
you reference one you should also reference the other.

> + * of an event queue can be scheduled only to a
>    * single port at a time. The port is guaranteed to have exclusive (atomic)
>    * access to the associated flow context, which enables the user to avoid SW
>    * synchronization. Atomic flows also help to maintain event ordering

"help" here needs to go, I think. It sounds like a best-effort affair. 
The atomic queue ordering guarantees (or the lack thereof) should be 
spelled out.

"Event order in an atomic flow is maintained."

> - * since only one port at a time can process events from a flow of an
> + * since only one port at a time can process events from each flow of an
>    * event queue.

Yes, and *but also since* the event device is not reshuffling events 
enqueued to an atomic queue. And that's more complicated than just 
something that falls out of atomicity, especially if you assume that 
FORWARD type enqueues are not ordered with other FORWARD type enqueues 
on a different port.

>    *
> - * The atomic queue synchronization context is dedicated to the port until
> + * The atomic queue synchronization context for a flow is dedicated to the port until

What is an "atomic queue synchronization context" (except for something 
that makes for long sentences).

How about:
"The atomic flow is locked to the port until /../"

You could also used the word "bound" instead of "locked".

>    * application call rte_event_dequeue_burst() from the same port,
>    * which implicitly releases the context. User may allow the scheduler to
>    * release the context earlier than that by invoking rte_event_enqueue_burst()
> - * with RTE_EVENT_OP_RELEASE operation.
> + * with RTE_EVENT_OP_RELEASE operation for each event from that flow. The context
> + * is only released once the last event from the flow, outstanding on the port,
> + * is released. So long as there is one event from an atomic flow scheduled to
> + * a port/core (including any events in the port's dequeue queue, not yet read
> + * by the application), that port will hold the synchronization context.

In case you like the "atomic flow locked/bound to port", this part would 
also need updating.

Maybe here is a good place to add a note on memory ordering and event 
ordering.

"Any memory stores done as a part of event processing will be globally 
visible before the next event in the same atomic flow is dequeued on a 
different lcore."

I.e., enqueue includes write barrier before the event can be seen.

One should probably mentioned a rmb in dequeue as well.

>    *
>    * @see rte_event_queue_setup(), rte_event_dequeue_burst(), RTE_EVENT_OP_RELEASE
>    */