Subject: Re: Eventdev dequeue-enqueue event correlation
Date: Wed, 25 Oct 2023 09:40:54 +0200
From: Mattias Rönnblom
To: Bruce Richardson
Cc: dev@dpdk.org, Jerin Jacob, Peter Nilsson, svante.jarvstrat@ericsson.com,
 Harry van Haaren, Abdullah Sevincer, Mattias Rönnblom
Message-ID: <1b36dd59-d9d0-434c-922f-6e0df8cafe7d@lysator.liu.se>

On 2023-10-24 11:10, Bruce Richardson wrote:
> On Tue, Oct 24, 2023 at 09:10:30AM +0100, Bruce Richardson wrote:
>> On Mon, Oct 23, 2023 at 06:10:54PM +0200, Mattias Rönnblom wrote:
>>> Hi.
>>>
>>> Consider an Eventdev app using atomic-type scheduling doing something like:
>>>
>>> struct rte_event events[3];
>>>
>>> rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);
>>>
>>> /* Assume three events were dequeued, and the application decides
>>>  * it's best off processing event 0 and 2 consecutively */
>>>
>>> process(&events[0]);
>>> process(&events[2]);
>>>
>>> events[0].queue_id++;
>>> events[0].op = RTE_EVENT_OP_FORWARD;
>>> events[2].queue_id++;
>>> events[2].op = RTE_EVENT_OP_FORWARD;
>>>
>>> rte_event_enqueue_burst(dev_id, port_id, &events[0], 1);
>>> rte_event_enqueue_burst(dev_id, port_id, &events[2], 1);
>>>
>>> process(&events[1]);
>>> events[1].queue_id++;
>>> events[1].op = RTE_EVENT_OP_FORWARD;
>>>
>>> rte_event_enqueue_burst(dev_id, port_id, &events[1], 1);
>>>
>>> If one were to just read the Eventdev API spec, one might expect this
>>> to work (especially since impl_opaque is hinted at as potentially
>>> being useful for the purpose of identifying events).
>>>
>>> However, on certain event devices, it doesn't (and maybe rightly so).
>>> If events 0 and 2 belong to the same flow (queue id + flow id pair),
>>> and event 1 belongs to some other flow, then this other flow would be
>>> "unlocked" at the point of the second enqueue operation (and thus be
>>> processed on some other core, in parallel). The first flow would
>>> still be needlessly "locked".
>>>
>>> Such event devices require the order of the enqueued events to be the
>>> same as that of the dequeued events, using RTE_EVENT_OP_RELEASE type
>>> events as "fillers" for dropped events.
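
(For concreteness, a minimal sketch of what such a device accepts,
reusing the example above; process() remains a placeholder, and, as in
the original example, return values are not checked. The processing
order may still be changed, but the enqueue order may not:)

struct rte_event events[3];
uint16_t i, n;

n = rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);

/* The application is still free to pick the processing order... */
process(&events[0]);
process(&events[2]);
process(&events[1]);

for (i = 0; i < n; i++) {
        events[i].queue_id++;
        events[i].op = RTE_EVENT_OP_FORWARD;
}

/* ...but the events must go back in dequeue order. Had the app
 * decided to drop one of the events, that slot would instead have
 * carried an RTE_EVENT_OP_RELEASE event. */
rte_event_enqueue_burst(dev_id, port_id, events, n);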
>>>
>>> Am I missing something in the Eventdev API documentation?
>>>
>>
>> Much more likely is that the documentation is missing something. We
>> should explicitly clarify this behaviour, as it's required by a number
>> of drivers.
>>
>>> Could an event device use the impl_opaque field to track the identity
>>> of an event (and thus relax ordering requirements) and still be
>>> compliant toward the API?
>>>
>>
>> Possibly, but the documentation also doesn't state that the
>> impl_opaque field must be preserved between dequeue and enqueue. When
>> forwarding a packet it's well possible for an app to extract an mbuf
>> from a dequeued event and create a new event for sending it back in to
>> the eventdev. For

Such a behavior would be in violation of a part of the Eventdev API
contract that actually is specified. The rte_event struct documentation
says about impl_opaque that

"An implementation may use this field to hold implementation specific
value to share between dequeue and enqueue operation. The application
should not modify this field."

I see no other way to read this than that "an implementation" here
refers to an event device PMD. The requirement that the application must
not modify this field only makes sense in the context of "from dequeue
to enqueue".

>> example, if the first stage post-RX is doing classify, it's entirely
>> possible for every single field in the event header to be different
>> for the event returned compared to dequeue (flow_id recomputed, event
>> type/source adjusted, target queue_id and priority updated, op type
>> changed to forward from new, etc. etc.).
>>
>>> What happens if a RTE_EVENT_OP_NEW event is inserted into the mix of
>>> OP_FORWARD and OP_RELEASE type events being enqueued? Again I'm not
>>> clear on what the API says, if anything.
>>>
>> OP_NEW should have no effect on the "history-list" of events
>> previously dequeued. Again, our docs should clarify that explicitly.
>> Thanks for calling all this out.
>>
> Looking at the docs we have, I would propose adding a new subsection
> "Event Operations", as section 49.1.6 to [1]. There we could explain
> "New", "Forward" and "Release" events - what they mean for the
> different queue types and how to use them. That section could also
> cover the enqueue ordering rules, as the use of event "history" is
> necessary to explain releases and forwards.
>
> Does this seem reasonable? If nobody else has already started on
> updating docs for this, I'm happy enough to give it a stab.
>

Batch dequeues not only provide an opportunity to amortize
per-interaction overhead with the event device, they also allow the
application to reshuffle the order in which it decides to process the
events. Such reshuffling may have a very significant impact on
performance. At a minimum, cache locality improves, and in case the app
is able to do "vector processing" (e.g., something akin to what fd.io
VPP does), the gains may be further increased.

One may argue the app/core should just "do what it's told" by the event
device. After all, an event device is a work scheduler, and reshuffling
items of work certainly counts as (micro-)scheduling work. However, it
is too much to expect a fairly generic function, especially one that
comes in the form of hardware with a design frozen years ago, to be able
to arrange the work in whatever order is currently optimal for one
particular application.

What such an app can do (or must do, if it has efficiency constraints)
is to buffer the events on the output side, rearranging them in
accordance with the yet-seemingly-undocumented Eventdev API contract.
That's certainly possible, and not very difficult, but it seems to me
that this really is the job of something in the platform (e.g., Eventdev
itself or the event device PMD).
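
To sketch what I mean (a rough sketch only; the out_buffer names,
MAX_BURST, and the exact division of labor between the app and the
helper are all made up for the purpose of this mail):

#include <string.h>

#include <rte_eventdev.h>

#define MAX_BURST 32 /* assumed app-level maximum dequeue burst size */

struct out_buffer {
        struct rte_event events[MAX_BURST];
        uint16_t num;
};

/* Snapshot the dequeued burst, in dequeue order. */
static void
out_buffer_init(struct out_buffer *buf, const struct rte_event *deq,
                uint16_t num)
{
        memcpy(buf->events, deq, num * sizeof(struct rte_event));
        buf->num = num;
}

/* Called by the app in whatever order it chooses to process the
 * events; idx is the event's index in the original dequeue burst. */
static void
out_buffer_forward(struct out_buffer *buf, uint16_t idx,
                   const struct rte_event *ev)
{
        uint8_t impl_opaque = buf->events[idx].impl_opaque;

        buf->events[idx] = *ev;
        buf->events[idx].op = RTE_EVENT_OP_FORWARD;
        /* keep whatever the PMD stored at dequeue time */
        buf->events[idx].impl_opaque = impl_opaque;
}

static void
out_buffer_drop(struct out_buffer *buf, uint16_t idx)
{
        buf->events[idx].op = RTE_EVENT_OP_RELEASE;
}

/* Hand everything back to the event device, in dequeue order, with
 * RELEASEs serving as fillers for dropped events. */
static void
out_buffer_flush(struct out_buffer *buf, uint8_t dev_id, uint8_t port_id)
{
        uint16_t sent = 0;

        while (sent < buf->num)
                sent += rte_event_enqueue_burst(dev_id, port_id,
                                                &buf->events[sent],
                                                buf->num - sent);
}

The app would call out_buffer_init() right after
rte_event_dequeue_burst(), out_buffer_forward()/out_buffer_drop() as it
goes, and out_buffer_flush() before the next dequeue.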

One way out of this could be to add an "implicit release-*only*" mode of
operation for eventdev. In such a mode, the RTE_SCHED_TYPE_ATOMIC
per-flow "lock" (and its ORDERED equivalent, if there is one) would be
held until the next dequeue. The difference between OP_FORWARD and
OP_NEW events would then just be the back-pressure watermark
(new_event_threshold).

That pre-rte_event_enqueue_burst() buffering would prevent the event
device from releasing "locks" that could otherwise be released, but the
typical cost of event device interaction is so high that I have my
doubts about how useful that feature is. If you are worried about
"locks" being held for a long time, one may need to use short bursts
anyway (since worst-case critical section length is not reduced by such
RELEASEs).

Another option would be to have the current RTE_EVENT_DEV_CAP_BURST_MODE
capable PMDs start using the "impl_opaque" field for the purpose of
matching in and out events. That would require applications to actually
start adhering to the "don't touch impl_opaque" requirement of the
Eventdev API.

Those "fixes" are not mutually exclusive.

A side note: it's unfortunate there are no bits in the rte_event struct
that can be used for "event id"/"event SN"/"event dequeue idx" type
information, in case an app would like to work around this issue with
current PMDs.

> /Bruce
>
> [1] https://doc.dpdk.org/guides-23.07/prog_guide/eventdev.html