From: Jerin Jacob
Date: Thu, 16 Apr 2020 22:49:08 +0530
To: Mattias Rönnblom
Cc: dpdk-dev, Jerin Jacob, Gage Eads, Bruce Richardson
Subject: Re: [dpdk-dev] [RFC 1/3] eventdev: allow for event devices requiring maintenance

On Tue, Apr 14, 2020 at 11:25 PM Mattias Rönnblom wrote:
>
> On 2020-04-14 18:15, Jerin Jacob wrote:
> > On Tue, Apr 14, 2020 at 9:27 PM Mattias Rönnblom wrote:
> >> On 2020-04-10 15:00, Jerin Jacob wrote:
> >>> On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom wrote:
> >>>> On 2020-04-09 15:32, Jerin Jacob wrote:
> >>>>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom wrote:
> >>>>>> On 2020-04-08 21:36, Jerin Jacob wrote:
> >>>>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom wrote:
> >>>>>>>> Extend the Eventdev API to allow for event devices which require
> >>>>>>>> various forms of internal processing to happen, even when events
> >>>>>>>> are not enqueued to or dequeued from a port.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Mattias Rönnblom
> >>>>>>>> ---
> >>>>>>>>  lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++++++
> >>>>>>>>  lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
> >>>>>>>>  2 files changed, 79 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> index 226f352ad..d69150792 100644
> >>>>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> @@ -289,6 +289,15 @@ struct rte_event;
> >>>>>>>>   * single queue to each port or map a single queue to many port.
> >>>>>>>>   */
> >>>>>>>>
> >>>>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> >>>>>>>> +/**< Event device requires calls to rte_event_maintain() during
> >>>>>>> This scheme would call for DSW-specific API handling in the fast path.
> >>>>>> Initially this would be so, but buffering events might yield performance
> >>>>>> benefits for more event devices than DSW.
> >>>>>>
> >>>>>> In an application, it's often convenient, but sub-optimal from a
> >>>>>> performance point of view, to do single-event enqueue operations. The
> >>>>>> alternative is to use an application-level buffer, and then flush this
> >>>>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
> >>>>>> buffer, you get the simplicity of single-event enqueue operations, but
> >>>>>> without taking any noticeable performance hit.
> >>>>> IMO, it is better to have the application aggregate the burst, as
> >>>>> sending events one by one to the driver for aggregation has a
> >>>>> performance cost due to the function pointer overhead.
> >>>> That's a very slight overhead - but for optimal performance, sure. It'll
> >>>> come at a cost in terms of code complexity. Just look at the adapters.
> >>>> They do this already. I think some applications are ready to take the
> >>>> extra 5-10 clock cycles or so it'll cost them to do the function call
> >>>> (provided the event device has buffering support).
> >>> So is there any advantage in moving the aggregation logic to the PMD? It
> >>> is costly.
> >>
> >> What do you mean by aggregation logic?
> > aggregation == buffering.
>
> The reason I put it into DSW to begin with was that it yielded a
> significant performance benefit, for situations where the application
> would enqueue() few or even single events per call. For DSW it will
> translate to lower DPDK event ring overhead. I would imagine it could
> improve performance for hardware-based event devices as well.

Yes. We are already doing this in the application. It makes sense for
buffering.
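For reference, application-level enqueue buffering of the sort discussed
above could look roughly like the sketch below. It is illustrative only:
EV_BUFFER_SIZE, struct ev_buffer and the flush-when-full policy are
assumptions, not part of the RFC; only rte_event_enqueue_burst() is
existing API.

#include <rte_eventdev.h>

#define EV_BUFFER_SIZE 32

struct ev_buffer {
	struct rte_event events[EV_BUFFER_SIZE];
	uint16_t count;
};

static inline void
ev_buffer_flush(uint8_t dev_id, uint8_t port_id, struct ev_buffer *buf)
{
	uint16_t sent = 0;

	/* enqueue_burst may accept fewer events than offered (e.g., on
	 * back-pressure), so loop until the whole buffer is drained */
	while (sent < buf->count)
		sent += rte_event_enqueue_burst(dev_id, port_id,
						buf->events + sent,
						buf->count - sent);
	buf->count = 0;
}

static inline void
ev_buffer_enqueue(uint8_t dev_id, uint8_t port_id, struct ev_buffer *buf,
		  const struct rte_event *event)
{
	buf->events[buf->count++] = *event;

	/* flush only when a full burst has accumulated */
	if (buf->count == EV_BUFFER_SIZE)
		ev_buffer_flush(dev_id, port_id, buf);
}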
> >>
> >>>>> Another concern is the frequency at which the application must call
> >>>>> the rte_event_maintain() function, as the timing requirements will
> >>>>> vary from driver to driver and application to application.
> >>>>> IMO, it is not portable, and I believe the application should not be
> >>>>> aware of those details. If the driver needs a specific maintenance
> >>>>> function for any other reason, then it is better to use the DPDK
> >>>>> SERVICE core infra.
> >>>> The only thing the application needs to be aware of is that it needs to
> >>>> call rte_event_maintain() as often as it would have called dequeue() in
> >>>> your "typical worker" example. Making sure this call is cheap enough is
> >>>> up to the driver, and this needs to hold true for all event devices
> >>>> that need maintenance.
> >>> Why can't rte_event_maintain()'s work be done in either dequeue() or
> >>> enqueue(), in the driver context? Either one of them has to be called
> >>> periodically by the application in any case?
> >>
> >> No, producer-only ports can go idle for long times. Applications that
> >> don't "go idle" need not worry about the maintain function.
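Under the proposal, such a producer-only port would look roughly like the
sketch below. The two-argument rte_event_maintain() form follows this RFC
and is not final API; next_event() is a hypothetical application hook, and
the idle-detection policy is an assumption for illustration.

#include <stdbool.h>
#include <rte_eventdev.h>

/* hypothetical application hook: fills in the next event to inject,
 * if any is pending */
extern bool next_event(struct rte_event *ev);

static void
producer_loop(uint8_t dev_id, uint8_t port_id)
{
	for (;;) {
		struct rte_event ev;

		if (next_event(&ev)) {
			/* retry on back-pressure */
			while (rte_event_enqueue_burst(dev_id, port_id,
						       &ev, 1) == 0)
				;
		} else {
			/* idle: give the event device the opportunity to
			 * flush buffers, return credits, etc. Only needed
			 * on devices that set the proposed
			 * RTE_EVENT_DEV_CAP_REQUIRES_MAINT flag */
			rte_event_maintain(dev_id, port_id);
		}
	}
}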
> > If I understand it correctly, if the worker does not call enqueue() or
> > dequeue() for a long time, then maintain() needs to be called by the
> > application.
> >
> > That's where my concern with this API lies. What is the definition of a
> > long time frame? (ns or ticks?)
> > Will it change from driver to driver and arch to arch? And it is
> > leaking the driver requirements to the application.
>
> It's difficult to quantify exactly, but the rate should be on the same
> order of magnitude as you would call dequeue() on a consumer-type worker

The challenge is: if another driver has a different requirement for
maintain() in terms of frequency, and this is a public API, then how will
we abstract that?

> port. All the RTE_EVENT_DEV_CAP_* flags essentially represent such
> leakage, where the event device driver and/or the underlying hardware
> express various capabilities and limitations.

I agree. But in the fast path, those flags do not bring any such
_functional_ difference. If it is on a slow path, then there is no issue
at all.

> I'm not sure it needs to be much more complicated for the application to
> handle than the change to the event adapters I included in the patch.
> There, it boils down to the service function call rate, which would be
> high during low load (causing buffers to be flushed quickly etc), and a
> little lower during high lcore load.
>
> >>
> >>>> If you plan to use a non-buffering hardware device driver or a soft,
> >>>> centralized scheduler that doesn't need this, it will also not set the
> >>>> flag, and thus the application need not care about the
> >>>> rte_event_maintain() function. DPDK code such as the eventdev adapters
> >>>> do need to care, but the increase in complexity is slight, and the cost
> >>>> of calling rte_event_maintain() on a maintenance-free device is very
> >>>> low (since the then-NULL function pointer is in the eventdev struct,
> >>>> likely on a cache line already dragged in).
> >>>>
> >>>> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
> >>>> Flushing event buffers (and other DSW "background work") can't be done
> >>>> on a service core, since that would mean working on non-MT-safe data
> >>>> structures on the worker threads' event ports.
> >>> Yes. Otherwise, DSW needs to be updated to be MT-safe.
> >>
> >> I haven't been looking at this in detail, but I suspect it will be both
> >> complex and not very performant. One of the problems that needs to be
> >> solved in such a solution is the "pausing" of flows during migration.
> >> All participating lcores need to ACK that a flow is paused.
> > Could you share a patch with details on how much it costs?
>
> I don't have a ready-made solution for making lcore ports thread-safe.

OK

> >>
> >>>>>>>> + * periods when neither rte_event_dequeue_burst() nor
> >>>>>>> The typical worker thread will be
> >>>>>>> while (1) {
> >>>>>>>         rte_event_dequeue_burst();
> >>>>>>>         ..process..
> >>>>>>>         rte_event_enqueue_burst();
> >>>>>>> }
> >>>>>>> If so, why can't the DSW driver do the maintenance in the driver
> >>>>>>> context, in the dequeue() call?
> >>>>>>
> >>>>>> DSW already does maintenance on dequeue, and works well in the above
> >>>>>> scenario. The typical worker does not need to care about the
> >>>>>> rte_event_maintain() function, since it dequeues events on a regular
> >>>>>> basis.
> >>>>>>
> >>>>>> What this RFC addresses is the more atypical (but still fairly
> >>>>>> common) case of a port being neither dequeued from nor enqueued to
> >>>>>> on a regular basis. The timer and ethernet rx adapters are examples
> >>>>>> of such.
> >>>>> If it is an adapter-specific use case problem, then maybe we have an
> >>>>> option to fix the problem in the adapter-specific API usage or in
> >>>>> that area.
> >>>>
> >>>> It's not adapter-specific, I think. There might be producer-only ports,
> >>>> for example, which don't provide a constant stream of events, but
> >>>> rather intermittent bursts. A traffic generator is one example of such
> >>>> an application, and there might be other, less synthetic ones as well.
> >>> In that case, the application knows the purpose of the eventdev port.
> >>> Would changing the eventdev spec to configure a "port" or "queue" for
> >>> that use case help? Meaning, DSW or any driver can get the hint and
> >>> change the function pointers accordingly for the fast path.
> >>> For instance, do maintenance on enqueue() for such ports, or so.
> >>
> >> This is what DSW does already today. A dequeue() call with a zero-length
> >> event array serves the purpose of rte_event_maintain(). It's a bit of a
> >> hack, in my opinion.
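Concretely, the hack amounts to something like the line below (assuming
the driver, like DSW, tolerates a zero-length - here NULL - event array):

/* maintenance-only operation on an otherwise-idle port: dequeue zero
 * events, purely to let the driver run its background work */
rte_event_dequeue_burst(dev_id, port_id, NULL, 0, 0);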
> > I agree that it is a hack.
> >
> > One more concern we have: we are adding an API for the new driver
> > requirements without updating the example application. Sharing an
> > example application patch would enable us to understand the cost and
> > usage (i.e., the cost w.r.t. the other SW drivers).
>
> Good point.

I suggest sharing a patch based on existing app/adapter usage; based on
that, we can analyze the abstraction and the cost further.

> >>
> >>>>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
> >>>>>>>> + * event device to perform internal processing, such as flushing
> >>>>>>>> + * buffered events, return credits to a global pool, or process
> >>>>>>>> + * signaling related to load balancing.
> >>>>>>>> + */
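Putting the application-side contract of the proposal together: a sketch
of the one-time capability check is below. rte_event_dev_info_get() and
the event_dev_cap field are existing API; RTE_EVENT_DEV_CAP_REQUIRES_MAINT
is the flag this RFC proposes, and dev_requires_maintenance() is an
illustrative helper name.

#include <stdbool.h>
#include <rte_eventdev.h>

/* decide once, at setup time, whether otherwise-idle ports of this
 * device need periodic rte_event_maintain() calls */
static bool
dev_requires_maintenance(uint8_t dev_id)
{
	struct rte_event_dev_info info;

	if (rte_event_dev_info_get(dev_id, &info) != 0)
		return false;

	return (info.event_dev_cap & RTE_EVENT_DEV_CAP_REQUIRES_MAINT) != 0;
}

An adapter or application would then call rte_event_maintain() on a port
only when this returns true, at roughly the rate it would otherwise have
called dequeue(); on maintenance-free devices the whole question goes away.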