From: "Mattias Rönnblom" <mattias.ronnblom@ericsson.com>
To: Jerin Jacob <jerinjacobk@gmail.com>
Cc: "Jerin Jacob Kollanukkaran" <jerinj@marvell.com>,
	"timothy.mcdaniel@intel.com" <timothy.mcdaniel@intel.com>,
	"Hemant Agrawal" <hemant.agrawal@nxp.com>,
	"Harry van Haaren" <harry.van.haaren@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Svante Järvstråt" <svante.jarvstrat@ericsson.com>,
	"Heng Wang" <heng.wang@ericsson.com>,
	"Stefan Sundkvist" <stefan.sundkvist@ericsson.com>,
	"Peter Nilsson" <peter.nilsson@ericsson.com>,
	"Maria Lingemark" <maria.lingemark@ericsson.com>,
	"Pavan Nikhilesh" <pbhagavatula@marvell.com>
Subject: Re: Event device early back-pressure indication
Date: Thu, 27 Apr 2023 09:15:38 +0000
Message-ID: <46944264-0a60-1235-bdd1-58dab271b15b@ericsson.com>
In-Reply-To: <CALBAE1OcQkqp1Rov8akD7Vy0orjiuvdJbzZfwLsDo0dy7ScSuA@mail.gmail.com>

On 2023-04-19 13:06, Jerin Jacob wrote:
> On Mon, Apr 17, 2023 at 9:06 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> On 2023-04-17 14:52, Jerin Jacob wrote:
>>> On Thu, Apr 13, 2023 at 12:24 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>
>>>>
>>>> void
>>>> rte_event_return_new_credits(...);
>>>>
>>>> Thoughts?
>>>
>>> I see the following cons on this approach.
>>>
>>
>> Does the use case in my original e-mail seem like a reasonable one to
>> you? If yes, is there some way one could solve this problem with a
>> clever use of the current Eventdev API? That would obviously be preferable.
> 
> I think the use case is reasonable. For me, the easiest path to
> achieve the functionality is setting
> rte_event_dev_config::nb_events_limit, since a given application is
> always targeted to work at X packets per second. Giving that upfront
> kind of makes life easy for application writers and drivers, at the
> cost of allocating the required memory.
> 

Could you unpack that a little? How would you derive the nb_events_limit 
from the targeted pps throughput? In my world, they are pretty much 
orthogonal. nb_events_limit just specifies the maximum number of 
buffered events (i.e., events/packets in-flight in the pipeline).
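
To make concrete what I mean by nb_events_limit being a cap on buffered
events rather than a rate, here is a minimal configuration sketch (the
queue/port counts and the 4096 limit are arbitrary, just for
illustration):

#include <rte_common.h>
#include <rte_eventdev.h>

/* Minimal sketch: nb_events_limit caps the number of events buffered
 * ("in flight") in the event device at any instant. It says nothing
 * about packets per second. Queue/port counts are arbitrary. */
static int
configure_evdev(uint8_t dev_id)
{
	struct rte_event_dev_info info;
	struct rte_event_dev_config config = { 0 };
	int rc;

	rc = rte_event_dev_info_get(dev_id, &info);
	if (rc != 0)
		return rc;

	config.nb_event_queues = 4;
	config.nb_event_ports = 4;
	/* Max in-flight events, not a throughput target. */
	config.nb_events_limit = RTE_MIN(4096, info.max_num_events);
	config.nb_event_queue_flows = info.max_event_queue_flows;
	config.nb_event_port_dequeue_depth = info.max_event_port_dequeue_depth;
	config.nb_event_port_enqueue_depth = info.max_event_port_enqueue_depth;
	config.dequeue_timeout_ns = info.min_dequeue_timeout_ns;

	return rte_event_dev_configure(dev_id, &config);
}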

Are you thinking about a system where you do input rate shaping (e.g.,
on the aggregate flow from NIC + timer wheel) down to some fixed rate?
A rate you know, with some reasonable certainty, can be sustained.

Most non-trivial applications will vary in capacity depending on packet
size, number of flows, types of flows, flow lifetimes, cache or DDR
pressure from non-packet processing, etc.

In any system where you never risk accepting new items of work at a
higher pace than the system is able to finish them, any mechanism
designed to help you deal with work scheduler back pressure (at the
point of new event enqueue) is, of course, pointless.
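
(Just to be explicit about what I mean by back pressure at the point of
new event enqueue, here is a sketch, assuming the usual semantics where
new events above the threshold are partially or fully rejected:

#include <rte_eventdev.h>

/* Sketch of how "new event" back pressure surfaces today: when the
 * device is at or above the port's new_event_threshold,
 * rte_event_enqueue_new_burst() accepts fewer events than requested,
 * and the application has to retry, buffer or drop the remainder,
 * after the fact. */
static uint16_t
enqueue_new_work(uint8_t dev_id, uint8_t port_id,
		 struct rte_event *evs, uint16_t nb)
{
	uint16_t enq = rte_event_enqueue_new_burst(dev_id, port_id, evs, nb);

	if (enq < nb) {
		/* Back pressure: nb - enq events were not accepted. */
	}

	return enq;
}
)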

Or maybe you are thinking of a system where the EAL threads almost
never enqueue new events (only forward-type events)? In other words, a
system where the NIC and/or timer hardware is the source of almost all
new work, and both of those are tightly integrated with the event
device?

>>
>>> # Adding multiple APIs in the fast path to the driver layer may not
>>> be a performance-effective solution.
>>
>> For event devices with a software-managed credit system, pre-allocation
>> would be very cheap. And, if an application prefers to handle back
>> pressure after-the-fact, that option is still available.
> 
> I am worried about exposing PMD calls that applications start calling
> per packet, especially with burst size = 1 for latency-critical
> applications.
> 
> 
>>
>>> # At least for cnxk HW, credits are per device, not per port. So the
>>> cnxk HW implementation cannot use this scheme.
>>>
>>
>> DSW's credit pool is also per device, but credits are cached on a
>> per-port basis. Does the cnxk driver rely on the hardware to signal
>> "new event" back pressure? (From the driver code, it looks like that
>> is the case.)
> 
> Yes. But we can not really cache it per port without introducing
> complex atomic logic.
> 

You could defer back pressure management to software altogether. If you
trade some accuracy (in terms of exactly how many in-flight events are
allowed), the mechanism is both pretty straightforward to implement and
cycle-efficient.
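
To sketch the kind of scheme I have in mind (similar in spirit to what
DSW does, but the names, the batch size and the exact policy below are
made up):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CREDIT_BATCH 64 /* arbitrary */

struct credit_pool {
	atomic_int_least32_t total; /* remaining device-wide credits */
};

struct port_credits {
	int32_t cached; /* credits held privately by one port */
};

/* Take one credit, refilling the port-local cache from the device-wide
 * pool in batches, so the shared atomic is touched only once per
 * CREDIT_BATCH new events rather than once per event. */
static bool
port_acquire_credit(struct credit_pool *pool, struct port_credits *port)
{
	if (port->cached == 0) {
		int32_t prev = atomic_fetch_sub_explicit(&pool->total,
							 CREDIT_BATCH,
							 memory_order_relaxed);
		if (prev < CREDIT_BATCH) {
			/* Not enough left for a full batch; undo and fail.
			 * This is where accuracy is traded away: a few
			 * credits may stay unused in the pool. */
			atomic_fetch_add_explicit(&pool->total, CREDIT_BATCH,
						  memory_order_relaxed);
			return false;
		}
		port->cached = CREDIT_BATCH;
	}

	port->cached--;
	return true;
}

/* Return one credit, flushing surplus back to the pool to keep the
 * per-port cache bounded. */
static void
port_return_credit(struct credit_pool *pool, struct port_credits *port)
{
	port->cached++;
	if (port->cached > 2 * CREDIT_BATCH) {
		port->cached -= CREDIT_BATCH;
		atomic_fetch_add_explicit(&pool->total, CREDIT_BATCH,
					  memory_order_relaxed);
	}
}

The enqueue path would acquire one credit per new event, and release one
when the event is released (or dropped).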

>>
>>> An alternative solution could be adding a new flag for
>>> rte_event_enqueue_new_burst(), where the driver waits until credit is
>>> available, to reduce the application overhead and to support different
>>> HW implementations, if this use case is critical.
>>>
>>>    #define RTE_EVENT_FLAG_WAIT_TILL_CREDIT_AVILABLE (UINT32_C(1) << 0)
>>>
>>>
>>
>> This solution only works if the event device is the only source of work
>> for the EAL thread. That is a really nice model, but I wouldn't trust
>> that to always be the case.
> 
> For a non-EAL thread, I am assuming it is a HW event adapter kind of case,

What case is this? I think we can leave out non-EAL threads (registered
ones, that is; unregistered threads can't even call into the Eventdev
API), since to the extent they use an event device at all, that use will
be very limited.

> In such a case, they don't need to wait. I think we need to wait only
> in the SW EAL thread case, as the application expects to wait until
> the credit is available, to avoid error handling in the application.
> 

That sounds potentially very wasteful, if the time it has to wait is
long. In the worst case, if all lcores hit this limit at the same time,
the result is a deadlock, where every thread waits for some other
threads to finish off enough work from the pipeline's backlog to bring
the number of in-flight events back under the new_event_threshold.
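
To spell out why: the flag effectively turns new-event enqueue into the
loop below (the flag itself is hypothetical, so this is just its implied
behavior), and if every lcore is sitting in that loop at the same time,
no thread is left to dequeue and retire the in-flight events that would
bring the count back under the threshold.

#include <rte_eventdev.h>

/* Implied behavior of a "wait until credit is available" flag: keep
 * retrying the new-event enqueue until everything is accepted. If all
 * lcores spin here simultaneously, none of them is dequeuing, and the
 * pipeline deadlocks. */
static void
enqueue_new_or_wait(uint8_t dev_id, uint8_t port_id,
		    struct rte_event *evs, uint16_t nb)
{
	uint16_t enq = 0;

	while (enq < nb)
		enq += rte_event_enqueue_new_burst(dev_id, port_id,
						   evs + enq, nb - enq);
}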

>>
>> Also, there may be work that should only be performed if the system
>> is not under very high load. Credits being available, especially
>> combined with a flexible new event threshold, would be an indicator.
>>
>> Another way would be to just provide an API call that gave an
>> indication that a particular threshold has been reached (or simply
>> returned an approximation of the number of in-flight events). Such a
>> mechanism wouldn't be able to give any guarantees, but could make a
>> future enqueue operation very likely to succeed.
> 
> Giving rte_event_dev_credits_avaiable(device_id) should be OK,
> provided it is not expected to be fine-grained accurate. But my worry
> is that applications start calling it per packet. Adding the right
> documentation may help. Not sure.
> 
>>
>>>>
>>>> Best regards,
>>>>           Mattias
>>

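For reference, regarding the credits-available idea discussed above:
here is roughly how I would expect an application to use such a query.
The function below is purely hypothetical (no such call exists in
Eventdev today), and the threshold is arbitrary.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical API: an approximate, device-wide count of available
 * "new event" credits. No such call exists in Eventdev today. */
extern int32_t rte_event_dev_credits_available(uint8_t dev_id);

#define LOW_PRIO_WORK_MIN_CREDITS 512 /* arbitrary */

/* Poll an auxiliary, non-eventdev work source only when the event
 * device is not close to its in-flight limit, rather than checking per
 * packet. */
static bool
may_accept_low_prio_work(uint8_t dev_id)
{
	return rte_event_dev_credits_available(dev_id) >=
	    LOW_PRIO_WORK_MIN_CREDITS;
}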
