DPDK patches and discussions
* Event device early back-pressure indication
@ 2023-04-13  6:54 Mattias Rönnblom
  2023-04-13 12:55 ` Heng Wang
  2023-04-17 12:52 ` Jerin Jacob
  0 siblings, 2 replies; 6+ messages in thread
From: Mattias Rönnblom @ 2023-04-13  6:54 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran
  Cc: timothy.mcdaniel, Hemant Agrawal, Harry van Haaren, dev,
	Svante Järvstråt, Heng Wang, Stefan Sundkvist,
	Peter Nilsson, Maria Lingemark

Hi.

Consider this situation:

An application EAL thread receives an eventdev event (or some other 
stimuli), which in turn triggers some action. This action results in a 
number of new events being prepared, and a number of associated state 
changes in the application.

On attempting to enqueue the newly created batch of RTE_EVENT_OP_NEW 
events, it turns out the system is very busy, and the event device back 
pressures (i.e., returns a short count in rte_event_enqueue_new_burst()).

The application may now be in a tough spot, in case:

A) The processing was expensive and/or difficult to reverse (e.g., 
destructive changes were made to a packet).
B) The application does not have the option to discard the events (and 
any related mbufs).

In this situation, it would be very beneficial to the application if the 
event device could give some assurance that a future enqueue operation 
will succeed (in its entirety).
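
For concreteness, a minimal sketch of the situation using today's API 
(prepare_new_events() and MAX_BURST are application-specific 
placeholders, not existing API):

struct rte_event events[MAX_BURST];
uint16_t nb, sent;

/* Application-specific, possibly irreversible, processing. */
nb = prepare_new_events(events, MAX_BURST);

sent = rte_event_enqueue_new_burst(dev_id, port_id, events, nb);

if (sent < nb) {
        /* Back pressure: events[sent..nb-1] were rejected, but the
         * state changes behind them may already have been made, and
         * dropping may not be acceptable. */
}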

From what I understand of today's Eventdev API, there are no good 
options. You *may* be able to do some heuristics based on an event 
device-specific xstat (to infer the event device load), but that is not 
even close to "good". You may also try some application-level buffering, 
but that assumes that the packets/state changes are going to be 
identical if they are sent at a later time. It would drive complexity in 
the app.

One seemingly clean way to solve this issue is to allow pre-allocation 
of RTE_EVENT_OP_NEW credits. The eventdev API doesn't talk about 
credits, but at least the event device implementations I've come across 
use some kind of credit system internally.

uint16_t
rte_event_alloc_new_credits(uint8_t dev_id, uint8_t port_id,
                            uint16_t count);

In addition to this function, the application would also need some way 
to indicate, at the point of enqueue, that the credits have already been 
allocated.

I don't see any need for pre-allocating credits for non-RTE_EVENT_OP_NEW 
events. (Some event devices don't even use credits to track such 
events.) Back pressure on RTE_EVENT_OP_FORWARD usually spells disaster, 
in one form or the other.

You could use a bit in the rte_event struct for the purpose of signaling 
if its credit is pre-allocated. That would allow this change to happen, 
without any changes to the enqueue function prototypes.

However, this would require the event device to scan the event array.

I'm not sure I think there is a use case for mixing pre-allocated and 
non-pre-allocated events in the same burst.

If this burst-level separation is good enough, one could either change 
the existing rte_event_enqueue_new_burst() or add a new one. Something 
like:

uint16_t
rte_event_enqueue_new_burst(uint8_t dev_id, uint8_t port_id,
                            const struct rte_event ev[],
                            uint16_t nb_events, uint32_t flags);

#define RTE_EVENT_FLAG_PRE_CREDITS_ALLOCATED (UINT32_C(1) << 0)

A related shortcoming of the current eventdev API is that the 
new_event_threshold is tied to a port, which is impractical for 
applications which require different thresholds for different kinds of 
events enqueued on the same port. One can use different ports, but that 
approach does not scale, since there may be significant memory and/or 
event device hardware resources tied to ports, and thus you cannot allow 
for a combinatorial explosion of ports.

This issue could be solved by allowing the application to specify the 
new_event_threshold, either per burst or per event.

Per event doesn't make a lot of sense in practice, I think, since mixing 
events with different back pressure points will create head-of-line 
blocking. An early low-threshold event may prevent a higher-indexed, 
high-threshold event in the same enqueue burst from being enqueued. This 
is the same reason it usually doesn't make sense to mix RTE_EVENT_OP_NEW 
and RTE_EVENT_OP_FORWARD events in the same burst.

Although the new_event_threshold seems completely orthogonal to the port 
to me, it could still serve as the default.

In case you find this a useful feature, it could be added to the credit 
allocation function.

uint16_t
rte_event_alloc_new_credits(uint8_t dev_id, uint8_t port_id,
                            uint32_t new_event_threshold,
                            uint16_t count);

If that is the only change, the user is required to pre-allocate 
credits to use a flexible new_event_threshold.

It seems to me that that might be something you can live with. Or, you 
add a new rte_event_enqueue_new_burst() variant where a 
new_event_threshold parameter is added.

It may also be useful to have a way to return credits, in case not all 
allocated credits were actually needed.

void
rte_event_return_new_credits(...);
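
To make the intended usage concrete, here is a sketch of how the 
proposed calls could fit together. None of these calls exist today; the 
return-credits parameter list is just a guess, and prepare_new_events(), 
MAX_BURST, dev_id, port_id and new_event_threshold are placeholders 
assumed to be in scope:

struct rte_event events[MAX_BURST];
uint16_t granted, nb, sent;

/* Reserve capacity before doing anything hard to undo. */
granted = rte_event_alloc_new_credits(dev_id, port_id,
                                      new_event_threshold, MAX_BURST);
if (granted == 0) {
        /* Handle back pressure here, while it is still cheap (e.g.,
         * leave the triggering stimuli on its input queue). */
        return;
}

/* Expensive and/or irreversible processing, bounded by the number
 * of credits actually granted. */
nb = prepare_new_events(events, granted);

sent = rte_event_enqueue_new_burst(dev_id, port_id, events, nb,
                                   RTE_EVENT_FLAG_PRE_CREDITS_ALLOCATED);
/* sent == nb is now guaranteed. */

if (nb < granted)
        rte_event_return_new_credits(dev_id, port_id, granted - nb);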

Thoughts?

Best regards,
	Mattias


* RE: Event device early back-pressure indication
  2023-04-13  6:54 Event device early back-pressure indication Mattias Rönnblom
@ 2023-04-13 12:55 ` Heng Wang
  2023-04-17 12:52 ` Jerin Jacob
  1 sibling, 0 replies; 6+ messages in thread
From: Heng Wang @ 2023-04-13 12:55 UTC (permalink / raw)
  To: Mattias Rönnblom, Jerin Jacob Kollanukkaran
  Cc: timothy.mcdaniel, Hemant Agrawal, Harry van Haaren, dev,
	Svante Järvstråt, Stefan Sundkvist, Peter Nilsson,
	Maria Lingemark

Hi,
  This interaction with eventdev introduces some overhead. Isn't it easier to just create an API to query the available credits for a certain event port?
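
For illustration, something along these lines (the name and signature 
are made up, not existing API, as is process_and_enqueue()):

uint16_t avail = rte_event_port_new_credits_available(dev_id, port_id);

if (avail >= nb) {
        /* Enough headroom: go ahead with the expensive processing
         * and the subsequent enqueue. */
        process_and_enqueue(events, nb);
}

The application would then only start the hard-to-undo processing when 
enough credits appear to be available.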

Regards,
Heng


* Re: Event device early back-pressure indication
  2023-04-13  6:54 Event device early back-pressure indication Mattias Rönnblom
  2023-04-13 12:55 ` Heng Wang
@ 2023-04-17 12:52 ` Jerin Jacob
  2023-04-17 15:36   ` Mattias Rönnblom
  1 sibling, 1 reply; 6+ messages in thread
From: Jerin Jacob @ 2023-04-17 12:52 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Jerin Jacob Kollanukkaran, timothy.mcdaniel, Hemant Agrawal,
	Harry van Haaren, dev, Svante Järvstråt, Heng Wang,
	Stefan Sundkvist, Peter Nilsson, Maria Lingemark

On Thu, Apr 13, 2023 at 12:24 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
>
> void
> rte_event_return_new_credits(...);
>
> Thoughts?

I see the following cons on this approach.

# Adding multiple APIs in the fast path to the driver layer may not
be a performance-effective solution.
# At least for cnxk HW, credits are for device, not per port. So cnxk
HW implementation can not use this scheme.

An alternative solution could be adding a new flag for
rte_event_enqueue_new_burst(), where the driver waits until credits are
available, to reduce the application overhead and to support different
HW implementations, if this use case is critical.

 #define RTE_EVENT_FLAG_WAIT_TILL_CREDIT_AVAILABLE (UINT32_C(1) << 0)
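
For illustration, the intended semantics would be roughly the following
(sketch only, assuming the flags-taking enqueue variant sketched earlier
in the thread):

sent = rte_event_enqueue_new_burst(dev_id, port_id, events, nb,
                RTE_EVENT_FLAG_WAIT_TILL_CREDIT_AVAILABLE);
/* The driver blocks until credits are available, so sent == nb and
 * the application needs no back-pressure handling here. */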


>
> Best regards,
>         Mattias


* Re: Event device early back-pressure indication
  2023-04-17 12:52 ` Jerin Jacob
@ 2023-04-17 15:36   ` Mattias Rönnblom
  2023-04-19 11:06     ` Jerin Jacob
  0 siblings, 1 reply; 6+ messages in thread
From: Mattias Rönnblom @ 2023-04-17 15:36 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Jerin Jacob Kollanukkaran, timothy.mcdaniel, Hemant Agrawal,
	Harry van Haaren, dev, Svante Järvstråt, Heng Wang,
	Stefan Sundkvist, Peter Nilsson, Maria Lingemark

On 2023-04-17 14:52, Jerin Jacob wrote:
> On Thu, Apr 13, 2023 at 12:24 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>>
>> void
>> rte_event_return_new_credits(...);
>>
>> Thoughts?
> 
> I see the following cons on this approach.
> 

Does the use case in my original e-mail seem like a reasonable one to 
you? If yes, is there some way one could solve this problem with a 
clever use of the current Eventdev API? That would obviously be preferable.

> # Adding multiple APIs in the fast path to the driver layer may not
> be a performance-effective solution.

For event devices with a software-managed credit system, pre-allocation 
would be very cheap. And, if an application prefers to handle back 
pressure after the fact, that option is still available.

> # At least for cnxk HW, credits are for device, not per port. So cnxk
> HW implementation can not use this scheme.
> 

DSW's credit pool is also per device, but credits are cached on a 
per-port basis. Does the cnxk driver rely on the hardware to signal "new 
event" back pressure? (From the driver code, it looks like that is the case.)

> An alternative solution could be adding a new flag for
> rte_event_enqueue_new_burst(), where the driver waits until credits are
> available, to reduce the application overhead and to support different
> HW implementations, if this use case is critical.
> 
>   #define RTE_EVENT_FLAG_WAIT_TILL_CREDIT_AVAILABLE (UINT32_C(1) << 0)
> 
> 

This solution only works if the event device is the only source of work 
for the EAL thread. That is a really nice model, but I wouldn't count on 
that always being the case.

Also, there may be work that should only be performed if the system is 
not under very high load. Credits being available, especially combined 
with a flexible new event threshold, would be an indicator.

Another way would be to just provide an API call that gave an indication 
of whether a particular threshold has been reached (or simply returned 
an approximation of the number of in-flight events). Such a mechanism 
wouldn't be able to give any guarantees, but could make a future 
enqueue operation very likely to succeed.

>>
>> Best regards,
>>          Mattias



* Re: Event device early back-pressure indication
  2023-04-17 15:36   ` Mattias Rönnblom
@ 2023-04-19 11:06     ` Jerin Jacob
  2023-04-27  9:15       ` Mattias Rönnblom
  0 siblings, 1 reply; 6+ messages in thread
From: Jerin Jacob @ 2023-04-19 11:06 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Jerin Jacob Kollanukkaran, timothy.mcdaniel, Hemant Agrawal,
	Harry van Haaren, dev, Svante Järvstråt, Heng Wang,
	Stefan Sundkvist, Peter Nilsson, Maria Lingemark,
	Pavan Nikhilesh

On Mon, Apr 17, 2023 at 9:06 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2023-04-17 14:52, Jerin Jacob wrote:
> > On Thu, Apr 13, 2023 at 12:24 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >>
> >>
> >> void
> >> rte_event_return_new_credits(...);
> >>
> >> Thoughts?
> >
> > I see the following cons on this approach.
> >
>
> Does the use case in my original e-mail seem like a reasonable one to
> you? If yes, is there some way one could solve this problem with a
> clever use of the current Eventdev API? That would obviously be preferable.

I think the use case is reasonable. For me, the easiest path to achieve
the functionality is setting rte_event_dev_config::nb_events_limit, as a
given application is always targeted to work at X number of packets per
second. Giving that upfront kind of makes life easy for application
writers and drivers, at the cost of allocating the required memory.

>
> > # Adding multiple APIs in the fast path to the driver layer may not
> > be a performance-effective solution.
>
> For event devices with a software-managed credit system, pre-allocation
> would be very cheap. And, if an application prefers to handle back
> pressure after the fact, that option is still available.

I am worried about exposing PMD calls that applications start calling per
packet, esp. with burst size = 1 for latency-critical applications.


>
> > # At least for cnxk HW, credits are for device, not per port. So cnxk
> > HW implementation can not use this scheme.
> >
>
> DSW's credit pool is also per device, but credits are cached on a per-port
> basis. Does the cnxk driver rely on the hardware to signal "new event" back
> pressure? (From the driver code, it looks like that is the case.)

Yes. But we cannot really cache them per port without introducing
complex atomic logic.

>
> > An alternative solution could be adding a new flag for
> > rte_event_enqueue_new_burst(), where the driver waits until credits are
> > available, to reduce the application overhead and to support different
> > HW implementations, if this use case is critical.
> >
> >   #define RTE_EVENT_FLAG_WAIT_TILL_CREDIT_AVAILABLE (UINT32_C(1) << 0)
> >
> >
>
> This solution only works if the event device is the only source of work
> for the EAL thread. That is a really nice model, but I wouldn't count on
> that always being the case.

For a non-EAL thread, I am assuming it is the HW event adapter kind of case.
In such a case, it doesn't need to wait. I think it's only in the SW EAL
thread case that we need to wait, as the application wants to be sure to
wait until credits are available, to avoid error handling in the
application.

>
> Also, there may be work that should only be performed if the system is
> not under very high load. Credits being available, especially combined
> with a flexible new event threshold, would be an indicator.
>
> Another way would be to just provide an API call that gave an indication
> of whether a particular threshold has been reached (or simply returned
> an approximation of the number of in-flight events). Such a mechanism
> wouldn't be able to give any guarantees, but could make a future
> enqueue operation very likely to succeed.

Giving rte_event_dev_credits_available(device_id) should be OK, provided
it is not expected to have fine-grained accuracy. But my worry is that
applications start calling that per packet. Adding the right documentation
may help. Not sure.

>
> >>
> >> Best regards,
> >>          Mattias
>


* Re: Event device early back-pressure indication
  2023-04-19 11:06     ` Jerin Jacob
@ 2023-04-27  9:15       ` Mattias Rönnblom
  0 siblings, 0 replies; 6+ messages in thread
From: Mattias Rönnblom @ 2023-04-27  9:15 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Jerin Jacob Kollanukkaran, timothy.mcdaniel, Hemant Agrawal,
	Harry van Haaren, dev, Svante Järvstråt, Heng Wang,
	Stefan Sundkvist, Peter Nilsson, Maria Lingemark,
	Pavan Nikhilesh

On 2023-04-19 13:06, Jerin Jacob wrote:
> On Mon, Apr 17, 2023 at 9:06 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> On 2023-04-17 14:52, Jerin Jacob wrote:
>>> On Thu, Apr 13, 2023 at 12:24 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>
>>>>
>>>> void
>>>> rte_event_return_new_credits(...);
>>>>
>>>> Thoughts?
>>>
>>> I see the following cons on this approach.
>>>
>>
>> Does the use case in my original e-mail seem like a reasonable one to
>> you? If yes, is there some way one could solve this problem with a
>> clever use of the current Eventdev API? That would obviously be preferable.
> 
> I think the use case is reasonable. For me, the easiest path to achieve
> the functionality is setting rte_event_dev_config::nb_events_limit, as a
> given application is always targeted to work at X number of packets per
> second. Giving that upfront kind of makes life easy for application
> writers and drivers, at the cost of allocating the required memory.
> 

Could you unpack that a little? How would you derive the nb_events_limit 
from the targeted pps throughput? In my world, they are pretty much 
orthogonal. nb_events_limit just specifies the maximum number of 
buffered events (i.e., events/packets in-flight in the pipeline).

Are you thinking about a system where you do input rate shaping (e.g., 
on the aggregate flow from NIC+timer wheel), to some fixed rate? A 
rate you know with some reasonable certainty can be sustained.

Most non-trivial applications will vary in capacity depending on packet 
size, number of flows, types of flows, flow life time, non-packet 
processing cache or DDR pressure, etc.

In any system where you never risk accepting new items of work at a 
higher pace than the system is able to finish them, any mechanism 
designed to help you deal with work scheduler back pressure (at the 
point of new event enqueue) is pointless of course.

Or maybe you are thinking of a system where the EAL threads almost 
never enqueue new events (only forward-type events)? In other words, a 
system where NIC and/or timer hardware is the source of almost all new 
work, and both of those are tightly integrated with the event device?

>>
>>> # Adding multiple APIs in the fast path to the driver layer may not
>>> be a performance-effective solution.
>>
>> For event devices with a software-managed credit system, pre-allocation
>> would be very cheap. And, if an application prefers to handle back
>> pressure after the fact, that option is still available.
> 
> I am worried about exposing PMD calls that applications start calling per
> packet, esp. with burst size = 1 for latency-critical applications.
> 
> 
>>
>>> # At least for cnxk HW, credits are for device, not per port. So cnxk
>>> HW implementation can not use this scheme.
>>>
>>
>> DSW's credit pool is also per device, but credits are cached on a per-port
>> basis. Does the cnxk driver rely on the hardware to signal "new event" back
>> pressure? (From the driver code, it looks like that is the case.)
> 
> Yes. But we cannot really cache them per port without introducing
> complex atomic logic.
> 

You could defer back pressure management to software altogether. If you 
trade some accuracy (in terms of exactly how many in-flight events are 
allowed), the mechanism is both pretty straightforward to implement and 
cycle-efficient.
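
A minimal sketch of that kind of scheme (just to illustrate the idea, 
not the actual DSW code; the names and the batch size are made up):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Device-wide pool, sized from nb_events_limit. */
static atomic_int_least32_t device_credits;

/* Per-port cache, only touched by the port's owning lcore. */
struct port_credit_cache {
        int32_t cached;
};

#define CREDIT_BATCH 64

static inline bool
port_acquire_new_credits(struct port_credit_cache *pc, int32_t n)
{
        if (pc->cached < n) {
                /* Refill from the device pool in batches, so the
                 * atomic operation is amortized over many events. */
                int32_t want = n - pc->cached + CREDIT_BATCH;
                int32_t old = atomic_fetch_sub_explicit(&device_credits,
                        want, memory_order_relaxed);
                if (old < want) {
                        /* Pool exhausted: undo. The transient negative
                         * value is the accuracy being traded away. */
                        atomic_fetch_add_explicit(&device_credits,
                                want, memory_order_relaxed);
                        return false;
                }
                pc->cached += want;
        }
        pc->cached -= n;
        return true;
}

Returning credits (on release, or when the cache grows too large) is 
symmetric and omitted here.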

>>
>>> An alternative solution could be adding a new flag for
>>> rte_event_enqueue_new_burst(), where the driver waits until credits are
>>> available, to reduce the application overhead and to support different
>>> HW implementations, if this use case is critical.
>>>
>>>    #define RTE_EVENT_FLAG_WAIT_TILL_CREDIT_AVAILABLE (UINT32_C(1) << 0)
>>>
>>>
>>
>> This solution only works if the event device is the only source of work
>> for the EAL thread. That is a really nice model, but I wouldn't count on
>> that always being the case.
> 
> For a non-EAL thread, I am assuming it is the HW event adapter kind of case.

What case is this? I think we can leave out non-EAL threads (registered 
threads; unregistered threads can't even call into the Eventdev API), 
since to the extent they use an event device, that use will be very limited.

> In such a case, it doesn't need to wait. I think it's only in the SW EAL
> thread case that we need to wait, as the application wants to be sure to
> wait until credits are available, to avoid error handling in the
> application.
> 

That sounds potentially very wasteful, if the time it has to wait is 
long. In the worst case, if all lcores hit this limit at the same time, 
the result is a deadlock, where every thread waits for some other 
threads to finish off enough work from the pipeline's backlog, to make 
the in-flight events go under the new_event_threshold.

>>
>> Also, there may be work that should only be performed if the system is
>> not under very high load. Credits being available, especially combined
>> with a flexible new event threshold, would be an indicator.
>>
>> Another way would be to just provide an API call that gave an indication
>> of whether a particular threshold has been reached (or simply returned
>> an approximation of the number of in-flight events). Such a mechanism
>> wouldn't be able to give any guarantees, but could make a future
>> enqueue operation very likely to succeed.
> 
> Giving rte_event_dev_credits_available(device_id) should be OK, provided
> it is not expected to have fine-grained accuracy. But my worry is that
> applications start calling that per packet. Adding the right documentation
> may help. Not sure.
> 
>>
>>>>
>>>> Best regards,
>>>>           Mattias
>>

