* [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
@ 2016-08-09 1:01 Jerin Jacob
2016-08-09 8:48 ` Bruce Richardson
0 siblings, 1 reply; 7+ messages in thread
From: Jerin Jacob @ 2016-08-09 1:01 UTC (permalink / raw)
To: dev
Cc: thomas.monjalon, bruce.richardson, hemant.agrawal,
shreyansh.jain, jerin.jacob
Hi All,
Find below an RFC API specification which attempts to
define the standard application programming interface
for event driven programming in DPDK and to abstract HW based event devices.
These devices can support event scheduling and flow ordering
in HW and are typically found in NW SoCs as an integrated device or
as a PCI EP device.
The RFC APIs are inspired by existing ethernet and crypto devices.
Following are the requirements considered to define the RFC API.
1) APIs similar to existing Ethernet and crypto API framework for
○ Device creation, device Identification and device configuration
2) Enumerate libeventdev resources as numbers(0..N) to
○ Avoid ABI issues with handles
○ An event device may have millions of flow queues so it's not practical to
have handles for each flow queue and its associated name based
lookup in the multiprocess case
3) Avoid struct mbuf changes
4) APIs to
○ Enumerate eventdev driver capabilities and resources
○ Enqueue events from l-core
○ Schedule events
○ Synchronize events
○ Maintain ingress order of the events
○ Run to completion support
Find below the URL for the complete API specification.
https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
I have created a supportive document to share the concepts of
event driven programming model and proposed APIs details to get
better reach for the specification.
This presentation will cover introduction to event driven programming model concepts,
characteristics of hardware-based event manager devices,
RFC API proposal, example use case, and benefits of using the event driven programming model.
Find below the URL for the supportive document.
https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf
git repo for the above documents:
https://github.com/jerinjacobk/libeventdev/
Looking forward to getting comments from both application and driver
implementation perspective.
What follows is the text version of the above documents, for inline comments and discussion.
I intend to update that specification accordingly.
/**
* Get the total number of event devices that have been successfully
* initialised.
*
* @return
* The total number of usable event devices.
*/
extern uint8_t
rte_eventdev_count(void);
/**
* Get the device identifier for the named event device.
*
* @param name
* Event device name to select the event device identifier.
*
* @return
* Returns event device identifier on success.
* - <0: Failure to find named event device.
*/
extern int
rte_eventdev_get_dev_id(const char *name);
/**
* Return the NUMA socket to which a device is connected.
*
* @param dev_id
* The identifier of the device.
* @return
* The NUMA socket id to which the device is connected or
* a default of zero if the socket could not be determined.
* - -1: dev_id value is out of range.
*/
extern int
rte_eventdev_socket_id(uint8_t dev_id);
/** Event device information */
struct rte_eventdev_info {
const char *driver_name; /**< Event driver name */
struct rte_pci_device *pci_dev; /**< PCI information */
uint32_t min_sched_wait_ns;
/**< Minimum supported scheduler wait delay in ns by this device */
uint32_t max_sched_wait_ns;
/**< Maximum supported scheduler wait delay in ns by this device */
uint32_t sched_wait_ns;
/**< Configured scheduler wait delay in ns of this device */
uint32_t max_flow_queues_log2;
/**< LOG2 of maximum flow queues supported by this device */
uint8_t max_sched_groups;
/**< Maximum schedule groups supported by this device */
uint8_t max_sched_group_priority_levels;
/**< Maximum schedule group priority levels supported by this device */
};
/**
* Retrieve the contextual information of an event device.
*
* @param dev_id
* The identifier of the device.
* @param[out] dev_info
* A pointer to a structure of type *rte_eventdev_info* to be filled with the
* contextual information of the device.
*/
extern void
rte_eventdev_info_get(uint8_t dev_id, struct rte_eventdev_info *dev_info);
/** Event device configuration structure */
struct rte_eventdev_config {
uint32_t sched_wait_ns;
/**< rte_event_schedule() waits for *sched_wait_ns* ns on this device */
uint32_t nb_flow_queues_log2;
/**< LOG2 of the number of flow queues to configure on this device */
uint8_t nb_sched_groups;
/**< The number of schedule groups to configure on this device */
};
/**
* Configure an event device.
*
* This function must be invoked first before any other function in the
* API. This function can also be re-invoked when a device is in the
* stopped state.
*
 * The caller may use rte_eventdev_info_get() to get the capabilities of the
 * resources available in this event device.
*
* @param dev_id
* The identifier of the device to configure.
* @param config
* The event device configuration structure.
*
* @return
* - 0: Success, device configured.
* - <0: Error code returned by the driver configuration function.
*/
extern int
rte_eventdev_configure(uint8_t dev_id, struct rte_eventdev_config *config);
#define RTE_EVENT_SCHED_GRP_PRI_HIGHEST 0
/**< Highest schedule group priority */
#define RTE_EVENT_SCHED_GRP_PRI_NORMAL 128
/**< Normal schedule group priority */
#define RTE_EVENT_SCHED_GRP_PRI_LOWEST 255
/**< Lowest schedule group priority */
struct rte_eventdev_sched_group_conf {
rte_cpuset_t lcore_list;
/**< List of l-cores that have membership in this schedule group */
uint8_t priority;
/**< Priority of this schedule group relative to other schedule groups.
If the requested *priority* does not map to the event device's
*max_sched_group_priority_levels* then the event driver can normalize
it to a supported priority value in the range of
[RTE_EVENT_SCHED_GRP_PRI_HIGHEST, RTE_EVENT_SCHED_GRP_PRI_LOWEST] */
uint8_t enable_all_lcores;
/**< Ignore *lcore_list* and enable all the l-cores */
};
/**
 * Allocate and set up a schedule group for an event device.
*
* @param dev_id
* The identifier of the device.
* @param group_id
* The index of the schedule group to setup. The value must be in the range
* [0, nb_sched_groups - 1] previously supplied to rte_eventdev_configure().
* @param group_conf
* The pointer to the configuration data to be used for the schedule group.
 * NULL value is allowed, in which case the default configuration is used.
* @param socket_id
* The *socket_id* argument is the socket identifier in case of NUMA.
* The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
 * DMA memory allocated for the schedule group.
*
* @return
* - 0: Success, schedule group correctly set up.
* - <0: Schedule group configuration failed
*/
extern int
rte_eventdev_sched_group_setup(uint8_t dev_id, uint8_t group_id,
const struct rte_eventdev_sched_group_conf *group_conf,
int socket_id);
/**
* Get the number of schedule groups on a specific event device
*
* @param dev_id
* Event device identifier.
* @return
* - The number of configured schedule groups
*/
extern uint16_t
rte_eventdev_sched_group_count(uint8_t dev_id);
/**
* Get the priority of the schedule group on a specific event device
*
* @param dev_id
* Event device identifier.
* @param group_id
* Schedule group identifier.
* @return
* - The configured priority of the schedule group in
* [RTE_EVENT_SCHED_GRP_PRI_HIGHEST, RTE_EVENT_SCHED_GRP_PRI_LOWEST] range
*/
extern uint8_t
rte_eventdev_sched_group_priority(uint8_t dev_id, uint8_t group_id);
/**
* Get the configured flow queue id mask of a specific event device
*
 * *flow_queue_id_mask* can be used to generate a *flow_queue_id* value in the
 * range [0, (2^max_flow_queues_log2 - 1)] of a specific event device.
 * The *flow_queue_id* value is used in the event enqueue operation and for
 * comparing a scheduled event's *flow_queue_id* value against the enqueued value.
*
* @param dev_id
* Event device identifier.
* @return
* - The configured flow queue id mask
*/
extern uint32_t
rte_eventdev_flow_queue_id_mask(uint8_t dev_id);
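For illustration, a minimal sketch of deriving a *flow_queue_id* from a flow
hash using this mask; compute_5tuple_hash() is a hypothetical application
helper, not part of this specification.

/* Sketch: map a packet's 5-tuple hash to a valid flow_queue_id. */
uint32_t mask = rte_eventdev_flow_queue_id_mask(dev_id);
uint32_t flow_queue_id = compute_5tuple_hash(mbuf) & mask;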
/**
* Start an event device.
*
 * The device start step is the last one and consists of setting the schedule
 * groups and flow queues to start accepting events and scheduling them to l-cores.
*
* On success, all basic functions exported by the API (event enqueue,
* event schedule and so on) can be invoked.
*
* @param dev_id
* Event device identifier
* @return
* - 0: Success, device started.
* - <0: Error code of the driver device start function.
*/
extern int
rte_eventdev_start(uint8_t dev_id);
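To show how the above calls fit together, here is a minimal slow-path setup
sketch based only on the declarations in this specification. The choice of
resource counts, the all-lcores group membership and the eventdev_setup()
wrapper are illustrative assumptions; headers are omitted and error handling
is reduced to returning -1.

static int
eventdev_setup(uint8_t dev_id)
{
    struct rte_eventdev_info info;
    struct rte_eventdev_config config;
    struct rte_eventdev_sched_group_conf grp_conf;
    uint8_t grp;

    /* discover the device capabilities */
    rte_eventdev_info_get(dev_id, &info);

    /* request the maximum resources the device reports */
    config.sched_wait_ns = info.min_sched_wait_ns;
    config.nb_flow_queues_log2 = info.max_flow_queues_log2;
    config.nb_sched_groups = info.max_sched_groups;
    if (rte_eventdev_configure(dev_id, &config) < 0)
        return -1;

    /* normal priority groups with every l-core as a member */
    memset(&grp_conf, 0, sizeof(grp_conf));
    grp_conf.priority = RTE_EVENT_SCHED_GRP_PRI_NORMAL;
    grp_conf.enable_all_lcores = 1;
    for (grp = 0; grp < config.nb_sched_groups; grp++)
        if (rte_eventdev_sched_group_setup(dev_id, grp, &grp_conf,
                                           SOCKET_ID_ANY) < 0)
            return -1;

    return rte_eventdev_start(dev_id);
}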
/**
* Stop an event device. The device can be restarted with a call to
* rte_eventdev_start()
*
* @param dev_id
* Event device identifier.
*/
extern void
rte_eventdev_stop(uint8_t dev_id);
/**
* Close an event device. The device cannot be restarted!
*
* @param dev_id
* Event device identifier
*
* @return
* - 0 on successfully closing device
* - <0 on failure to close device
*/
extern int
rte_eventdev_close(uint8_t dev_id);
/* Scheduler synchronization method */
#define RTE_SCHED_SYNC_ORDERED 0
/**< Ordered flow queue synchronization
*
* Events from an ordered flow queue can be scheduled to multiple l-cores for
* concurrent processing while maintaining the original event order. This
* scheme enables the user to achieve high single flow throughput by avoiding
* SW synchronization for ordering between l-cores.
*
* The source flow queue ordering is maintained when events are enqueued to
* their destination queue(s) within the same ordered queue synchronization
 * context. An l-core holds the context until it requests another event from the
 * scheduler, which implicitly releases the context. The user may allow the
 * scheduler to release the context earlier than that by calling
 * rte_event_schedule_release().
*
* Events from the source flow queue appear in their original order when
* dequeued from a destination flow queue irrespective of its
* synchronization method. Event ordering is based on the received event(s),
 * but also other (newly allocated or stored) events are ordered when enqueued
 * within the same ordered context. Events not enqueued (e.g. freed or stored)
 * within the context are considered missing from reordering and are skipped at
 * this time (but can be ordered again within another context).
*
*/
#define RTE_SCHED_SYNC_ATOMIC 1
/**< Atomic flow queue synchronization
*
* Events from an atomic flow queue can be scheduled only to a single l-core at
* a time. The l-core is guaranteed to have exclusive (atomic) access to the
* associated flow queue context, which enables the user to avoid SW
* synchronization. Atomic flow queue also helps to maintain event ordering
* since only one l-core at a time is able to process events from a flow queue.
*
* The atomic queue synchronization context is dedicated to the l-core until it
* requests another event from the scheduler, which implicitly releases the
 * context. The user may allow the scheduler to release the context earlier than
 * that by calling rte_event_schedule_release().
*
*/
#define RTE_SCHED_SYNC_PARALLEL 2
/**< Parallel flow queue
*
 * The scheduler performs priority scheduling, load balancing and other
 * functions, but does not provide additional event synchronization or ordering.
 * It is free to schedule events from a single parallel queue to multiple
 * l-cores for concurrent processing. The application is responsible for flow
 * queue context synchronization and event ordering (SW synchronization).
*
*/
/* Event types to classify the event source */
#define RTE_EVENT_TYPE_ETHDEV 0x0
/**< The event generated from ethdev subsystem */
#define RTE_EVENT_TYPE_CRYPTODEV 0x1
/**< The event generated from the cryptodev subsystem */
#define RTE_EVENT_TYPE_TIMERDEV 0x2
/**< The event generated from timerdev subsystem */
#define RTE_EVENT_TYPE_LCORE 0x3
/**< The event generated from an l-core. The application may use *sub_event_type*
 * to further classify the event */
#define RTE_EVENT_TYPE_INVALID 0xf
/**< Invalid event type */
#define RTE_EVENT_TYPE_MAX 0x16
/** The generic rte_event structure to hold the event attributes */
struct rte_event {
union {
uint64_t u64;
struct {
uint32_t flow_queue_id;
/**< Flow queue identifier to choose the flow queue in
* enqueue and schedule operation.
 * The value must be in the range of
 * rte_eventdev_flow_queue_id_mask() */
uint8_t sched_group_id;
/**< Schedule group identifier to choose the schedule
* group in enqueue and schedule operation.
* The value must be in the range
* [0, nb_sched_groups - 1] previously supplied to
* rte_eventdev_configure(). */
uint8_t sched_sync;
/**< Scheduler synchronization method associated
* with flow queue for enqueue and schedule operation */
uint8_t event_type;
/**< Event type to classify the event source */
uint8_t sub_event_type;
/**< Sub-event types based on the event source */
};
};
union {
uintptr_t event;
/**< Opaque event pointer */
struct rte_mbuf *mbuf;
/**< mbuf pointer if the scheduled event is associated with mbuf */
};
};
/**
*
 * Enqueue the event object supplied in the *rte_event* structure on the flow
 * queue identified by *flow_queue_id*, associated with the schedule group
 * *sched_group_id*, the scheduler synchronization method and the event type,
 * on the event device designated by its *dev_id*.
*
* @param dev_id
* Event device identifier.
* @param ev
* Pointer to struct rte_event
* @return
* - 0 on success
* - <0 on failure
*/
extern int
rte_eventdev_enqueue(uint8_t dev_id, struct rte_event *ev);
/**
* Enqueue a burst of events objects supplied in *rte_event* structure
* on an event device designated by its *dev_id*.
*
* The rte_eventdev_enqueue_burst() function is invoked to enqueue
 * multiple event objects. It is the burst variant of the rte_eventdev_enqueue()
 * function.
*
* The *num* parameter is the number of event objects to enqueue which are
* supplied in the *ev* array of *rte_event* structure.
*
 * The rte_eventdev_enqueue_burst() function returns the number of
 * event objects it actually enqueued. A return value equal to
 * *num* means that all event objects have been enqueued.
*
* @param dev_id
* The identifier of the device.
* @param ev
* The address of an array of *num* pointers to *rte_event* structure
* which contain the event object enqueue operations to be processed.
* @param num
* The number of event objects to enqueue
*
* @return
* The number of event objects actually enqueued on the event device. The return
* value can be less than the value of the *num* parameter when the
 * event device's flow queue is full or if invalid parameters are specified in
* a *rte_event*. If return value is less than *num*, the remaining events at
* the end of ev[] are not consumed, and the caller has to take care of them.
*/
extern int
rte_eventdev_enqueue_burst(uint8_t dev_id, struct rte_event *ev[], int num);
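A short sketch of handling the partial enqueue case described above; the
events[] array of event pointers and nb_events are assumed to be prepared by
the caller.

/* Retry until every prepared event is accepted by the device. */
int sent = 0;
while (sent < nb_events) {
    int n = rte_eventdev_enqueue_burst(dev_id, &events[sent],
                                       nb_events - sent);
    if (n < 0)
        break; /* invalid parameter in one of the events */
    sent += n;
}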
/**
* Schedule an event to the caller l-core from the event device designated by
* its *dev_id*.
*
 * rte_event_schedule() does not dictate the specifics of the scheduling
 * algorithm as each eventdev driver may have different criteria to schedule
 * an event. However, in general, from an application perspective the
 * scheduler may use the following scheme to dispatch an event to an l-core
 *
 * 1) Selection of the schedule group
 * a) The number of schedule groups available in the event device
 * b) The caller l-core membership in the schedule group.
 * c) Schedule group priority relative to other schedule groups.
 * 2) Selection of the flow queue and event
 * a) The number of flow queues available in the event device
 * b) The scheduler synchronization method associated with the flow queue
 *
 * On successful event dispatch, the caller l-core holds the scheduler
 * synchronization context associated with the dispatched event; an explicit
 * rte_event_schedule_release(), rte_event_schedule_ctxt_*() or the next
 * rte_event_schedule() call shall release the context
*
* @param dev_id
* The identifier of the device.
* @param[out] ev
 * Pointer to struct rte_event. On successful event dispatch, the implementation
 * updates the event attributes
 * @param wait
 * When true, wait until an event is available or for the *sched_wait_ns* ns
 * previously supplied to rte_eventdev_configure()
 *
 * @return
 * true when a valid event has been dispatched by the scheduler.
*
*/
extern bool
rte_event_schedule(uint8_t dev_id, struct rte_event *ev, bool wait);
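A minimal worker loop sketch built on rte_event_schedule(); the quit flag and
the handle_*() functions are hypothetical application code.

struct rte_event ev;

while (!quit) {
    if (!rte_event_schedule(dev_id, &ev, true))
        continue; /* no event dispatched within sched_wait_ns */
    switch (ev.event_type) {
    case RTE_EVENT_TYPE_ETHDEV:
        handle_packet(ev.mbuf);     /* hypothetical packet handler */
        break;
    case RTE_EVENT_TYPE_LCORE:
        handle_lcore_event(&ev);    /* hypothetical l-core event handler */
        break;
    default:
        break;
    }
}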
/**
* Schedule an event to the caller l-core from a specific schedule group
* *group_id* of event device designated by its *dev_id*.
*
 * Like rte_event_schedule(), but the schedule group is provided as the argument *group_id*
*
* @param dev_id
* The identifier of the device.
* @param group_id
* Schedule group identifier to select the schedule group for event dispatch
* @param[out] ev
 * Pointer to struct rte_event. On successful event dispatch, the implementation
 * updates the event attributes
 * @param wait
 * When true, wait until an event is available or for the *sched_wait_ns* ns
 * previously supplied to rte_eventdev_configure()
 *
 * @return
 * true when a valid event has been dispatched by the scheduler.
*
*/
extern bool
rte_event_schedule_from_group(uint8_t dev_id, uint8_t group_id,
struct rte_event *ev, bool wait);
/**
* Release the current scheduler synchronization context associated with the
* scheduler dispatched event
*
 * If the current scheduler synchronization context method is *RTE_SCHED_SYNC_ATOMIC*
* then this function hints the scheduler that the user has completed critical
* section processing in the current atomic context.
* The scheduler is now allowed to schedule events from the same flow queue to
* another l-core.
* Early atomic context release may increase parallelism and thus system
 * performance, but the user needs to carefully design the split into critical vs.
* non-critical sections.
*
 * If the current scheduler synchronization context method is *RTE_SCHED_SYNC_ORDERED*
* then this function hints the scheduler that the user has done all enqueues
* that need to maintain event order in the current ordered context.
* The scheduler is allowed to release the ordered context of this l-core and
* avoid reordering any following enqueues.
* Early ordered context release may increase parallelism and thus system
* performance, since scheduler may start reordering events sooner than the next
* schedule call.
*
 * If the current scheduler synchronization context method is *RTE_SCHED_SYNC_PARALLEL*
 * then this function is a no-op.
*
* @param dev_id
* The identifier of the device.
*
*/
extern void
rte_event_schedule_release(uint8_t dev_id);
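As an illustration of early context release, a sketch of a worker that splits
its processing into critical and non-critical parts; process_critical() and
process_noncritical() are hypothetical application functions.

struct rte_event ev;

while (rte_event_schedule(dev_id, &ev, true)) {
    /* critical section: exclusive (atomic) access to the flow queue context */
    process_critical(&ev);
    /* hint the scheduler that the critical section is done, so events from
     * the same flow queue may now be scheduled to other l-cores */
    rte_event_schedule_release(dev_id);
    /* non-critical work now runs in parallel with other l-cores */
    process_noncritical(&ev);
}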
/**
 * Update the current schedule context associated with the caller l-core
*
 * rte_event_schedule_ctxt_update() can be used to support the run-to-completion
 * model where the application requires the current *event* to stay on the same
 * l-core as it moves through the series of processing stages, provided the
 * event type is *RTE_EVENT_TYPE_LCORE*.
 *
 * In the context of the run-to-completion model, rte_eventdev_enqueue()
 * and its associated rte_event_schedule() can be replaced by
 * rte_event_schedule_ctxt_update() if the caller requires the current event to
 * stay on the caller l-core with new *flow_queue_id* and/or new *sched_sync*
 * and/or new *sub_event_type* values
*
* All of the arguments should be equal to their current schedule context values
* unless the application needs the dispatcher to modify the event attribute
* of a dispatched event.
*
 * rte_event_schedule_ctxt_update() is a costly operation; splitting it into two
 * functions (rte_event_schedule_ctxt_update() and rte_event_schedule_ctxt_wait())
 * allows the caller to overlap the context update latency with other profitable
 * work
*
* @param dev_id
* The identifier of the device.
* @param flow_queue_id
* The new flow queue identifier
* @param sched_sync
* The new schedule synchronization method
* @param sub_event_type
* The new sub_event_type where event_type == RTE_EVENT_TYPE_LCORE
* @param wait
 * When true, wait until the context update completes.
 * When false, the attribute update request may optionally start an
 * operation that has not finished when this function returns.
 * In that case, this function returns 1 to indicate that the application must
 * call rte_event_schedule_ctxt_wait() before proceeding with an
 * operation that requires the completion of the requested event attribute
 * change
 * @return
 * - <0 on failure
 * - 0 if the event attribute update operation has completed.
 * - 1 if the event attribute update operation has begun asynchronously.
*
*/
extern int
rte_event_schedule_ctxt_update(uint8_t dev_id, uint32_t flow_queue_id,
uint8_t sched_sync, uint8_t sub_event_type, bool wait);
/**
* Wait for l-core associated event update operation to complete on the
* event device designated by its *dev_id*.
*
 * The caller l-core waits until a previously started event attribute update
 * operation from the same l-core completes
*
* This function is invoked when rte_event_schedule_ctxt_update() returns '1'
*
* @param dev_id
* The identifier of the device.
*/
extern void
rte_event_schedule_ctxt_wait(uint8_t dev_id);
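A sketch of the asynchronous usage described above, overlapping the context
update latency with other work; new_flow_queue_id, new_sub_event_type,
prefetch_next_stage() and do_critical_section_work() are hypothetical
placeholders.

/* request the attribute change without blocking */
int ret = rte_event_schedule_ctxt_update(dev_id, new_flow_queue_id,
                                         RTE_SCHED_SYNC_ATOMIC,
                                         new_sub_event_type, false);
if (ret >= 0) {
    /* overlap the update latency with unrelated, profitable work */
    prefetch_next_stage();
    /* wait only if the update was started asynchronously */
    if (ret == 1)
        rte_event_schedule_ctxt_wait(dev_id);
    /* the new atomic context is in place; do the critical section */
    do_critical_section_work(new_flow_queue_id);
}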
/**
* Join the caller l-core to a schedule group *group_id* of the event device
* designated by its *dev_id*.
*
* l-core membership in the schedule group can be configured with
* rte_eventdev_sched_group_setup() prior to rte_eventdev_start()
*
* @param dev_id
* The identifier of the device.
* @param group_id
* Schedule group identifier to select the schedule group to join
*
* @return
* - 0 on success
* - <0 on failure
*/
extern int
rte_event_schedule_group_join(uint8_t dev_id, uint8_t group_id);
/**
* Leave the caller l-core from a schedule group *group_id* of the event device
* designated by its *dev_id*.
*
* This function will unsubscribe the calling l-core from receiving events from
* the specified schedule group *group_id*
*
* l-core membership in the schedule group can be configured with
* rte_eventdev_sched_group_setup() prior to rte_eventdev_start()
*
* @param dev_id
* The identifier of the device.
* @param group_id
 * Schedule group identifier to select the schedule group to leave
*
* @return
* - 0 on success
* - <0 on failure
*/
extern int
rte_event_schedule_group_leave(uint8_t dev_id, uint8_t group_id);
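For example (a sketch with hypothetical group ids 0 and 1, which must match
groups set up with rte_eventdev_sched_group_setup()), a worker l-core can
change its membership at runtime to help a latency critical stage:

/* stop taking events from the bulk group and join the critical group */
if (rte_event_schedule_group_leave(dev_id, 0) < 0 ||
    rte_event_schedule_group_join(dev_id, 1) < 0) {
    /* handle the error, e.g. stay in the current groups */
}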
*************** text version of the presentation document ************************
Agenda
Event driven programming model concepts in data plane perspective
Characteristics of HW based event manager devices
libeventdev
Example use case - Simple IPSec outbound processing
Benefits of event driven programming model
Future work
Event driven programming model - Concepts
Event is an asynchronous notification from HW/SW to CPU core
Typical examples of events in dataplane are
Packets from ethernet device
Crypto work completion notification from Crypto HW
Timer expiry notification from Timer HW
A CPU generates an event to notify another CPU (used in pipeline mode)
Event driven programming is a programming paradigm in which the flow of the program is determined by events
[Figure: HW/SW event sources (packet event, timer expiry event, crypto done event, SW event) feed queues 0..N; the scheduler dispatches the queued events to cores 0..n]
Packet event, timer expiry event and crypto work completion event are the typical HW generated events
A core can also produce a SW event to notify another core of work completion
Queues 0..N store the events
The scheduler schedules an event to a core
The core processes the event and enqueues it to another downstream queue for further processing, or sends the event/packet to the wire
Characteristics of HW based event device
Millions of flow queues
Events associated with a single flow queue can be scheduled on multiple CPUs for concurrent processing while maintaining the original event order
Provides synchronization of the events without SW lock schemes
Priority based scheduling to enable the QoS
Event device may have 1 to N schedule groups
Each core can be a member of any subset of schedule groups
Each core decides which schedule group(s) it accepts the events from
Schedule groups provide a means to execute different functions on different cores
Flow queues are grouped into schedule groups
Core to schedule group membership can be changed at runtime to support scaling and reduce the latency of critical work by assigning more cores at runtime
The event scheduler is implemented in HW to save CPU cycles
libeventdev components
[Figure: libeventdev components. HW/SW event sources (packet event, timer expiry event, crypto done event, SW event) are enqueued into flow queues 0..n, which are grouped into schedule groups 0..n with priorities x, y and z. enqueue() carries {grp_id, flow_queue_id, schedule_sync, event_type, event}; {grp, flow_queue_id, schedule_sync, event_type, event} = schedule() is returned to cores 0..n. Core 0's sched group bitmask: 100011; core 1's sched group bitmask: 000001]
Each core has a group mask to capture the list of schedule groups it participates in during schedule()
API Interface
Southbound eventdev driver interface
libeventdev - flow
The event driver registers with the libeventdev subsystem and the subsystem provides a unique device id
The application gets the device capabilities with rte_eventdev_info_get(dev_id), like
The number of schedule groups
The number of flow queues in a schedule group
The application configures the event device and each schedule group in the event device, like
The number of schedule groups and flow queues required
The priority of each schedule group and the list of l-cores associated with it
Connect schedule groups with other HW event producers in the system, like ethdev and crypto etc
In the fast path,
HW/SW enqueues the events to flow queues associated with schedule groups
A core gets an event through the scheduler by invoking rte_event_schedule() from the l-core
The core processes the event and enqueues it to another downstream queue for further processing, or sends the event/packet to the wire if it is the last stage of the processing
rte_event_schedule() schedules the event based on
Selection of the schedule group
The caller l-core membership in the schedule group
Schedule group priority relative to other schedule groups
Selection of the flow queue and the event inside the schedule group
The scheduler sync method associated with the flow queue (ATOMIC vs ORDERED/PARALLEL)
Schedule sync methods (How events are Synchronized)
PARALLEL
Events from a parallel flow queue can be scheduled to multiple cores for concurrent processing
Ingress order is not maintained
ATOMIC
Events from an atomic flow queue can be scheduled only to a single core at a time
Enables critical sections in packet processing, like sequence number updates
Ingress order is maintained as only one event is outstanding at a time
ORDERED
Events from the ordered flow queue can be scheduled to multiple cores for concurrent processing
Ingress order is maintained
Enable high single flow throughput
ORDERED flow queue for ingress ordering
[Figure: events 1..6 enter an ORDERED flow queue; rte_event_schedule() dispatches them to cores processing the ordered events in parallel (e.g. in the order 4, 6, 3, 1, 2, 5); on enqueue to any downstream flow queue the original order 1..6 is restored]
The source ORDERED flow queue’s ingress order shall be maintained when events are enqueued to any downstream flow queue
Use case (Simple IPSec Outbound processing)
[Figure: IPSec outbound pipeline. Packets arrive on the RX ports (port 0..6 RX) and leave on the TX ports (port 0..6 TX), passing through four phases - PHASE1: POLICY/SA and ROUTE lookup in parallel (ORDERED); PHASE2: SEQ number update per SA (ATOMIC); PHASE3: HW assisted IPSec crypto; PHASE4: core sends encrypted pkts to the Tx port queues (ATOMIC)]
Packets are enqueued into one of up to 1M flow queues based on a classification criterion (e.g. 5 tuple hash)
PHASE1 generates a unique SA based on the input packet and the SA tables.
Each SA flow will be processed in parallel.
The core enqueues on an ATOMIC flow queue for critical section processing per SA
The core issues the IPSec crypto request to HW
The crypto HW processes the crypto operations in the background
The crypto HW sends the crypto work completion event to notify the core.
Simple IPSec Outbound processing - Cores View
[Figure: cores 0..n each run the loop below; the scheduler feeds them from the per-SA flow queues, and the cores enqueue to the HW crypto assist and the Tx port queues]
Each core runs:
while (1) {
    event = rte_event_schedule();
    /* process the specific phase */
    /* call different enqueue() to send to:
     *  - atomic flow queue
     *  - crypto HW engine queue
     *  - TX port queue */
}
RX pkt HW enqueues one of millions of flows to the ORDERED flow queues
Per SA, the core enqueues on an ATOMIC flow queue for the critical section phase of the flow
The core enqueues the crypto work
On completion of the crypto work, HW generates the crypto work completion notification
API Requirements
APIs similar to existing ethernet and crypto API framework for
Device creation, device Identification and device configuration
Enumerate libeventdev resources as numbers(0..N) to
Avoid ABI issues with handles
An event device may have millions of flow queues, so it's not practical to have handles for each flow queue and its associated name based lookup in the multiprocess case
Avoid struct mbuf changes
APIs to
Enumerate eventdev driver capabilities and resources
Enqueue events from l-core
Schedule events
Synchronize events
Maintain ingress order of the events
API - Slow path
APIs similar to existing ethernet and crypto API framework for
Device creation - Physical event devices are discovered during the PCI probe/enumeration performed by the EAL at DPDK initialization, based on their PCI device identifier, i.e. each unique PCI BDF (bus/bridge, device, function)
Device Identification - A unique device index used to designate the event device in all functions exported by the eventdev API.
Device Capability discovery
rte_eventdev_info_get() - To get the event device's global resources, like the number of schedule groups and the number of flow queues per schedule group
Device configuration
rte_eventdev_configure() - configures the number of schedule groups and the number of flow queues on the schedule groups
rte_eventdev_sched_group_setup() - configures schedule group specific settings like the priority and the list of l-cores that have membership in the schedule group
Device state change - rte_eventdev_start()/stop()/close() like ethdev device
API - Fast path
bool rte_event_schedule(uint8_t dev_id, struct rte_event *ev, bool wait);
Schedule an event to the caller l-core from the event device designated by its dev_id
bool rte_event_schedule_from_group(uint8_t dev_id, uint8_t group_id, struct rte_event *ev, bool wait);
Like rte_event_schedule(), but the schedule group is provided as the argument group_id
void rte_event_schedule_release(uint8_t dev_id);
Release the current scheduler synchronization context associated with the scheduler dispatched event
int rte_event_schedule_group_[join/leave](uint8_t dev_id, uint8_t group_id);
Joins/leaves the caller l-core to/from a schedule group
int rte_event_schedule_ctxt_update(uint8_t dev_id, uint32_t flow_queue_id, uint8_t sched_sync, uint8_t sub_event_type, bool wait);
rte_event_schedule_ctxt_update() can be used to support run-to-completion model where the application requires the current *event* to stay on the same l-core as it moves through the series of processing stages, provided the event type is RTE_EVENT_TYPE_LCORE
Fast path APIs - Simple IPSec outbound example
#define APP_STATE_SEQ_UPDATE 0
on each lcore
{
struct rte_event ev;
uint32_t flow_queue_id_mask = rte_eventdev_flow_queue_id_mask(eventdev);
while (1) {
ret = rte_event_schedule(eventdev, &ev, true);
if (!ret)
continue;
/* packets from HW rx ports proceed parallely per flow(ORDERED)*/
if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
sa = outbound_sa_lookup(ev.mbuf);
modify the packet per SA attributes
find the tx port and tx queue from routing table
/* move to next phase (atomic seq number update per sa) */
ev.flow_queue_id = sa & flow_queue_id_mask;
ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
ev.sub_event_id = APP_STATE_SEQ_UPDATE;
rte_event_enqueue(evendev, ev);
} else if (ev.event_type == RTE_EVENT_TYPE_LCORE && ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
sa = ev.flow_queue_id;
/* do critical section work per sa */
do_critical_section_work(sa);
/* Issue the crypto request and generate the following on crypto work completion */
ev.flow_queue_id = tx_port;
ev.sub_event_id = tx_queue_id;
ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);
}
} else if((ev.event_type == RTE_EVENT_TYPE_CRYPTODEV)
tx_port = ev.flow_queue_id;
tx_queue_id = ev.sub_evend_id;
send the packet to tx port/queue
}
}
}
Run-to-completion model support
rte_event_schedule_ctxt_update() can be used to support the run-to-completion model where the application requires the current event to stay on the same l-core as it moves through the series of processing stages, provided the event type is RTE_EVENT_TYPE_LCORE (l-core to l-core communication)
For example, in the previous use case, the ATOMIC sequence number update per SA can be achieved as shown below
The scheduler context update is a costly operation; splitting it into two functions (rte_event_schedule_ctxt_update() and rte_event_schedule_ctxt_wait()) allows the application to overlap the context switch latency with other profitable work
With enqueue and schedule:
    /* move to next phase (atomic seq number update per sa) */
    ev.flow_queue_id = sa & flow_queue_id_mask;
    ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
    ev.sub_event_type = APP_STATE_SEQ_UPDATE;
    rte_eventdev_enqueue(eventdev, &ev);
} else if (ev.event_type == RTE_EVENT_TYPE_LCORE &&
           ev.sub_event_type == APP_STATE_SEQ_UPDATE) {
    sa = ev.flow_queue_id;
    /* do critical section work per sa */
    do_critical_section_work(sa);
With rte_event_schedule_ctxt_update() (run to completion):
    /* move to next phase (atomic seq number update per sa) */
    rte_event_schedule_ctxt_update(eventdev,
        sa & flow_queue_id_mask, RTE_SCHED_SYNC_ATOMIC,
        APP_STATE_SEQ_UPDATE, true);
    /* do critical section work per sa */
    do_critical_section_work(sa);
Benefits of event driven programming model
Enable high single flow throughput with ORDERED schedule sync method
The processing stages are not bound to specific cores. It provides better load-balancing and scaling capabilities than traditional pipelining.
Prioritize: Guarantee lcores work on the highest priority event available
Support asynchronous operations which allow the cores to stay busy while hardware manages requests.
Removes the static mapping between cores and port/RX queues
Scaling from 1 to N flows is easy as processing is not bound to specific cores
Future work
Integrate the event device with ethernet, crypto and timer subsystems in DPDK
Ethdev/event device integration is possible by extending the new 6WIND ingress classification specification, where a new action type can establish an ethdev port to eventdev schedule group connection
Cryptodev needs some changes at the configuration stage to set the crypto work completion event delivery mechanism
Spec out timerdev for PCI based timer event devices (timer event devices generate a timer expiry event vs a callback in the existing SW based timer scheme)
The event driven model operates on a single event at a time. A helper API is needed to make the final enqueues to different HW blocks, like the ethdev TX queue, burst in nature
* Re: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
2016-08-09 1:01 [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK Jerin Jacob
@ 2016-08-09 8:48 ` Bruce Richardson
2016-08-09 18:46 ` Jerin Jacob
0 siblings, 1 reply; 7+ messages in thread
From: Bruce Richardson @ 2016-08-09 8:48 UTC (permalink / raw)
To: Jerin Jacob; +Cc: dev, thomas.monjalon, hemant.agrawal, shreyansh.jain
On Tue, Aug 09, 2016 at 06:31:41AM +0530, Jerin Jacob wrote:
> Hi All,
>
> Find below an RFC API specification which attempts to
> define the standard application programming interface
> for event driven programming in DPDK and to abstract HW based event devices.
>
> These devices can support event scheduling and flow ordering
> in HW and typically found in NW SoCs as an integrated device or
> as PCI EP device.
>
> The RFC APIs are inspired from existing ethernet and crypto devices.
> Following are the requirements considered to define the RFC API.
>
> 1) APIs similar to existing Ethernet and crypto API framework for
> ○ Device creation, device Identification and device configuration
> 2) Enumerate libeventdev resources as numbers(0..N) to
> ○ Avoid ABI issues with handles
> ○ Event device may have million flow queues so it's not practical to
> have handles for each flow queue and its associated name based
> lookup in multiprocess case
> 3) Avoid struct mbuf changes
> 4) APIs to
> ○ Enumerate eventdev driver capabilities and resources
> ○ Enqueue events from l-core
> ○ Schedule events
> ○ Synchronize events
> ○ Maintain ingress order of the events
> ○ Run to completion support
>
> Find below the URL for the complete API specification.
>
> https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
>
> I have created a supportive document to share the concepts of
> event driven programming model and proposed APIs details to get
> better reach for the specification.
> This presentation will cover introduction to event driven programming model concepts,
> characteristics of hardware-based event manager devices,
> RFC API proposal, example use case, and benefits of using the event driven programming model.
>
> Find below the URL for the supportive document.
>
> https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf
>
> git repo for the above documents:
>
> https://github.com/jerinjacobk/libeventdev/
>
> Looking forward to getting comments from both application and driver
> implementation perspective.
>
Hi Jerin,
thanks for the RFC. Packet distribution and scheduling is something we've been
thinking about here too. This RFC gives us plenty of new ideas to take on board. :-)
While you refer to HW implementations on SOC's, have you given any thought to
how a pure-software implementation of an event API might work? I know that
while a software implementation can obviously be done for just about any API,
I'd be concerned that the API not get in the way of a very highly
tuned implementation.
We'll look at it in some detail and get back to you with our feedback, as soon
as we can, to start getting the discussion going.
Regards,
/Bruce
* Re: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
2016-08-09 8:48 ` Bruce Richardson
@ 2016-08-09 18:46 ` Jerin Jacob
0 siblings, 0 replies; 7+ messages in thread
From: Jerin Jacob @ 2016-08-09 18:46 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev, thomas.monjalon, hemant.agrawal, shreyansh.jain
On Tue, Aug 09, 2016 at 09:48:46AM +0100, Bruce Richardson wrote:
> On Tue, Aug 09, 2016 at 06:31:41AM +0530, Jerin Jacob wrote:
> > Find below the URL for the complete API specification.
> >
> > https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> >
> > I have created a supportive document to share the concepts of
> > event driven programming model and proposed APIs details to get
> > better reach for the specification.
> > This presentation will cover introduction to event driven programming model concepts,
> > characteristics of hardware-based event manager devices,
> > RFC API proposal, example use case, and benefits of using the event driven programming model.
> >
> > Find below the URL for the supportive document.
> >
> > https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf
> >
> > git repo for the above documents:
> >
> > https://github.com/jerinjacobk/libeventdev/
> >
> > Looking forward to getting comments from both application and driver
> > implementation perspective.
> >
>
> Hi Jerin,
>
Hi Bruce,
> thanks for the RFC. Packet distribution and scheduling is something we've been
> thinking about here too. This RFC gives us plenty of new ideas to take on board. :-)
Thanks
> While you refer to HW implementations on SOC's, have you given any thought to
> how a pure-software implementation of an event API might work? I know that
Yes. I have removed almost all hardware specific details from the API
specification. Mostly the APIs are driven by the use case.
I had the impression that a software based scheme would use the
lib_rte_distributor or lib_rte_reorder libraries to get the load balancing
and reordering features. However, if we are looking for some converged
solution without impacting the HW models then I think it is a good step
forward.
IMO, implementing the ORDERED schedule sync method in a performance effective
way in SW may be tricky. Maybe we can introduce some capability based
schemes to let the HW and SW solutions co-exist.
> while a software implemenation can obviously be done for just about any API,
> I'd be concerned that the API not get in the way of a very highly
> tuned implementation.
>
> We'll look at it in some detail and get back to you with our feedback, as soon
> as we can, to start getting the discussion going.
OK
>
> Regards,
> /Bruce
>
* Re: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
2016-10-07 10:40 ` Hemant Agrawal
@ 2016-10-09 8:27 ` Jerin Jacob
0 siblings, 0 replies; 7+ messages in thread
From: Jerin Jacob @ 2016-10-09 8:27 UTC (permalink / raw)
To: Hemant Agrawal; +Cc: Vangati, Narender, dev
On Fri, Oct 07, 2016 at 10:40:03AM +0000, Hemant Agrawal wrote:
> Hi Jerin/Narender,
Hi Hemant,
Thanks for the review.
>
> Thanks for the proposal and discussions.
>
> I agree with many of the comment made by Narender. Here are some additional comments.
>
> 1. rte_event_schedule - should support option for bulk dequeue. The size of bulk should be a property of device, how much depth it can support.
OK. Will fix it in v2.
>
> 2. The event schedule should also support the option to specify the amount of time, it can wait. The implementation may only support global setting(dequeue_wait_ns) for wait time. They can take any non-zero wait value as to implement wait.
OK. Will fix it in v2.
>
> 3. rte_event_schedule_from_group - there should be one model. Both Push and Pull may not work well together. At least the simultaneous mixed config will not work on NXP hardware scheduler.
OK. Will remove Cavium specific "rte_event_schedule_from_group" API in v2.
>
> 4. Priority of queues within the scheduling group? - Please keep in mind that some hardware supports intra scheduler priority and some only support intra flow_queue priority within a scheduler instance. The events of same flow id should have same priority.
Will try to address some solution based on capability.
>
> 5. w.r.t flow_queue numbers in log2, I will prefer to have absolute number. Not all system may have large number of queues. So the design should keep in account the system will fewer number of queues.
OK. Will fix it in v2.
>
> Regards,
> Hemant
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Wednesday, October 05, 2016 12:55 PM
> > On Tue, Oct 04, 2016 at 09:49:52PM +0000, Vangati, Narender wrote:
> > > Hi Jerin,
> >
> > Hi Narender,
> >
> > Thanks for the comments.I agree with proposed changes; I will address these
> > comments in v2.
> >
> > /Jerin
> >
> >
> > >
> > >
> > >
> > > Here are some comments on the libeventdev RFC.
> > >
> > > These are collated thoughts after discussions with you & others to understand
> > the concepts and rationale for the current proposal.
> > >
> > >
> > >
> > > 1. Concept of flow queues. This is better abstracted as flow ids and not as flow
> > queues which implies there is a queueing structure per flow. A s/w
> > implementation can do atomic load balancing on multiple flow ids more
> > efficiently than maintaining each event in a specific flow queue.
> > >
> > >
> > >
> > > 2. Scheduling group. A scheduling group is more a steam of events, so an event
> > queue might be a better abstraction.
> > >
> > >
> > >
> > > 3. An event queue should support the concept of max active atomic flows
> > (maximum number of active flows this queue can track at any given time) and
> > max active ordered sequences (maximum number of outstanding events waiting
> > to be egress reordered by this queue). This allows a scheduler implementation to
> > dimension/partition its resources among event queues.
> > >
> > >
> > >
> > > 4. An event queue should support concept of a single consumer. In an
> > application, a stream of events may need to be brought together to a single
> > core for some stages of processing, e.g. for TX at the end of the pipeline to
> > avoid NIC reordering of the packets. Having a 'single consumer' event queue for
> > that stage allows the intensive scheduling logic to be short circuited and can
> > improve throughput for s/w implementations.
> > >
> > >
> > >
> > > 5. Instead of tying eventdev access to an lcore, a higher level of abstraction
> > called event port is needed which is the application i/f to the eventdev. Event
> > ports are connected to event queues and is the object the application uses to
> > dequeue and enqueue events. There can be more than one event port per lcore
> > allowing multiple lightweight threads to have their own i/f into eventdev, if the
> > implementation supports it. An event port abstraction also encapsulates
> > dequeue depth and enqueue depth for a scheduler implementations which can
> > schedule multiple events at a time and output events that can be buffered.
> > >
> > >
> > >
> > > 6. An event should support priority. Per event priority is useful for segregating
> > high priority (control messages) traffic from low priority within the same flow.
> > This needs to be part of the event definition for implementations which support
> > it.
> > >
> > >
> > >
> > > 7. Event port to event queue servicing priority. This allows two event ports to
> > connect to the same event queue with different priorities. For implementations
> > which support it, this allows a worker core to participate in two different
> > workflows with different priorities (workflow 1 needing 3.5 cores, workflow 2
> > needing 2.5 cores, and so on).
> > >
> > >
> > >
> > > 8. Define the workflow as schedule/dequeue/enqueue. An implementation is
> > free to define schedule as NOOP. A distributed s/w scheduler can use this to
> > schedule events; also a centralized s/w scheduler can make this a NOOP on non-
> > scheduler cores.
> > >
> > >
> > >
> > > 9. The schedule_from_group API does not fit the workflow.
> > >
> > >
> > >
> > > 10. The ctxt_update/ctxt_wait breaks the normal workflow. If the normal
> > workflow is a dequeue -> do work based on event type -> enqueue, a pin_event
> > argument to enqueue (where the pinned event is returned through the normal
> > dequeue) allows application workflow to remain the same whether or not an
> > implementation supports it.
> > >
> > >
> > >
> > > 11. Burst dequeue/enqueue needed.
> > >
> > >
> > >
> > > 12. Definition of a closed/open system - where open system is memory backed
> > and closed system eventdev has limited capacity. In such systems, it is also
> > useful to denote per event port how many packets can be active in the system.
> > This can serve as a threshold for ethdev like devices so they don't overwhelm
> > core to core events.
> > >
> > >
> > >
> > > 13. There should be sort of device capabilities definition to address different
> > implementations.
> > >
> > >
> > >
> > >
> > > vnr
> > > ---
> > >
* Re: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
2016-10-05 7:24 ` Jerin Jacob
@ 2016-10-07 10:40 ` Hemant Agrawal
2016-10-09 8:27 ` Jerin Jacob
0 siblings, 1 reply; 7+ messages in thread
From: Hemant Agrawal @ 2016-10-07 10:40 UTC (permalink / raw)
To: Jerin Jacob, Vangati, Narender; +Cc: dev
Hi Jerin/Narender,
Thanks for the proposal and discussions.
I agree with many of the comment made by Narender. Here are some additional comments.
1. rte_event_schedule - should support option for bulk dequeue. The size of bulk should be a property of device, how much depth it can support.
2. The event schedule should also support the option to specify the amount of time it can wait. The implementation may only support a global setting (dequeue_wait_ns) for the wait time. It can take any non-zero wait value to implement the wait.
3. rte_event_schedule_from_group - there should be one model. Both Push and Pull may not work well together. At least the simultaneous mixed config will not work on NXP hardware scheduler.
4. Priority of queues within the scheduling group? - Please keep in mind that some hardware supports intra scheduler priority and some only support intra flow_queue priority within a scheduler instance. The events of same flow id should have same priority.
5. w.r.t. flow_queue numbers in log2, I would prefer to have an absolute number. Not all systems may have a large number of queues, so the design should take into account systems with fewer queues.
Regards,
Hemant
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Sent: Wednesday, October 05, 2016 12:55 PM
> On Tue, Oct 04, 2016 at 09:49:52PM +0000, Vangati, Narender wrote:
> > Hi Jerin,
>
> Hi Narender,
>
> Thanks for the comments.I agree with proposed changes; I will address these
> comments in v2.
>
> /Jerin
>
>
> >
> >
> >
> > Here are some comments on the libeventdev RFC.
> >
> > These are collated thoughts after discussions with you & others to understand
> the concepts and rationale for the current proposal.
> >
> >
> >
> > 1. Concept of flow queues. This is better abstracted as flow ids and not as flow
> queues which implies there is a queueing structure per flow. A s/w
> implementation can do atomic load balancing on multiple flow ids more
> efficiently than maintaining each event in a specific flow queue.
> >
> >
> >
> > 2. Scheduling group. A scheduling group is more a steam of events, so an event
> queue might be a better abstraction.
> >
> >
> >
> > 3. An event queue should support the concept of max active atomic flows
> (maximum number of active flows this queue can track at any given time) and
> max active ordered sequences (maximum number of outstanding events waiting
> to be egress reordered by this queue). This allows a scheduler implementation to
> dimension/partition its resources among event queues.
> >
> >
> >
> > 4. An event queue should support concept of a single consumer. In an
> application, a stream of events may need to be brought together to a single
> core for some stages of processing, e.g. for TX at the end of the pipeline to
> avoid NIC reordering of the packets. Having a 'single consumer' event queue for
> that stage allows the intensive scheduling logic to be short circuited and can
> improve throughput for s/w implementations.
> >
> >
> >
> > 5. Instead of tying eventdev access to an lcore, a higher level of abstraction
> called event port is needed which is the application i/f to the eventdev. Event
> ports are connected to event queues and is the object the application uses to
> dequeue and enqueue events. There can be more than one event port per lcore
> allowing multiple lightweight threads to have their own i/f into eventdev, if the
> implementation supports it. An event port abstraction also encapsulates
> dequeue depth and enqueue depth for a scheduler implementations which can
> schedule multiple events at a time and output events that can be buffered.
> >
> >
> >
> > 6. An event should support priority. Per event priority is useful for segregating
> high priority (control messages) traffic from low priority within the same flow.
> This needs to be part of the event definition for implementations which support
> it.
> >
> >
> >
> > 7. Event port to event queue servicing priority. This allows two event ports to
> connect to the same event queue with different priorities. For implementations
> which support it, this allows a worker core to participate in two different
> workflows with different priorities (workflow 1 needing 3.5 cores, workflow 2
> needing 2.5 cores, and so on).
> >
> >
> >
> > 8. Define the workflow as schedule/dequeue/enqueue. An implementation is
> free to define schedule as NOOP. A distributed s/w scheduler can use this to
> schedule events; also a centralized s/w scheduler can make this a NOOP on non-
> scheduler cores.
> >
> >
> >
> > 9. The schedule_from_group API does not fit the workflow.
> >
> >
> >
> > 10. The ctxt_update/ctxt_wait breaks the normal workflow. If the normal
> workflow is a dequeue -> do work based on event type -> enqueue, a pin_event
> argument to enqueue (where the pinned event is returned through the normal
> dequeue) allows application workflow to remain the same whether or not an
> implementation supports it.
> >
> >
> >
> > 11. Burst dequeue/enqueue needed.
> >
> >
> >
> > 12. Definition of a closed/open system - where open system is memory backed
> and closed system eventdev has limited capacity. In such systems, it is also
> useful to denote per event port how many packets can be active in the system.
> This can serve as a threshold for ethdev like devices so they don't overwhelm
> core to core events.
> >
> >
> >
> > 13. There should be sort of device capabilities definition to address different
> implementations.
> >
> >
> >
> >
> > vnr
> > ---
> >
* Re: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
2016-10-04 21:49 Vangati, Narender
@ 2016-10-05 7:24 ` Jerin Jacob
2016-10-07 10:40 ` Hemant Agrawal
0 siblings, 1 reply; 7+ messages in thread
From: Jerin Jacob @ 2016-10-05 7:24 UTC (permalink / raw)
To: Vangati, Narender; +Cc: dev
On Tue, Oct 04, 2016 at 09:49:52PM +0000, Vangati, Narender wrote:
> Hi Jerin,
Hi Narender,
Thanks for the comments. I agree with the proposed changes; I will address these comments in v2.
/Jerin
>
>
>
> Here are some comments on the libeventdev RFC.
>
> These are collated thoughts after discussions with you & others to understand the concepts and rationale for the current proposal.
>
>
>
> 1. Concept of flow queues. This is better abstracted as flow ids and not as flow queues which implies there is a queueing structure per flow. A s/w implementation can do atomic load balancing on multiple flow ids more efficiently than maintaining each event in a specific flow queue.
>
>
>
> 2. Scheduling group. A scheduling group is more a steam of events, so an event queue might be a better abstraction.
>
>
>
> 3. An event queue should support the concept of max active atomic flows (maximum number of active flows this queue can track at any given time) and max active ordered sequences (maximum number of outstanding events waiting to be egress reordered by this queue). This allows a scheduler implementation to dimension/partition its resources among event queues.
>
>
>
> 4. An event queue should support concept of a single consumer. In an application, a stream of events may need to be brought together to a single core for some stages of processing, e.g. for TX at the end of the pipeline to avoid NIC reordering of the packets. Having a 'single consumer' event queue for that stage allows the intensive scheduling logic to be short circuited and can improve throughput for s/w implementations.
>
>
>
> 5. Instead of tying eventdev access to an lcore, a higher level of abstraction called event port is needed which is the application i/f to the eventdev. Event ports are connected to event queues and is the object the application uses to dequeue and enqueue events. There can be more than one event port per lcore allowing multiple lightweight threads to have their own i/f into eventdev, if the implementation supports it. An event port abstraction also encapsulates dequeue depth and enqueue depth for a scheduler implementations which can schedule multiple events at a time and output events that can be buffered.
>
>
>
> 6. An event should support priority. Per event priority is useful for segregating high priority (control messages) traffic from low priority within the same flow. This needs to be part of the event definition for implementations which support it.
>
>
>
> 7. Event port to event queue servicing priority. This allows two event ports to connect to the same event queue with different priorities. For implementations which support it, this allows a worker core to participate in two different workflows with different priorities (workflow 1 needing 3.5 cores, workflow 2 needing 2.5 cores, and so on).
>
>
>
> 8. Define the workflow as schedule/dequeue/enqueue. An implementation is free to define schedule as NOOP. A distributed s/w scheduler can use this to schedule events; also a centralized s/w scheduler can make this a NOOP on non-scheduler cores.
>
>
>
> 9. The schedule_from_group API does not fit the workflow.
>
>
>
> 10. The ctxt_update/ctxt_wait breaks the normal workflow. If the normal workflow is a dequeue -> do work based on event type -> enqueue, a pin_event argument to enqueue (where the pinned event is returned through the normal dequeue) allows application workflow to remain the same whether or not an implementation supports it.
>
>
>
> 11. Burst dequeue/enqueue needed.
>
>
>
> 12. Definition of a closed/open system - where open system is memory backed and closed system eventdev has limited capacity. In such systems, it is also useful to denote per event port how many packets can be active in the system. This can serve as a threshold for ethdev like devices so they don't overwhelm core to core events.
>
>
>
> 13. There should be sort of device capabilities definition to address different implementations.
>
>
>
>
> vnr
> ---
>
* Re: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
@ 2016-10-04 21:49 Vangati, Narender
2016-10-05 7:24 ` Jerin Jacob
0 siblings, 1 reply; 7+ messages in thread
From: Vangati, Narender @ 2016-10-04 21:49 UTC (permalink / raw)
To: dev
Hi Jerin,
Here are some comments on the libeventdev RFC.
These are collated thoughts after discussions with you & others to understand the concepts and rationale for the current proposal.
1. Concept of flow queues. This is better abstracted as flow ids and not as flow queues which implies there is a queueing structure per flow. A s/w implementation can do atomic load balancing on multiple flow ids more efficiently than maintaining each event in a specific flow queue.
2. Scheduling group. A scheduling group is more a stream of events, so an event queue might be a better abstraction.
3. An event queue should support the concept of max active atomic flows (maximum number of active flows this queue can track at any given time) and max active ordered sequences (maximum number of outstanding events waiting to be egress reordered by this queue). This allows a scheduler implementation to dimension/partition its resources among event queues.
4. An event queue should support concept of a single consumer. In an application, a stream of events may need to be brought together to a single core for some stages of processing, e.g. for TX at the end of the pipeline to avoid NIC reordering of the packets. Having a 'single consumer' event queue for that stage allows the intensive scheduling logic to be short circuited and can improve throughput for s/w implementations.
5. Instead of tying eventdev access to an lcore, a higher level of abstraction called event port is needed which is the application i/f to the eventdev. Event ports are connected to event queues and are the objects the application uses to dequeue and enqueue events. There can be more than one event port per lcore, allowing multiple lightweight threads to have their own i/f into the eventdev, if the implementation supports it. An event port abstraction also encapsulates the dequeue depth and enqueue depth for scheduler implementations which can schedule multiple events at a time and output events that can be buffered.
6. An event should support priority. Per event priority is useful for segregating high priority (control messages) traffic from low priority within the same flow. This needs to be part of the event definition for implementations which support it.
7. Event port to event queue servicing priority. This allows two event ports to connect to the same event queue with different priorities. For implementations which support it, this allows a worker core to participate in two different workflows with different priorities (workflow 1 needing 3.5 cores, workflow 2 needing 2.5 cores, and so on).
8. Define the workflow as schedule/dequeue/enqueue. An implementation is free to define schedule as NOOP. A distributed s/w scheduler can use this to schedule events; also a centralized s/w scheduler can make this a NOOP on non-scheduler cores.
9. The schedule_from_group API does not fit the workflow.
10. The ctxt_update/ctxt_wait breaks the normal workflow. If the normal workflow is a dequeue -> do work based on event type -> enqueue, a pin_event argument to enqueue (where the pinned event is returned through the normal dequeue) allows application workflow to remain the same whether or not an implementation supports it.
11. Burst dequeue/enqueue needed.
12. Definition of a closed/open system - where open system is memory backed and closed system eventdev has limited capacity. In such systems, it is also useful to denote per event port how many packets can be active in the system. This can serve as a threshold for ethdev like devices so they don't overwhelm core to core events.
13. There should be some sort of device capabilities definition to address different implementations.
vnr
---