From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jerin Jacob <Jerin.Jacob@cavium.com>
To: dev@dpdk.org
Date: Tue, 9 Aug 2016 06:31:41 +0530
Message-ID: <20160809010138.GA8143@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.6.2 (2016-07-01)
Subject: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
List-Id: patches and discussions about DPDK

Hi All,

Find below an RFC API specification which attempts to define the standard
application programming interface for event driven programming in DPDK and to
abstract HW based event devices. These devices can support event scheduling
and flow ordering in HW and are typically found in networking SoCs as an
integrated device or as a PCI EP device.

The RFC APIs are inspired by the existing ethernet and crypto device APIs.

Following are the requirements considered to define the RFC API.

1) APIs similar to the existing Ethernet and crypto API framework for
○ Device creation, device identification and device configuration
2) Enumerate libeventdev resources as numbers (0..N) to
○ Avoid ABI issues with handles
○ An event device may have a million flow queues, so it is not practical to
have handles for each flow queue and its associated name based lookup in the
multiprocess case
3) Avoid struct mbuf changes
4) APIs to
○ Enumerate eventdev driver capabilities and resources
○ Enqueue events from l-core
○ Schedule events
○ Synchronize events
○ Maintain ingress order of the events
○ Run to completion support

Find below the URL for the complete API specification.

https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h

I have created a supportive document to share the concepts of the event driven
programming model and the proposed API details to get better reach for the
specification. This presentation covers an introduction to event driven
programming model concepts, the characteristics of hardware-based event
manager devices, the RFC API proposal, an example use case, and the benefits
of using the event driven programming model.

Find below the URL for the supportive document.
https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf

git repo for the above documents:

https://github.com/jerinjacobk/libeventdev/

Looking forward to getting comments from both the application and driver
implementation perspectives. What follows is the text version of the above
documents, for inline comments and discussion. I intend to update the
specification accordingly.

/**
 * Get the total number of event devices that have been successfully
 * initialised.
 *
 * @return
 *   The total number of usable event devices.
 */
extern uint8_t
rte_eventdev_count(void);

/**
 * Get the device identifier for the named event device.
 *
 * @param name
 *   Event device name to select the event device identifier.
 *
 * @return
 *   Event device identifier on success.
 *   - <0: Failure to find named event device.
 */
extern int
rte_eventdev_get_dev_id(const char *name);

/**
 * Return the NUMA socket to which a device is connected.
 *
 * @param dev_id
 *   The identifier of the device.
 * @return
 *   The NUMA socket id to which the device is connected or
 *   a default of zero if the socket could not be determined.
 *   - -1: dev_id value is out of range.
 */
extern int
rte_eventdev_socket_id(uint8_t dev_id);

/** Event device information */
struct rte_eventdev_info {
    const char *driver_name;        /**< Event driver name */
    struct rte_pci_device *pci_dev; /**< PCI information */
    uint32_t min_sched_wait_ns;
    /**< Minimum supported scheduler wait delay in ns by this device */
    uint32_t max_sched_wait_ns;
    /**< Maximum supported scheduler wait delay in ns by this device */
    uint32_t sched_wait_ns;
    /**< Configured scheduler wait delay in ns of this device */
    uint32_t max_flow_queues_log2;
    /**< LOG2 of maximum flow queues supported by this device */
    uint8_t max_sched_groups;
    /**< Maximum schedule groups supported by this device */
    uint8_t max_sched_group_priority_levels;
    /**< Maximum schedule group priority levels supported by this device */
};

/**
 * Retrieve the contextual information of an event device.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param[out] dev_info
 *   A pointer to a structure of type *rte_eventdev_info* to be filled with
 *   the contextual information of the device.
 */
extern void
rte_eventdev_info_get(uint8_t dev_id, struct rte_eventdev_info *dev_info);

/** Event device configuration structure */
struct rte_eventdev_config {
    uint32_t sched_wait_ns;
    /**< rte_event_schedule() waits for *sched_wait_ns* ns on this device */
    uint32_t nb_flow_queues_log2;
    /**< LOG2 of the number of flow queues to configure on this device */
    uint8_t nb_sched_groups;
    /**< The number of schedule groups to configure on this device */
};

/**
 * Configure an event device.
 *
 * This function must be invoked first before any other function in the
 * API. This function can also be re-invoked when a device is in the
 * stopped state.
 *
 * The caller may use rte_eventdev_info_get() to get the capability of each
 * resource available in this event device.
 *
 * @param dev_id
 *   The identifier of the device to configure.
 * @param config
 *   The event device configuration structure.
 *
 * @return
 *   - 0: Success, device configured.
 *   - <0: Error code returned by the driver configuration function.
 */
extern int
rte_eventdev_configure(uint8_t dev_id, struct rte_eventdev_config *config);

#define RTE_EVENT_SCHED_GRP_PRI_HIGHEST 0
/**< Highest schedule group priority */
#define RTE_EVENT_SCHED_GRP_PRI_NORMAL  128
/**< Normal schedule group priority */
#define RTE_EVENT_SCHED_GRP_PRI_LOWEST  255
/**< Lowest schedule group priority */

struct rte_eventdev_sched_group_conf {
    rte_cpuset_t lcore_list;
    /**< List of l-cores that have membership in this schedule group */
    uint8_t priority;
    /**< Priority of this schedule group relative to other schedule groups.
     * If the requested *priority* is not in the range of the event device's
     * *max_sched_group_priority_levels*, then the event driver can normalize
     * it to a priority value in the range
     * [RTE_EVENT_SCHED_GRP_PRI_HIGHEST, RTE_EVENT_SCHED_GRP_PRI_LOWEST] */
    uint8_t enable_all_lcores;
    /**< Ignore *lcore_list* and enable all the l-cores */
};

/**
 * Allocate and set up a schedule group for an event device.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   The index of the schedule group to set up. The value must be in the range
 *   [0, nb_sched_groups - 1] previously supplied to rte_eventdev_configure().
 * @param group_conf
 *   The pointer to the configuration data to be used for the schedule group.
 *   A NULL value is allowed, in which case the default configuration is used.
 * @param socket_id
 *   The *socket_id* argument is the socket identifier in case of NUMA.
 *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
 *   DMA memory allocated for the schedule group.
 *
 * @return
 *   - 0: Success, schedule group correctly set up.
 *   - <0: Schedule group configuration failed
 */
extern int
rte_eventdev_sched_group_setup(uint8_t dev_id, uint8_t group_id,
        const struct rte_eventdev_sched_group_conf *group_conf,
        int socket_id);

/**
 * Get the number of schedule groups on a specific event device
 *
 * @param dev_id
 *   Event device identifier.
 * @return
 *   - The number of configured schedule groups
 */
extern uint16_t
rte_eventdev_sched_group_count(uint8_t dev_id);

/**
 * Get the priority of a schedule group on a specific event device
 *
 * @param dev_id
 *   Event device identifier.
 * @param group_id
 *   Schedule group identifier.
 * @return
 *   - The configured priority of the schedule group in the
 *     [RTE_EVENT_SCHED_GRP_PRI_HIGHEST, RTE_EVENT_SCHED_GRP_PRI_LOWEST] range
 */
extern uint8_t
rte_eventdev_sched_group_priority(uint8_t dev_id, uint8_t group_id);

/**
 * Get the configured flow queue id mask of a specific event device
 *
 * The *flow_queue_id_mask* can be used to generate a *flow_queue_id* value in
 * the range [0, 2^max_flow_queues_log2 - 1] of a specific event device.
 * The *flow_queue_id* value is used in the event enqueue operation, and a
 * scheduled event's *flow_queue_id* value can be compared against the
 * enqueued value.
 *
 * @param dev_id
 *   Event device identifier.
 * @return
 *   - The configured flow queue id mask
 */
extern uint32_t
rte_eventdev_flow_queue_id_mask(uint8_t dev_id);

/**
 * Start an event device.
 *
 * The device start step is the last one and consists of setting the schedule
 * groups and flow queues to start accepting events and scheduling them to
 * l-cores.
 *
 * On success, all basic functions exported by the API (event enqueue,
 * event schedule and so on) can be invoked.
 *
 * @param dev_id
 *   Event device identifier
 * @return
 *   - 0: Success, device started.
 *   - <0: Error code of the driver device start function.
 */
extern int
rte_eventdev_start(uint8_t dev_id);
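To illustrate the intended slow-path usage, here is a minimal setup sketch.
The device name "event_xyz", the resource counts and the group priority are
illustrative assumptions, not part of this specification.

#include <string.h>
#include <rte_common.h>

static int
setup_eventdev(void)
{
    struct rte_eventdev_info info;
    struct rte_eventdev_config config;
    struct rte_eventdev_sched_group_conf grp_conf;
    int dev_id, ret;
    uint8_t g;

    dev_id = rte_eventdev_get_dev_id("event_xyz"); /* hypothetical name */
    if (dev_id < 0)
        return dev_id;

    rte_eventdev_info_get(dev_id, &info);

    /* Request two schedule groups and 2^10 flow queues, clamped to the
     * capabilities reported by the device. */
    config.nb_sched_groups = RTE_MIN(2, (int)info.max_sched_groups);
    config.nb_flow_queues_log2 = RTE_MIN(10U, info.max_flow_queues_log2);
    config.sched_wait_ns = info.min_sched_wait_ns;
    ret = rte_eventdev_configure(dev_id, &config);
    if (ret < 0)
        return ret;

    /* In this sketch, every l-core is a member of every group. */
    memset(&grp_conf, 0, sizeof(grp_conf));
    grp_conf.priority = RTE_EVENT_SCHED_GRP_PRI_NORMAL;
    grp_conf.enable_all_lcores = 1;
    for (g = 0; g < config.nb_sched_groups; g++) {
        ret = rte_eventdev_sched_group_setup(dev_id, g, &grp_conf,
                                             SOCKET_ID_ANY);
        if (ret < 0)
            return ret;
    }

    return rte_eventdev_start(dev_id);
}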
/**
 * Stop an event device. The device can be restarted with a call to
 * rte_eventdev_start()
 *
 * @param dev_id
 *   Event device identifier.
 */
extern void
rte_eventdev_stop(uint8_t dev_id);

/**
 * Close an event device. The device cannot be restarted!
 *
 * @param dev_id
 *   Event device identifier
 *
 * @return
 *   - 0 on successfully closing the device
 *   - <0 on failure to close the device
 */
extern int
rte_eventdev_close(uint8_t dev_id);

/* Scheduler synchronization methods */
#define RTE_SCHED_SYNC_ORDERED 0
/**< Ordered flow queue synchronization
 *
 * Events from an ordered flow queue can be scheduled to multiple l-cores for
 * concurrent processing while maintaining the original event order. This
 * scheme enables the user to achieve high single flow throughput by avoiding
 * SW synchronization for ordering between l-cores.
 *
 * The source flow queue ordering is maintained when events are enqueued to
 * their destination queue(s) within the same ordered queue synchronization
 * context. An l-core holds the context until it requests another event from
 * the scheduler, which implicitly releases the context. The user may allow
 * the scheduler to release the context earlier than that by calling
 * rte_event_schedule_release().
 *
 * Events from the source flow queue appear in their original order when
 * dequeued from a destination flow queue irrespective of its synchronization
 * method. Event ordering is based on the received event(s), but other (newly
 * allocated or stored) events are also ordered when enqueued within the same
 * ordered context. Events not enqueued (e.g. freed or stored) within the
 * context are considered missing from reordering and are skipped at this
 * time (but can be ordered again within another context).
 */
#define RTE_SCHED_SYNC_ATOMIC 1
/**< Atomic flow queue synchronization
 *
 * Events from an atomic flow queue can be scheduled only to a single l-core
 * at a time. The l-core is guaranteed to have exclusive (atomic) access to
 * the associated flow queue context, which enables the user to avoid SW
 * synchronization. An atomic flow queue also helps to maintain event ordering
 * since only one l-core at a time is able to process events from a flow
 * queue.
 *
 * The atomic queue synchronization context is dedicated to the l-core until
 * it requests another event from the scheduler, which implicitly releases
 * the context. The user may allow the scheduler to release the context
 * earlier than that by calling rte_event_schedule_release().
 */
#define RTE_SCHED_SYNC_PARALLEL 2
/**< Parallel flow queue
 *
 * The scheduler performs priority scheduling, load balancing, etc. but does
 * not provide additional event synchronization or ordering. It is free to
 * schedule events from a single parallel queue to multiple l-cores for
 * concurrent processing. The application is responsible for flow queue
 * context synchronization and event ordering (SW synchronization).
 */
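To make the choice between these synchronization methods concrete, here is a
hypothetical helper (it uses the struct rte_event and event type definitions
that follow below) which re-targets an event at the next pipeline stage; the
stage semantics in the comments mirror the IPSec example later in this RFC.

/* Illustrative only: pick a synchronization method per pipeline stage. */
static void
event_to_next_stage(struct rte_event *ev, uint32_t next_flow_queue_id,
                    uint8_t next_sync, uint8_t next_stage)
{
    ev->flow_queue_id = next_flow_queue_id;
    /* e.g. RTE_SCHED_SYNC_ORDERED for parallel lookup stages,
     * RTE_SCHED_SYNC_ATOMIC for per-flow critical sections,
     * RTE_SCHED_SYNC_PARALLEL for stateless stages. */
    ev->sched_sync = next_sync;
    ev->event_type = RTE_EVENT_TYPE_LCORE;
    ev->sub_event_type = next_stage; /* hypothetical application stage tag */
}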
/* Event types to classify the event source */
#define RTE_EVENT_TYPE_ETHDEV 0x0
/**< The event generated from the ethdev subsystem */
#define RTE_EVENT_TYPE_CRYPTODEV 0x1
/**< The event generated from the cryptodev subsystem */
#define RTE_EVENT_TYPE_TIMERDEV 0x2
/**< The event generated from the timerdev subsystem */
#define RTE_EVENT_TYPE_LCORE 0x3
/**< The event generated from an l-core. Application may use *sub_event_type*
 * to further classify the event */
#define RTE_EVENT_TYPE_INVALID 0xf
/**< Invalid event type */
#define RTE_EVENT_TYPE_MAX 0x16

/** The generic rte_event structure to hold the event attributes */
struct rte_event {
    union {
        uint64_t u64;
        struct {
            uint32_t flow_queue_id;
            /**< Flow queue identifier to choose the flow queue in the
             * enqueue and schedule operations.
             * The value must be in the range of
             * rte_eventdev_flow_queue_id_mask() */
            uint8_t sched_group_id;
            /**< Schedule group identifier to choose the schedule group in
             * the enqueue and schedule operations.
             * The value must be in the range [0, nb_sched_groups - 1]
             * previously supplied to rte_eventdev_configure(). */
            uint8_t sched_sync;
            /**< Scheduler synchronization method associated with the flow
             * queue for the enqueue and schedule operations */
            uint8_t event_type;
            /**< Event type to classify the event source */
            uint8_t sub_event_type;
            /**< Sub-event types based on the event source */
        };
    };
    union {
        uintptr_t event;
        /**< Opaque event pointer */
        struct rte_mbuf *mbuf;
        /**< mbuf pointer if the scheduled event is associated with an mbuf */
    };
};

/**
 * Enqueue the event object supplied in the *rte_event* structure on the flow
 * queue identified by *flow_queue_id*, associated with the schedule group
 * *sched_group_id*, the scheduler synchronization method and the event type,
 * on an event device designated by its *dev_id*.
 *
 * @param dev_id
 *   Event device identifier.
 * @param ev
 *   Pointer to struct rte_event
 * @return
 *   - 0 on success
 *   - <0 on failure
 */
extern int
rte_eventdev_enqueue(uint8_t dev_id, struct rte_event *ev);

/**
 * Enqueue a burst of event objects supplied in *rte_event* structures
 * on an event device designated by its *dev_id*.
 *
 * The rte_eventdev_enqueue_burst() function is invoked to enqueue multiple
 * event objects. It is the burst variant of the rte_eventdev_enqueue()
 * function.
 *
 * The *num* parameter is the number of event objects to enqueue, which are
 * supplied in the *ev* array of *rte_event* structures.
 *
 * The rte_eventdev_enqueue_burst() function returns the number of event
 * objects it actually enqueued. A return value equal to *num* means that all
 * event objects have been enqueued.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param ev
 *   The address of an array of *num* pointers to *rte_event* structures
 *   which contain the event objects to be enqueued.
 * @param num
 *   The number of event objects to enqueue
 *
 * @return
 *   The number of event objects actually enqueued on the event device. The
 *   return value can be less than the value of the *num* parameter when the
 *   event device's flow queue is full or if invalid parameters are specified
 *   in a *rte_event*. If the return value is less than *num*, the remaining
 *   events at the end of ev[] are not consumed, and the caller has to take
 *   care of them.
 */
extern int
rte_eventdev_enqueue_burst(uint8_t dev_id, struct rte_event *ev[], int num);
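As a usage sketch of the enqueue operation: the helper below injects an mbuf
as an event on a flow queue derived from a 5-tuple hash. The hash source,
group id and stage value are illustrative assumptions.

static int
enqueue_pkt(uint8_t dev_id, struct rte_mbuf *m, uint32_t five_tuple_hash)
{
    struct rte_event ev;

    /* Derive the flow queue from the packet's 5-tuple hash. */
    ev.flow_queue_id = five_tuple_hash &
                       rte_eventdev_flow_queue_id_mask(dev_id);
    ev.sched_group_id = 0;                   /* first group, illustrative */
    ev.sched_sync = RTE_SCHED_SYNC_ORDERED;  /* keep per-flow ingress order */
    ev.event_type = RTE_EVENT_TYPE_LCORE;    /* SW-generated event */
    ev.sub_event_type = 0;                   /* hypothetical stage tag */
    ev.mbuf = m;

    return rte_eventdev_enqueue(dev_id, &ev);
}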
/**
 * Schedule an event to the caller l-core from the event device designated by
 * its *dev_id*.
 *
 * rte_event_schedule() does not dictate the specifics of the scheduling
 * algorithm, as each eventdev driver may have different criteria to schedule
 * an event. However, in general, from an application perspective the
 * scheduler may use the following scheme to dispatch an event to an l-core:
 *
 * 1) Selection of the schedule group
 *    a) The number of schedule groups available in the event device
 *    b) The caller l-core's membership in the schedule group
 *    c) Schedule group priority relative to other schedule groups
 * 2) Selection of the flow queue and event
 *    a) The number of flow queues available in the event device
 *    b) The scheduler synchronization method associated with the flow queue
 *
 * On successful event dispatch, the caller l-core holds the scheduler
 * synchronization context associated with the dispatched event; an explicit
 * rte_event_schedule_release() or rte_event_schedule_ctxt_*() or the next
 * rte_event_schedule() call releases the context.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param[out] ev
 *   Pointer to struct rte_event. On successful event dispatch, the
 *   implementation updates the event attributes.
 * @param wait
 *   When true, wait until an event is available, or for the *sched_wait_ns*
 *   ns previously supplied to rte_eventdev_configure()
 *
 * @return
 *   When true, a valid event has been dispatched by the scheduler.
 */
extern bool
rte_event_schedule(uint8_t dev_id, struct rte_event *ev, bool wait);

/**
 * Schedule an event to the caller l-core from a specific schedule group
 * *group_id* of the event device designated by its *dev_id*.
 *
 * Like rte_event_schedule(), but with the schedule group provided as the
 * argument *group_id*.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   Schedule group identifier to select the schedule group for event dispatch
 * @param[out] ev
 *   Pointer to struct rte_event. On successful event dispatch, the
 *   implementation updates the event attributes.
 * @param wait
 *   When true, wait until an event is available, or for the *sched_wait_ns*
 *   ns previously supplied to rte_eventdev_configure()
 *
 * @return
 *   When true, a valid event has been dispatched by the scheduler.
 */
extern bool
rte_event_schedule_from_group(uint8_t dev_id, uint8_t group_id,
        struct rte_event *ev, bool wait);

/**
 * Release the current scheduler synchronization context associated with the
 * scheduler-dispatched event.
 *
 * If the current scheduler synchronization context method is
 * *RTE_SCHED_SYNC_ATOMIC*, then this function hints the scheduler that the
 * user has completed critical section processing in the current atomic
 * context. The scheduler is now allowed to schedule events from the same flow
 * queue to another l-core.
 * Early atomic context release may increase parallelism and thus system
 * performance, but the user needs to design the split into critical vs.
 * non-critical sections carefully.
 *
 * If the current scheduler synchronization context method is
 * *RTE_SCHED_SYNC_ORDERED*, then this function hints the scheduler that the
 * user has done all the enqueues that need to maintain event order in the
 * current ordered context. The scheduler is allowed to release the ordered
 * context of this l-core and avoid reordering any following enqueues.
 * Early ordered context release may increase parallelism and thus system
 * performance, since the scheduler may start reordering events sooner than
 * the next schedule call.
 *
 * If the current scheduler synchronization context method is
 * *RTE_SCHED_SYNC_PARALLEL*, then this function is a no-op.
 *
 * @param dev_id
 *   The identifier of the device.
 */
extern void
rte_event_schedule_release(uint8_t dev_id);
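A minimal worker-loop sketch of early atomic context release follows; the
process_*() helpers are hypothetical application stages assumed to exist for
the sake of illustration.

/* Hypothetical application stages. */
static void process_critical_section(struct rte_event *ev);
static void process_non_critical_work(struct rte_event *ev);

static void
worker_loop(uint8_t dev_id)
{
    struct rte_event ev;

    while (1) {
        if (!rte_event_schedule(dev_id, &ev, true))
            continue;

        /* Critical section protected by the atomic flow queue context:
         * only this l-core sees this flow right now. */
        process_critical_section(&ev);

        /* Let the scheduler hand the same flow queue to another l-core
         * while this l-core does flow-independent work. */
        rte_event_schedule_release(dev_id);

        process_non_critical_work(&ev);
    }
}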
/**
 * Update the current schedule context associated with the caller l-core
 *
 * rte_event_schedule_ctxt_update() can be used to support the
 * run-to-completion model, where the application requires the current *event*
 * to stay on the same l-core as it moves through the series of processing
 * stages, provided the event type is *RTE_EVENT_TYPE_LCORE*.
 *
 * In the context of the run-to-completion model, rte_eventdev_enqueue() and
 * its associated rte_event_schedule() can be replaced by
 * rte_event_schedule_ctxt_update() if the caller requires the current event
 * to stay on the caller l-core with new *flow_queue_id* and/or new
 * *sched_sync* and/or new *sub_event_type* values.
 *
 * All of the arguments should be equal to their current schedule context
 * values unless the application needs the dispatcher to modify the event
 * attributes of a dispatched event.
 *
 * rte_event_schedule_ctxt_update() is a costly operation; splitting it into
 * two functions (rte_event_schedule_ctxt_update() and
 * rte_event_schedule_ctxt_wait()) allows the caller to overlap the context
 * update latency with other profitable work.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param flow_queue_id
 *   The new flow queue identifier
 * @param sched_sync
 *   The new schedule synchronization method
 * @param sub_event_type
 *   The new sub_event_type where event_type == RTE_EVENT_TYPE_LCORE
 * @param wait
 *   When true, wait until the context update completes.
 *   When false, the request to update the attributes may optionally start an
 *   operation that may not finish when this function returns.
 *   In that case, this function returns 1 to indicate that the application
 *   must call rte_event_schedule_ctxt_wait() before proceeding with an
 *   operation that requires the completion of the requested event attribute
 *   change.
 * @return
 *   - <0 on failure
 *   - 0 if the event attribute update operation has completed.
 *   - 1 if the event attribute update operation has begun asynchronously.
 */
extern int
rte_event_schedule_ctxt_update(uint8_t dev_id, uint32_t flow_queue_id,
        uint8_t sched_sync, uint8_t sub_event_type, bool wait);

/**
 * Wait for an l-core-associated event context update operation to complete
 * on the event device designated by its *dev_id*.
 *
 * The caller l-core waits until a previously started event attribute update
 * operation from the same l-core completes.
 *
 * This function is invoked when rte_event_schedule_ctxt_update() returns 1.
 *
 * @param dev_id
 *   The identifier of the device.
 */
extern void
rte_event_schedule_ctxt_wait(uint8_t dev_id);

/**
 * Join the caller l-core to a schedule group *group_id* of the event device
 * designated by its *dev_id*.
 *
 * l-core membership in the schedule group can be configured with
 * rte_eventdev_sched_group_setup() prior to rte_eventdev_start()
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   Schedule group identifier to select the schedule group to join
 *
 * @return
 *   - 0 on success
 *   - <0 on failure
 */
extern int
rte_event_schedule_group_join(uint8_t dev_id, uint8_t group_id);
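A sketch of the asynchronous context update, overlapping the update latency
with unrelated work; the prefetch_next_sa() helper is hypothetical, and
APP_STATE_SEQ_UPDATE is the stage tag used in the IPSec example later in this
RFC.

#define APP_STATE_SEQ_UPDATE 0  /* stage tag, as in the IPSec example below */

static void prefetch_next_sa(void); /* hypothetical unrelated work */

static void
move_to_atomic_stage(uint8_t dev_id, uint32_t sa, uint32_t fq_mask)
{
    int ret;

    ret = rte_event_schedule_ctxt_update(dev_id, sa & fq_mask,
            RTE_SCHED_SYNC_ATOMIC, APP_STATE_SEQ_UPDATE, false);
    if (ret < 0)
        return;

    /* Profitable work that does not depend on the new context. */
    prefetch_next_sa();

    if (ret == 1)
        rte_event_schedule_ctxt_wait(dev_id);

    /* From here on, the caller holds the atomic context for this flow. */
}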
/**
 * Remove the caller l-core from a schedule group *group_id* of the event
 * device designated by its *dev_id*.
 *
 * This function unsubscribes the calling l-core from receiving events from
 * the specified schedule group *group_id*.
 *
 * l-core membership in the schedule group can be configured with
 * rte_eventdev_sched_group_setup() prior to rte_eventdev_start()
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   Schedule group identifier to select the schedule group to leave
 *
 * @return
 *   - 0 on success
 *   - <0 on failure
 */
extern int
rte_event_schedule_group_leave(uint8_t dev_id, uint8_t group_id);

*************** text version of the presentation document ************************

Agenda
○ Event driven programming model concepts from a data plane perspective
○ Characteristics of HW based event manager devices
○ libeventdev
○ Example use case - Simple IPSec outbound processing
○ Benefits of the event driven programming model
○ Future work

Event driven programming model - Concepts
○ An event is an asynchronous notification from HW/SW to a CPU core
○ Typical examples of events in the dataplane are
  - Packets from an ethernet device
  - Crypto work completion notification from crypto HW
  - Timer expiry notification from timer HW
  - A CPU generates an event to notify another CPU (used in pipeline mode)
○ Event driven programming is a programming paradigm in which the flow of the
  program is determined by events

[Diagram: cores 0..n, event queues 0..N holding packet, timer expiry, crypto
done and SW events, and a scheduler dispatching events to the cores]

○ Packet events, timer expiry events and crypto work completion events are
  the typical HW generated events
○ A core can also produce a SW event to notify another core of work
  completion
○ Queues 0..N store the events
○ The scheduler schedules an event to a core
○ The core processes the event and enqueues it to another downstream queue
  for further processing, or sends the event/packet to the wire

Characteristics of HW based event devices
○ Millions of flow queues
○ Events associated with a single flow queue can be scheduled on multiple
  CPUs for concurrent processing while maintaining the original event order
○ Provide synchronization of the events without SW lock schemes
○ Priority based scheduling to enable QoS
○ An event device may have 1 to N schedule groups
  - Each core can be a member of any subset of schedule groups
  - Each core decides which schedule group(s) it accepts events from
  - Schedule groups provide a means to execute different functions on
    different cores
  - Flow queues are grouped into schedule groups
  - Core to schedule group membership can be changed at runtime to support
    scaling and to reduce the latency of critical work by assigning more
    cores at runtime
○ The event scheduler is implemented in HW to save CPU cycles

libeventdev components

[Diagram: packet, timer expiry, crypto done and SW events are enqueued via
enqueue(grp_id, flow_queue_id, schedule_sync, event_type, event) to flow
queues 0..n, which are grouped into schedule groups 0..n with priorities
x/y/z; the scheduler dispatches {grp, flow_queue_id, schedule_sync,
event_type, event} = schedule() to cores 0..n; e.g. Core 0's schedule group
bitmask is 100011 (groups 0, 1 and n) and Core 1's is 000001 (group 0); the
API interface faces the application and the southbound eventdev driver
interface faces the drivers]

○ Each core has a group-mask to capture the list of schedule groups it
  participates in via the schedule() API
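To illustrate the runtime membership changes captured by the group-mask, the
sketch below has the caller l-core opt in and out of a high-priority group;
the group id and the "critical work" framing are illustrative assumptions.

#define CRITICAL_GRP 0 /* hypothetical high-priority group */

static void
scale_up_critical_work(uint8_t dev_id)
{
    /* Start accepting events from the critical group on this l-core. */
    if (rte_event_schedule_group_join(dev_id, CRITICAL_GRP) < 0)
        return;

    /* ... process events for a while via rte_event_schedule() ... */

    /* Stop accepting events from the critical group. */
    rte_event_schedule_group_leave(dev_id, CRITICAL_GRP);
}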
libeventdev - flow
○ An event driver registers with the libeventdev subsystem and the subsystem
  provides a unique device id
○ The application gets the device capabilities with
  rte_eventdev_info_get(dev_id), like
  - The number of schedule groups
  - The number of flow queues in a schedule group
○ The application configures the event device and each schedule group in the
  event device, like
  - The number of schedule groups and flow queues required
  - The priority of each schedule group and the list of l-cores associated
    with it
○ Connect the schedule groups with other HW event producers in the system,
  like ethdev, crypto etc.
○ In the fastpath, HW/SW enqueues the events to the flow queues associated
  with the schedule groups
○ A core gets an event through the scheduler by invoking rte_event_schedule()
  from the l-core
○ The core processes the event and enqueues it to another downstream queue
  for further processing, or sends the event/packet to the wire if it is the
  last stage of the processing
○ rte_event_schedule() schedules the event based on
  - Selection of the schedule group
    - The caller l-core's membership in the schedule group
    - Schedule group priority relative to other schedule groups
  - Selection of the flow queue and the event inside the schedule group
    - The scheduler sync method associated with the flow queue
      (ATOMIC vs ORDERED/PARALLEL)

Schedule sync methods (how events are synchronized)
○ PARALLEL
  - Events from a parallel flow queue can be scheduled to multiple cores for
    concurrent processing
  - Ingress order is not maintained
○ ATOMIC
  - Events from an atomic flow queue can be scheduled only to a single core
    at a time
  - Enables critical sections in packet processing, like sequence number
    updates etc.
  - Ingress order is maintained, as only one event is outstanding at a time
○ ORDERED
  - Events from an ordered flow queue can be scheduled to multiple cores for
    concurrent processing
  - Ingress order is maintained
  - Enables high single flow throughput

ORDERED flow queue for ingress ordering

[Diagram: events 1..6 from an ORDERED flow queue are dispatched by the
scheduler via rte_event_schedule() to cores that process them in parallel and
out of order; on enqueue to any downstream flow queue the original order
1..6 is restored]

○ The source ORDERED flow queue's ingress order shall be maintained when
  events are enqueued to any downstream flow queue

Use case (Simple IPSec outbound processing)

[Diagram: packets from the Rx ports flow through PHASE1: policy/SA and route
lookup in parallel (ORDERED), PHASE2: sequence number update per SA (ATOMIC),
PHASE3: HW assisted IPSec crypto, and PHASE4: cores send encrypted packets to
the Tx port queues (ATOMIC)]

○ Packets are enqueued into one of up to 1M flow queues based on a
  classification criterion (e.g. 5-tuple hash)
○ PHASE1 generates a unique SA based on the input packet and the SA tables.
  Each SA flow is processed in parallel
○ The core enqueues on an ATOMIC flow queue for critical section processing
  per SA
○ The core issues the IPSec crypto request to HW
○ The crypto HW processes the crypto operations in the background
○ The crypto HW sends the crypto work completion event to notify the core
Simple IPSec outbound processing - Cores view

[Diagram: cores 0..n each run the loop below; the scheduler connects the Rx
packets (HW enqueues one of millions of flows to the ORDERED flow queues),
the per-SA ATOMIC flow queues, the HW crypto assist and the Tx port queues]

    while (1) {
        event = rte_event_schedule();
        /* process the specific phase */
        /* call a different enqueue() to send to
         *  - an atomic flow queue
         *  - a crypto HW engine queue
         *  - a TX port queue */
    }

○ RX pkt: HW enqueues one of millions of flows to the ORDERED flow queues
○ Per SA, a core enqueues on an ATOMIC flow queue for the critical section
  phase of the flow
○ The core enqueues the crypto work
○ On completion of the crypto work, HW generates the crypto work completion
  notification

API Requirements
○ APIs similar to the existing ethernet and crypto API framework for
  - Device creation, device identification and device configuration
○ Enumerate libeventdev resources as numbers (0..N) to
  - Avoid ABI issues with handles
  - An event device may have a million flow queues, so it is not practical
    to have handles for each flow queue and its associated name based lookup
    in the multiprocess case
○ Avoid struct mbuf changes
○ APIs to
  - Enumerate eventdev driver capabilities and resources
  - Enqueue events from l-core
  - Schedule events
  - Synchronize events
  - Maintain ingress order of the events

API - Slow path
○ APIs similar to the existing ethernet and crypto API framework for
  - Device creation - Physical event devices are discovered during the PCI
    probe/enumeration in the EAL function which is executed at DPDK
    initialization, based on their PCI device identifier, each unique PCI
    BDF (bus/bridge, device, function)
  - Device identification - A unique device index used to designate the
    event device in all functions exported by the eventdev API
○ Device capability discovery
  - rte_eventdev_info_get() - To get the global resources of the event
    device, like the number of schedule groups and the number of flow queues
    per schedule group etc.
○ Device configuration
  - rte_eventdev_configure() - Configures the number of schedule groups and
    the number of flow queues on the schedule groups
  - rte_eventdev_sched_group_setup() - Configures schedule group specific
    settings, like the priority and the list of l-cores that have membership
    in the schedule group
○ Device state change - rte_eventdev_start()/stop()/close(), like the ethdev
  device API

API - Fast path
○ bool rte_event_schedule(uint8_t dev_id, struct rte_event *ev, bool wait);
  - Schedule an event to the caller l-core from the event device designated
    by its dev_id
○ bool rte_event_schedule_from_group(uint8_t dev_id, uint8_t group_id,
  struct rte_event *ev, bool wait);
  - Like rte_event_schedule(), but with the schedule group provided as an
    argument
○ void rte_event_schedule_release(uint8_t dev_id);
  - Release the current scheduler synchronization context associated with
    the scheduler-dispatched event
○ int rte_event_schedule_group_[join/leave](uint8_t dev_id, uint8_t group_id);
  - Joins/leaves the caller l-core to/from a schedule group
○ int rte_event_schedule_ctxt_update(uint8_t dev_id, uint32_t flow_queue_id,
  uint8_t sched_sync, uint8_t sub_event_type, bool wait);
  - Can be used to support the run-to-completion model, where the
    application requires the current *event* to stay on the same l-core as
    it moves through the series of processing stages, provided the event
    type is RTE_EVENT_TYPE_LCORE

Fast path APIs - Simple IPSec outbound example
#define APP_STATE_SEQ_UPDATE 0

on each lcore {
    struct rte_event ev;
    uint32_t flow_queue_id_mask = rte_eventdev_flow_queue_id_mask(eventdev);

    while (1) {
        ret = rte_event_schedule(eventdev, &ev, true);
        if (!ret)
            continue;

        /* packets from HW rx ports proceed in parallel per flow (ORDERED) */
        if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
            sa = outbound_sa_lookup(ev.mbuf);
            /* modify the packet per SA attributes */
            /* find the tx port and tx queue from the routing table */
            /* move to next phase (atomic seq number update per sa) */
            ev.flow_queue_id = sa & flow_queue_id_mask;
            ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
            ev.sub_event_type = APP_STATE_SEQ_UPDATE;
            rte_eventdev_enqueue(eventdev, &ev);
        } else if (ev.event_type == RTE_EVENT_TYPE_LCORE &&
                ev.sub_event_type == APP_STATE_SEQ_UPDATE) {
            sa = ev.flow_queue_id;
            /* do critical section work per sa */
            do_critical_section_work(sa);
            /* Issue the crypto request and generate the following event
             * on crypto work completion */
            ev.flow_queue_id = tx_port;
            ev.sub_event_type = tx_queue_id;
            ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
            rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);
        } else if (ev.event_type == RTE_EVENT_TYPE_CRYPTODEV) {
            tx_port = ev.flow_queue_id;
            tx_queue_id = ev.sub_event_type;
            /* send the packet to the tx port/queue */
        }
    }
}

Run-to-completion model support

rte_event_schedule_ctxt_update() can be used to support the run-to-completion
model, where the application requires the current event to stay on the same
l-core as it moves through the series of processing stages, provided the
event type is RTE_EVENT_TYPE_LCORE (l-core to l-core communication). For
example, in the previous use case, the ATOMIC sequence number update per SA
can be achieved as below. The scheduler context update is a costly operation;
splitting it into two functions (rte_event_schedule_ctxt_update() and
rte_event_schedule_ctxt_wait()) allows the application to overlap the context
switch latency with other profitable work.

Instead of enqueueing to the next phase and scheduling again:

    /* move to next phase (atomic seq number update per sa) */
    ev.flow_queue_id = sa & flow_queue_id_mask;
    ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
    ev.sub_event_type = APP_STATE_SEQ_UPDATE;
    rte_eventdev_enqueue(eventdev, &ev);
} else if (ev.event_type == RTE_EVENT_TYPE_LCORE &&
        ev.sub_event_type == APP_STATE_SEQ_UPDATE) {
    sa = ev.flow_queue_id;
    /* do critical section work per sa */
    do_critical_section_work(sa);

the event can stay on the caller l-core:

    /* move to next phase (atomic seq number update per sa) */
    rte_event_schedule_ctxt_update(eventdev, sa & flow_queue_id_mask,
            RTE_SCHED_SYNC_ATOMIC, APP_STATE_SEQ_UPDATE, true);
    /* do critical section work per sa */
    do_critical_section_work(sa);

Benefits of the event driven programming model
○ Enables high single flow throughput with the ORDERED schedule sync method
○ The processing stages are not bound to specific cores; this provides better
  load-balancing and scaling capabilities than traditional pipelining
○ Prioritization: guarantees that l-cores work on the highest priority event
  available
○ Supports asynchronous operations, which allow the cores to stay busy while
  hardware manages requests
○ Removes the static mappings between a core and a port/rx queue
○ Scaling from 1 to N flows is easy, as flows are not bound to specific cores

Future work
○ Integrate the event device with the ethernet, crypto and timer subsystems
  in DPDK
  - Ethdev/event device integration is possible by extending 6WIND's new
    ingress classification specification, where a new action type can
    establish an ethdev port to eventdev schedule group connection
  - Cryptodev needs some changes at the configuration stage to set the
    crypto work completion event delivery mechanism
  - Spec out a timerdev for PCI based timer event devices (timer event
    devices generate a timer expiry event vs. a callback in the existing SW
    based timer scheme)
○ The event driven model operates on a single event at a time. Need to create
  a helper API to make the final enqueues to different HW blocks, like the
  ethdev tx-queue, burst in nature.
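To make the last future-work item concrete, here is one possible shape for
such a helper, sketched against the rte_eventdev_enqueue_burst() API already
in this RFC; the buffer structure and the flush threshold are assumptions,
not part of the proposal.

#include <string.h>

#define EV_BURST 32 /* illustrative flush threshold */

struct event_tx_buffer {
    int count;
    struct rte_event *events[EV_BURST]; /* pointers, per enqueue_burst() */
};

/* Buffer one event; flush the burst once EV_BURST events are collected. */
static int
event_tx_buffer_add(uint8_t dev_id, struct event_tx_buffer *buf,
                    struct rte_event *ev)
{
    int sent;

    buf->events[buf->count++] = ev;
    if (buf->count < EV_BURST)
        return 0;

    sent = rte_eventdev_enqueue_burst(dev_id, buf->events, buf->count);
    /* Unsent events stay at the end of the array per the API contract;
     * keep them for the next flush attempt. */
    if (sent < buf->count)
        memmove(&buf->events[0], &buf->events[sent],
                (buf->count - sent) * sizeof(buf->events[0]));
    buf->count -= sent;
    return sent;
}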