From: "Wiles, Keith"
To: "dev@dpdk.org"
Date: Wed, 1 Apr 2015 12:44:54 +0000
Subject: [dpdk-dev] [RFC] Adding multiple device types to DPDK.

Hi all, (hoping format of the text is maintained)

Bruce and I are submitting this RFC in hopes of providing discussion
points for the idea. Please do not get carried away with the code
included, it was to help everyone understand the proposal/RFC.

The RFC is to describe a proposed change we are looking to make to DPDK
to add more device types. We would like to add in to DPDK the idea of a
generic packet-device or "pktdev", which can be thought of as a thin
layer for all device classes, including other device types such as,
potentially, a "cryptodev" or "dpidev". One of the main goals is to not
affect performance and not require any current application to be
modified. The pktdev layer is providing a light framework for developers
to add a device to DPDK.

Reason for Change
-----------------

The reasons why we are looking to introduce these concepts to DPDK are:

* Expand the scope of DPDK so that it can provide APIs for more than just
  packet acquisition and transmission, but also provide APIs that can be
  used to work with other hardware and software offloads, such as
  cryptographic accelerators, or accelerated libraries for cryptographic
  functions. [The reason why both software and hardware are mentioned is
  so that the same APIs can be used whether or not a hardware accelerator
  is actually available].
* Provide a minimal common basis for device abstraction in DPDK, that can
  be used to unify the different types of packet I/O devices already
  existing in DPDK. To this end, the ethdev APIs are a good starting
  point, but the ethdev library contains too many functions which are
  NIC-specific to be a general-purpose set of APIs across all devices.

Note: The idea was previously touched on here:
http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545

Description of Proposed Change
------------------------------

The basic idea behind "pktdev" is to abstract out a few common routines
and structures/members of structures by using the ethdev structures as a
starting point, cutting them down to little more than a few members in
each structure, then possibly adding just rx_burst and tx_burst. The
resulting structures then serve as a starting point for writing a device
type. Currently we have the rx_burst/tx_burst routines moved to the
pktdev, and it seems like moving a couple more common functions may be
reasonable. It could be the Rx/Tx routines in pktdev should be left as
is, but the code below shows a possible reason to abstract a few routines
into a common set of files.

From there, we have the ethdev type which adds in the existing functions
specific to Ethernet devices, and also, for example, a cryptodev which
may add in functions specific for cryptographic offload. As now, with the
ethdev, the specific drivers provide concrete implementations of the
functionality exposed by the interface. This hierarchy is shown in the
diagram below, using the existing ethdev and ixgbe drivers as a
reference, alongside a hypothetical cryptodev class and driver
implementation (catchily called) "X":

                    ,---------------------.
                    | struct rte_pktdev   |
                    +---------------------+
                    | rte_pkt_rx_burst()  |
            .-------| rte_pkt_tx_burst()  |-----------.
            |       `---------------------'           |
            |                                         |
            |                                         |
  ,-------------------------------.   ,------------------------------.
  |    struct rte_ethdev          |   |     struct rte_cryptodev     |
  +-------------------------------+   +------------------------------+
  | rte_eth_dev_configure()       |   | rte_crypto_init_sym_session()|
  | rte_eth_allmulticast_enable() |   | rte_crypto_del_sym_session() |
  | rte_eth_filter_ctrl()         |   |                              |
  `-------------------------------'   `---------------.--------------'
            |                                         |
            |                                         |
  ,---------'---------------------.   ,---------------'--------------.
  |    struct rte_pmd_ixgbe       |   |      struct rte_pmd_X        |
  +-------------------------------+   +------------------------------+
  | .configure -> ixgbe_configure |   | .init_session -> X_init_ses()|
  | .tx_burst  -> ixgbe_xmit_pkts |   | .tx_burst -> X_handle_pkts() |
  `-------------------------------'   `------------------------------'

We are not attempting to create a real class model here, only looking at
creating a very basic common set of APIs and structures for other device
types.

In terms of code changes for this, we obviously need to add in new
interface libraries for pktdev and cryptodev. The pktdev library can
define a skeleton structure for the first few elements of the nested
structures to ensure consistency. Each of the defines below illustrates
the common members in device structures, which gives some basic structure
to the device framework. Each of the defines is placed at the top of the
device's matching structure and allows the device to contain common and
private data. The pktdev structures overlay the first common set of
members for each device type.

For example:
------------

We are using macros to reduce code changes to DPDK, but nested structures
are a better solution:

#define RTE_PKT_COMMON_DEV(_t) \
    pkt_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ \
    pkt_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */ \
    struct rte_##_t##_dev_data *data; /**< Pointer to device data */ \
    const struct _t##_driver *driver; /**< Driver for this device */ \
    struct _t##_dev_ops *dev_ops; /**< Functions exported by PMD */ \
    struct rte_pci_device *pci_dev; /**< PCI info.
                                         supplied by probing */ \
    /** User application callback for interrupts if present */ \
    struct rte_##_t##_dev_cb_list link_intr_cbs; \
    /** \
     * User-supplied functions called from rx_burst to post-process \
     * received packets before passing them to the user \
     */ \
    struct rte_##_t##_rxtx_callback **post_rx_burst_cbs; \
    /** \
     * User-supplied functions called from tx_burst to pre-process \
     * received packets before passing them to the driver for transmission. \
     */ \
    struct rte_##_t##_rxtx_callback **pre_tx_burst_cbs; \
    enum rte_pkt_dev_type dev_type; /**< Flag indicating the device type */ \
    uint8_t attached /**< Flag indicating the port is attached */

/* Possible alignment or a hole in the structure */

#define RTE_PKT_NAME_MAX_LEN (32)

#define RTE_PKT_COMMON_DEV_DATA \
    char name[RTE_PKT_NAME_MAX_LEN]; /**< Unique identifier name */ \
    \
    void **rx_queues; /**< Array of pointers to RX queues. */ \
    void **tx_queues; /**< Array of pointers to TX queues. */ \
    uint16_t nb_rx_queues; /**< Number of RX queues. */ \
    uint16_t nb_tx_queues; /**< Number of TX queues. */ \
    \
    uint16_t flags; /**< Bit fields for xyzdev's to use. */ \
    uint16_t mtu; /**< Maximum Transmission Unit. */ \
    uint8_t unit_id; /**< Unit ID for this instance */ \
    uint8_t _filler[7]; /* alignment filler */ \
    \
    /* 64bit alignment starts here */ \
    void *dev_private; /**< PMD-specific private data */ \
    uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */ \
    uint32_t min_rx_buf_size; /**< Common rx buffer size handled by all queues */ \
    uint32_t _pad0

#define port_id unit_id

#define RTE_PKT_COMMON_DEV_INFO \
    struct rte_pci_device *pci_dev; /**< Device PCI information. */ \
    const char *driver_name; /**< Device Driver name. */ \
    unsigned int if_index; /**< Index to bound host interface, or 0 if none. */ \
    /* Use if_indextoname() to translate into an interface name.
     */ \
    uint32_t _pad0

The above is attempting to collect the common members to be placed at the
top of private device structures, as we feel these members should be
fairly common among the device types.

/**
 * @internal
 * The generic data structure associated with each device.
 *
 * Pointers to burst-oriented packet receive and transmit functions are
 * located at the beginning of the structure, along with the pointer to
 * where all the data elements for the particular device are stored in shared
 * memory. This split allows the function pointer and driver data to be per-
 * process, while the actual configuration data for the device is shared.
 */
struct rte_pkt_dev {
    RTE_PKT_COMMON_DEV(pkt);
};

/**
 * @internal
 * The data part, with no function pointers, associated with each device.
 *
 * This structure is safe to place in shared memory to be common among different
 * processes in a multi-process configuration.
 */
struct rte_pkt_dev_data {
    RTE_PKT_COMMON_DEV_DATA;
};

------

The existing ethdev code can then have minor updates such as those shown
below:

struct rte_eth_dev_info {
    RTE_PKT_COMMON_DEV_INFO;

    /* Private device data maybe here */
    uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
    uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
    ...

struct rte_eth_dev_data {
    RTE_PKT_COMMON_DEV_DATA; /**< Define located in */

    /* Private device data maybe here */
    struct rte_eth_dev_sriov sriov; /**< SRIOV data */
    struct rte_eth_link dev_link; /**< Link-level information & status */
    ...

struct rte_eth_dev {
    RTE_PKT_COMMON_DEV(eth);

    /* Private device data maybe here */
};

/* Bit defines for flags in common pkt structure */
#define promiscuous   0x0008 /**< RX promiscuous mode ON(1) / OFF(0). */
#define scattered_rx  0x0004 /**< RX of scattered packets is ON(1) / OFF(0) */
#define all_multicast 0x0002 /**< RX all multicast mode ON(1) / OFF(0).
                              */
#define dev_started   0x0001 /**< Device state: STARTED(1)/STOPPED(0) */

The advantage of using a common set of members is that the existing
ethdev structures and APIs can remain exactly the same, but every ethdev
is also a pktdev and can be used as either, as appropriate. Similarly for
a type of crypto devices, or dpi devices (or software rings or KNI
devices, if we so desire), we can base them off this common minimal
framework and use them all in a similar manner.

Moving some basic common functions and structures into a common set of
files gives everyone a clean starting point for a new device, plus a
light framework. The pktdev code is normally not called directly from the
application, but called from the device itself via a define in the device
header files. The pktdev RX/TX routines can be called from the
application, but the application needs to get the device structure
pointer based on the port id.

The cryptodev API may be very different from other devices and may follow
some type of Open Crypto API. The goal is not to restrict the device API,
but to try to give some type of structure to the design. Does it make
sense to have an mbuf based Rx/Tx API? Maybe not. Could the mbuf based
APIs be hidden in the pktdev code? Very possibly. We have a lot of
options here.

How the two Rx/Tx routines are defined:
---------------------------------------

/**
 *
 * Retrieve a burst of input packets from a receive queue of an Ethernet
 * device.
 *
 */
#define rte_eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
    rte_pkt_rx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], \
                     _qid, _pkts, _nb_pkts)

/**
 * Send a burst of output packets on a transmit queue of an Ethernet device.
 *
 */
#define rte_eth_tx_burst(_pid, _qid, _pkts, _nb_pkts) \
    rte_pkt_tx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], \
                     _qid, _pkts, _nb_pkts)

A snippet of code showing some advantages and a use case of the pktdev API:
---------------------------------------------------------------------------

This is not the complete code; it has not been tested and is only an
example of how one could use the design.

/*
 * The lcore main. This is the main thread that does the work, reading from
 * an input port and writing to an output port.
 */
static __attribute__((noreturn)) void
do_work(const struct pipeline_params *p)
{
    printf("\nCore %u forwarding packets. %s -> %s\n",
           rte_lcore_id(), p->src->data->name, p->dst->data->name);

    /* Run until the application is quit or killed. */
    for (;;) {
        /*
         * Receive packets on a src device and forward them on out
         * the dst device.
         */

        /* Get burst of RX packets, from first port of pair. */
        struct rte_mbuf *bufs[BURST_SIZE];
        const uint16_t nb_rx = rte_pkt_rx_burst(p->src, 0, bufs, BURST_SIZE);

        if (unlikely(nb_rx == 0))
            continue;

        /* Send burst of TX packets, to second port of pair. */
        const uint16_t nb_tx = rte_pkt_tx_burst(p->dst, 0, bufs, nb_rx);

        /* Free any unsent packets. */
        if (unlikely(nb_tx < nb_rx)) {
            uint16_t buf;
            for (buf = nb_tx; buf < nb_rx; buf++)
                rte_pktmbuf_free(bufs[buf]);
        }
    }
}

/*
 * The main function, which does initialization and calls the per-lcore
 * functions.
 */
int
main(int argc, char *argv[])
{
    struct pipeline_params p[RTE_MAX_LCORE];
    struct rte_mempool *mbuf_pool;
    unsigned nb_ports, lcore_id;
    uint8_t portid;

    /* Initialize the Environment Abstraction Layer (EAL). */
    int ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");

    argc -= ret;
    argv += ret;

    /* Check that there is an even number of ports to send/receive on.
     */
    nb_ports = rte_eth_dev_count();
    if (nb_ports < 2 || (nb_ports & 1))
        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");

    /* Creates a new mempool in memory to hold the mbufs. */
    mbuf_pool = rte_mempool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
                                   MBUF_SIZE, MBUF_CACHE_SIZE,
                                   sizeof(struct rte_pktmbuf_pool_private),
                                   rte_pktmbuf_pool_init, NULL,
                                   rte_pktmbuf_init, NULL,
                                   rte_socket_id(), 0);
    if (mbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");

    /* Initialize all ports. */
    for (portid = 0; portid < nb_ports; portid++)
        if (port_init(portid, mbuf_pool) != 0)
            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid);

    struct rte_pkt_dev *in = rte_eth_get_dev(0);

    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
        char name[RTE_RING_NAMESIZE];

        snprintf(name, sizeof(name), "RING_from_%u", lcore_id);
        struct rte_pkt_dev *out = rte_ring_get_dev(
                rte_ring_create(name, 4096, rte_socket_id(), 0));

        p[lcore_id].src = in;
        p[lcore_id].dst = out;
        rte_eal_remote_launch((lcore_function_t *)do_work,
                              &p[lcore_id], lcore_id);
        in = out; // next pipeline stage reads from my output.
    }

    //now finish pipeline on master lcore
    lcore_id = rte_lcore_id();
    p[lcore_id].src = in;
    p[lcore_id].dst = rte_eth_get_dev(1);
    do_work(&p[lcore_id]);

    return 0;
}

Changes to rte_ethdev.[ch]
--------------------------

Most of the changes to rte_ethdev.[ch] were to use the new defines from
rte_pkt.h. All of the references to the globals in ethdev had to be
replaced with a reference to a global structure in ethdev. Moving the
global or private data into a device specific structure seemed reasonable
to reduce name space issues with new devices. The rx_burst/tx_burst
routines were removed as they now exist in the rte_pktdev.c file.

If we use nested structures instead of macros, then more of the code will
need to be converted, or macros used to map the existing member names
onto the nested structures.
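
To make the nested-structure option a little more concrete, here is a
minimal, untested-in-DPDK sketch of how it might look. All of the names
in it (pkt_dev, eth_dev, pkt_burst_t, pkt_rx_burst, toy_rx) are
hypothetical and are not part of the proposed code; it only assumes that
the common part is embedded as the first member of each device structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical and only illustrate the idea. */
struct pkt_dev;

/* Burst handler: handles up to nb_pkts entries of pkts[], returns the count. */
typedef uint16_t (*pkt_burst_t)(struct pkt_dev *dev, uint16_t queue_id,
                                void **pkts, uint16_t nb_pkts);

/* Common part shared by every device type. */
struct pkt_dev {
    pkt_burst_t rx_pkt_burst;
    pkt_burst_t tx_pkt_burst;
};

/* An Ethernet-like device nests the common part as its first member. */
struct eth_dev {
    struct pkt_dev dev_data; /* first member: same address as the eth_dev */
    uint16_t mtu;            /* device-specific member */
};

/* Generic receive call usable with any device that embeds struct pkt_dev. */
static inline uint16_t
pkt_rx_burst(struct pkt_dev *dev, uint16_t queue_id,
             void **pkts, uint16_t nb_pkts)
{
    assert(dev != NULL && dev->rx_pkt_burst != NULL);
    return dev->rx_pkt_burst(dev, queue_id, pkts, nb_pkts);
}

/* Toy driver receive function: pretends to receive at most 2 packets. */
static uint16_t
toy_rx(struct pkt_dev *dev, uint16_t queue_id, void **pkts, uint16_t nb_pkts)
{
    uint16_t n = nb_pkts < 2 ? nb_pkts : 2;

    (void)dev;
    (void)queue_id;
    for (uint16_t i = 0; i < n; i++)
        pkts[i] = NULL; /* no real buffers in this sketch */
    return n;
}
```

A device set up with toy_rx as its rx_pkt_burst can then be passed to
pkt_rx_burst() either as &eth.dev_data or as (struct pkt_dev *)&eth,
because C guarantees that a pointer to a structure also points to its
first member.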

Example of macros that map the existing member names onto a nested
structure:

#define rx_pkt_burst dev_data.rx_pkt_burst
#define tx_pkt_burst dev_data.tx_pkt_burst

Impact to Existing Applications
-------------------------------

None. The existing APIs should all remain unchanged, only the underlying
library code needs to change. [Obviously changes to apps will be needed
to take advantage of new device classes as we make them available].

The crypto API could be similar to the Open Crypto APIs, which seem
reasonable, but using mbufs to hold the data is just an attempt to use
that container type to provide some common structure to the system. Some
of the crypto data will be in the form of packets and some in the form of
chunks of data, which the API should account for in the design.

My goal is to provide a lightweight framework for adding more devices and
not try to make everything look like an Ethernet device.

Regards,
++Keith and Bruce