DPDK patches and discussions
From: "Wiles, Keith" <keith.wiles@intel.com>
To: "dev@dpdk.org" <dev@dpdk.org>
Subject: [dpdk-dev] [RFC] Adding multiple device types to DPDK.
Date: Wed, 1 Apr 2015 12:44:54 +0000
Message-ID: <D1408516.1A07B%keith.wiles@intel.com>

Hi all, (hoping the format of the text is maintained)

Bruce and I are submitting this RFC in the hope of providing discussion
points for the idea. Please do not get carried away with the code
included; it is only there to help everyone understand the proposal.

The RFC describes a change we are looking to make to DPDK to add more
device types. We would like to add to DPDK the idea of a generic
packet-device or "pktdev", which can be thought of as a thin layer
common to all device classes, including potential new device types such
as a "cryptodev" or "dpidev". One of the main goals is to not affect
performance and to not require any current application to be modified.
The pktdev layer provides a light framework for developers to add a
device to DPDK.

Reason for Change

The reasons why we are looking to introduce these concepts to DPDK are:

* Expand the scope of DPDK so that it can provide APIs for more than just
packet acquisition and transmission, but also provide APIs that can be
used to work with other hardware and software offloads, such as
cryptographic accelerators, or accelerated libraries for cryptographic
functions. [The reason why both software and hardware are mentioned is so
that the same APIs can be used whether or not a hardware accelerator is
actually available].
* Provide a minimal common basis for device abstraction in DPDK, that can
be used to unify the different types of packet I/O devices already
existing in DPDK. To this end, the ethdev APIs are a good starting point,
but the ethdev library contains too many functions which are NIC-specific
to be a general-purpose set of APIs across all devices.
     Note: The idea was previously touched on here:

Description of Proposed Change

The basic idea behind "pktdev" is to abstract out a few common routines
and structures/members of structures. Taking the ethdev structures as a
starting point, we cut each one down to little more than a few members
and then possibly add just rx_burst and tx_burst. The resulting
structures are then the starting point for writing a new device type.
Currently we have the rx_burst/tx_burst routines moved to the pktdev, and
it seems that moving a couple more common functions may be reasonable. It
could be that the Rx/Tx routines in pktdev should be left as is, but the
code below shows a possible reason to abstract a few routines into a
common set of files.

From there, we have the ethdev type which adds in the existing functions
specific to Ethernet devices, and also, for example, a cryptodev which may
add in functions specific for cryptographic offload. As now, with the
ethdev, the specific drivers provide concrete implementations of the
functionality exposed by the interface. This hierarchy is shown in the
diagram below, using the existing ethdev and ixgbe drivers as a reference,
alongside a hypothetical cryptodev class and driver implementation
(catchily called) "X":

                    ,---------------------.
                    | struct rte_pktdev   |
                    +---------------------+
                    | rte_pkt_rx_burst()  |
            .-------| rte_pkt_tx_burst()  |-----------.
            |       `---------------------'           |
            |                                         |
            |                                         |
  ,-------------------------------.    ,------------------------------.
  |    struct rte_ethdev          |    |      struct rte_cryptodev    |
  +-------------------------------+    +------------------------------+
  | rte_eth_dev_configure()       |    | rte_crypto_init_sym_session()|
  | rte_eth_allmulticast_enable() |    | rte_crypto_del_sym_session() |
  | rte_eth_filter_ctrl()         |    |                              |
  `-------------------------------'    `---------------.--------------'
            |                                          |
            |                                          |
  ,---------'---------------------.    ,---------------'--------------.
  |    struct rte_pmd_ixgbe       |    |      struct rte_pmd_X        |
  +-------------------------------+    +------------------------------+
  | .configure -> ixgbe_configure |    | .init_session -> X_init_ses()|
  | .tx_burst  -> ixgbe_xmit_pkts |    | .tx_burst -> X_handle_pkts() |
  `-------------------------------'    `------------------------------'

We are not attempting to create a real class model here, only looking at
creating a very basic common set of APIs and structures for other device
types.

In terms of code changes for this, we obviously need to add new
interface libraries for pktdev and cryptodev. The pktdev library can
define a skeleton structure for the first few elements of the nested
structures to ensure consistency. Each of the defines below lists the
common members of the device structures, which gives some basic structure
to the device framework. Each define is placed at the top of the matching
device structure and allows the device to contain both common and private
data. The pktdev structures overlay the first common set of members for
each device type.

For example:

We are using macros to reduce code changes to DPDK, but nested structures
are a better solution:

#define RTE_PKT_COMMON_DEV(_t)                                                              \
    pkt_rx_burst_t              rx_pkt_burst;   /**< Pointer to PMD receive function. */    \
    pkt_tx_burst_t              tx_pkt_burst;   /**< Pointer to PMD transmit function. */   \
    struct rte_##_t##_dev_data  *data;          /**< Pointer to device data */              \
    const struct _t##_driver    *driver;        /**< Driver for this device */              \
    struct _t##_dev_ops         *dev_ops;       /**< Functions exported by PMD */           \
    struct rte_pci_device       *pci_dev;       /**< PCI info. supplied by probing */       \
    /** User application callback for interrupts if present */                             \
    struct rte_##_t##_dev_cb_list   link_intr_cbs;                                          \
    /**                                                                                     \
     * User-supplied functions called from rx_burst to post-process                         \
     * received packets before passing them to the user                                     \
     */                                                                                     \
    struct rte_##_t##_rxtx_callback **post_rx_burst_cbs;                                    \
    /**                                                                                     \
     * User-supplied functions called from tx_burst to pre-process                          \
     * received packets before passing them to the driver for transmission.                 \
     */                                                                                     \
    struct rte_##_t##_rxtx_callback **pre_tx_burst_cbs;                                     \
    enum rte_pkt_dev_type       dev_type;       /**< Flag indicating the device type */     \
    uint8_t                     attached        /**< Flag indicating the port is attached */
    /* Possible alignment or a hole in the structure */

#define RTE_PKT_NAME_MAX_LEN (32)

#define RTE_PKT_COMMON_DEV_DATA                                                 \
    char name[RTE_PKT_NAME_MAX_LEN]; /**< Unique identifier name */             \
    void **rx_queues;               /**< Array of pointers to RX queues. */     \
    void **tx_queues;               /**< Array of pointers to TX queues. */     \
    uint16_t nb_rx_queues;          /**< Number of RX queues. */                \
    uint16_t nb_tx_queues;          /**< Number of TX queues. */                \
    uint16_t flags;                 /**< Bit fields for xyzdev's to use. */     \
    uint16_t mtu;                   /**< Maximum Transmission Unit. */          \
    uint8_t unit_id;                /**< Unit ID for this instance */           \
    uint8_t _filler[7];             /* alignment filler */                      \
    /* 64bit alignment starts here */                                           \
    void    *dev_private;           /**< PMD-specific private data */           \
    uint64_t rx_mbuf_alloc_failed;  /**< RX ring mbuf allocation failures. */   \
    uint32_t min_rx_buf_size;       /**< Common rx buffer size handled by all queues */ \
    uint32_t _pad0

#define port_id     unit_id

    struct rte_pci_device   *pci_dev;       /**< Device PCI information. */     \
    const char              *driver_name;   /**< Device Driver name. */         \
    unsigned int            if_index;       /**< Index to bound host interface, or 0 if none. */ \
        /* Use if_indextoname() to translate into an interface name. */         \
    uint32_t _pad0

The above attempts to collect the common members to be placed at the top
of the private device structures, as we feel these members should be
fairly common among the device types.
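
To make the overlay idea a bit more concrete, here is a minimal,
self-contained sketch. It is not the RFC code: every demo_* name is made
up for illustration, and the macro is a heavily cut-down stand-in for
RTE_PKT_COMMON_DEV(_t). It only shows that two device structures built
from the same macro share the same leading layout, so generic code can
look at the common members without knowing the device type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for RTE_PKT_COMMON_DEV(_t). */
#define DEMO_COMMON_DEV(_t)                                          \
    uint16_t (*rx_burst)(void *queue, void **pkts, uint16_t nb);     \
    uint16_t (*tx_burst)(void *queue, void **pkts, uint16_t nb);     \
    struct demo_##_t##_dev_data *data;                               \
    uint8_t attached

/* Per-type data structures (illustrative members only). */
struct demo_pkt_dev_data    { char name[32]; };
struct demo_eth_dev_data    { char name[32]; uint16_t mtu; };
struct demo_crypto_dev_data { char name[32]; uint32_t nb_sessions; };

/* The generic device holds only the common members ... */
struct demo_pkt_dev    { DEMO_COMMON_DEV(pkt); };
/* ... while each device type appends its private members after them. */
struct demo_eth_dev    { DEMO_COMMON_DEV(eth);    int link_up; };
struct demo_crypto_dev { DEMO_COMMON_DEV(crypto); int nb_engines; };

/* Compile-time check that a common member sits at the same offset. */
#define DEMO_SAME_OFFSET(_t, _m)                                     \
    static_assert(offsetof(struct demo_##_t##_dev, _m) ==            \
                  offsetof(struct demo_pkt_dev, _m),                 \
                  #_m " offset differs")

DEMO_SAME_OFFSET(eth, rx_burst);
DEMO_SAME_OFFSET(eth, attached);
DEMO_SAME_OFFSET(crypto, data);
DEMO_SAME_OFFSET(crypto, attached);

/* Generic helper: works on any device that starts with the common part. */
static int
demo_dev_is_attached(const void *dev)
{
    const struct demo_pkt_dev *pd = dev;

    return pd->attached != 0;
}
```

Calling demo_dev_is_attached() on a struct demo_eth_dev or a
struct demo_crypto_dev reads the same byte in both cases. Note that this
style relies on the layouts matching, which is what the compile-time
checks guard; the nested-structure alternative gets the same guarantee
from the language itself, since a pointer to a structure can always be
converted to a pointer to its first member.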

/**
 * @internal
 * The generic data structure associated with each device.
 *
 * Pointers to burst-oriented packet receive and transmit functions are
 * located at the beginning of the structure, along with the pointer to
 * where all the data elements for the particular device are stored in
 * memory. This split allows the function pointer and driver data to be per-
 * process, while the actual configuration data for the device is shared.
 */
struct rte_pkt_dev {
    RTE_PKT_COMMON_DEV(pkt);
};

/**
 * @internal
 * The data part, with no function pointers, associated with each device.
 *
 * This structure is safe to place in shared memory to be common among
 * processes in a multi-process configuration.
 */
struct rte_pkt_dev_data {
    RTE_PKT_COMMON_DEV_DATA;
};


The existing ethdev code can then have minor updates such as those shown
below:

struct rte_eth_dev_info {

    /* Private device data may be here */
    uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
    uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
    /* ... remaining existing members ... */
};

struct rte_eth_dev_data {
    RTE_PKT_COMMON_DEV_DATA; /**< Define located in <rte_pkt.h> */

    /* Private device data may be here */
    struct rte_eth_dev_sriov sriov; /**< SRIOV data */

    struct rte_eth_link dev_link; /**< Link-level information & status */
    /* ... remaining existing members ... */
};

struct rte_eth_dev {
    RTE_PKT_COMMON_DEV(eth); /**< Define located in <rte_pkt.h> */

    /* Private device data may be here */
};

/* Bit defines for flags in common pkt structure */
#define promiscuous     0x0008   /**< RX promiscuous mode ON(1) / OFF(0). */
#define scattered_rx    0x0004   /**< RX of scattered packets is ON(1) / OFF(0) */
#define all_multicast   0x0002   /**< RX all multicast mode ON(1) / OFF(0). */
#define dev_started     0x0001   /**< Device state: STARTED(1)/STOPPED(0) */
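
As a rough illustration of how such bits could be used with the common
16-bit flags member, here is a small sketch. The DEMO_F_* names, the
demo_dev_data structure and the helper functions are all hypothetical;
only the bit values are taken from the defines above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the bit defines above. */
#define DEMO_F_DEV_STARTED   0x0001
#define DEMO_F_ALL_MULTICAST 0x0002
#define DEMO_F_SCATTERED_RX  0x0004
#define DEMO_F_PROMISCUOUS   0x0008

#define DEMO_F_ALL (DEMO_F_DEV_STARTED | DEMO_F_ALL_MULTICAST | \
                    DEMO_F_SCATTERED_RX | DEMO_F_PROMISCUOUS)

/* All flags must fit in the 16-bit flags member. */
static_assert((DEMO_F_ALL & ~0xFFFF) == 0, "flags do not fit in 16 bits");

/* Stand-in for the data structure that carries the common flags member. */
struct demo_dev_data {
    uint16_t flags;
};

/* Turn one or more flag bits on. */
static inline void
demo_flag_set(struct demo_dev_data *d, uint16_t f)
{
    d->flags |= f;
}

/* Turn one or more flag bits off. */
static inline void
demo_flag_clear(struct demo_dev_data *d, uint16_t f)
{
    d->flags &= (uint16_t)~f;
}

/* Return true if any of the given flag bits is on. */
static inline bool
demo_flag_test(const struct demo_dev_data *d, uint16_t f)
{
    return (d->flags & f) != 0;
}
```

A device type would then check, for example,
demo_flag_test(data, DEMO_F_PROMISCUOUS) where it tests a dedicated
structure member today.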

The advantage of a common set of members is that the existing ethdev
structures and APIs can remain exactly the same, but every ethdev is also
a pktdev and can be used as either, as appropriate. Similarly, for crypto
devices or dpi devices (or software rings or KNI devices, if we so
desire), we can base them on this common minimal framework and use them
all in a similar manner.

Moving some basic common functions and structures into a common set of
files gives everyone a clean starting point for a new device and adds a
light framework. The pktdev code is normally not called directly from the
application, but from the device itself via a define in the device
header files. The pktdev RX/TX routines can be called from the
application, but the application then needs to get the device structure
pointer based on the port id.

The cryptodev API may be very different from other devices and may follow
some type of Open Crypto API. The goal is not to restrict the device API,
but to try to give some type of structure to the design. Does it make
sense to have an mbuf-based Rx/Tx API? Maybe not. Could the mbuf-based APIs
be hidden in the pktdev code? Very possibly. We have a lot of options here.

How the two Rx/Tx routines are defined:

/**
 * Retrieve a burst of input packets from a receive queue of an Ethernet
 * device.
 */
#define rte_eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
    rte_pkt_rx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
        _pkts, _nb_pkts)

/**
 * Send a burst of output packets on a transmit queue of an Ethernet
 * device.
 */
#define rte_eth_tx_burst(_pid, _qid, _pkts, _nb_pkts) \
    rte_pkt_tx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
        _pkts, _nb_pkts)
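
For anyone who wants to see the dispatch path end to end, here is a toy,
self-contained version of the same pattern. It is only a sketch: the
demo_* names, the fake driver and the device table are invented, and for
simplicity the common part is a nested first member rather than a macro
expansion. The device-specific macro maps a port id to the generic
device pointer, and the generic burst routine calls through the function
pointer stored in the common part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_mbuf;       /* opaque packet buffer, stands in for rte_mbuf */
struct demo_pkt_dev;

typedef uint16_t (*demo_burst_t)(struct demo_pkt_dev *dev, uint16_t qid,
        struct demo_mbuf **pkts, uint16_t nb_pkts);

/* Generic device: just the burst function pointers. */
struct demo_pkt_dev {
    demo_burst_t rx_pkt_burst;
    demo_burst_t tx_pkt_burst;
};

/* Ethernet-like device: common part first, private data after it. */
struct demo_eth_dev {
    struct demo_pkt_dev pkt;
    uint16_t pending;   /* packets waiting in the fake hardware */
};

#define DEMO_MAX_PORTS 4
static struct demo_eth_dev demo_eth_devices[DEMO_MAX_PORTS];

/* Generic receive: dispatch through the device's function pointer. */
static inline uint16_t
demo_pkt_rx_burst(struct demo_pkt_dev *dev, uint16_t qid,
        struct demo_mbuf **pkts, uint16_t nb_pkts)
{
    assert(dev->rx_pkt_burst != NULL);
    return dev->rx_pkt_burst(dev, qid, pkts, nb_pkts);
}

/* Device-specific wrapper: port id -> generic device pointer. */
#define demo_eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
    demo_pkt_rx_burst((struct demo_pkt_dev *)&demo_eth_devices[_pid], \
        _qid, _pkts, _nb_pkts)

/* Fake driver receive: hands out at most `pending` (empty) slots. */
static uint16_t
demo_fake_rx(struct demo_pkt_dev *dev, uint16_t qid,
        struct demo_mbuf **pkts, uint16_t nb_pkts)
{
    struct demo_eth_dev *eth = (struct demo_eth_dev *)dev;
    uint16_t n = nb_pkts < eth->pending ? nb_pkts : eth->pending;
    uint16_t i;

    (void)qid;
    for (i = 0; i < n; i++)
        pkts[i] = NULL;     /* no real buffers in this toy */
    eth->pending = (uint16_t)(eth->pending - n);
    return n;
}
```

With demo_fake_rx installed on port 1 and five packets pending, three
successive demo_eth_rx_burst(1, 0, bufs, 4) calls return 4, 1 and 0.
Existing callers keep using the port-id based macro, while new code can
hold a struct demo_pkt_dev pointer and call the generic routine directly.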

A snippet of code showing some advantages and a use case of the pktdev API:

This is not the complete code, it has not been tested, and it is only an
example of how one could use the design.

/*
 * The lcore main. This is the main thread that does the work, reading from
 * an input port and writing to an output port.
 */
static __attribute__((noreturn)) void
do_work(const struct pipeline_params *p)
{
    /* Device names are read from the common device data. */
    printf("\nCore %u forwarding packets. %s -> %s\n",
            rte_lcore_id(), p->src->data->name, p->dst->data->name);

    /* Run until the application is quit or killed. */
    for (;;) {
        /*
         * Receive packets on a src device and forward them on out
         * the dst device.
         */

        /* Get burst of RX packets, from first port of pair. */
        struct rte_mbuf *bufs[BURST_SIZE];
        const uint16_t nb_rx = rte_pkt_rx_burst(p->src, 0,
                bufs, BURST_SIZE);

        if (unlikely(nb_rx == 0))
            continue;

        /* Send burst of TX packets, to second port of pair. */
        const uint16_t nb_tx = rte_pkt_tx_burst(p->dst, 0,
                bufs, nb_rx);

        /* Free any unsent packets. */
        if (unlikely(nb_tx < nb_rx)) {
            uint16_t buf;
            for (buf = nb_tx; buf < nb_rx; buf++)
                rte_pktmbuf_free(bufs[buf]);
        }
    }
}
/*
 * The main function, which does initialization and calls the per-lcore
 * functions.
 */
int
main(int argc, char *argv[])
{
    struct pipeline_params p[RTE_MAX_LCORE];
    struct rte_mempool *mbuf_pool;
    unsigned nb_ports, lcore_id;
    uint8_t portid;

    /* Initialize the Environment Abstraction Layer (EAL). */
    int ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");

    argc -= ret;
    argv += ret;

    /* Check that there is an even number of ports to send/receive on. */
    nb_ports = rte_eth_dev_count();
    if (nb_ports < 2 || (nb_ports & 1))
        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");

    /* Creates a new mempool in memory to hold the mbufs. */
    mbuf_pool = rte_mempool_create("MBUF_POOL",
                       NUM_MBUFS * nb_ports,
                       MBUF_SIZE,       /* element size (assumed define) */
                       MBUF_CACHE_SIZE, /* per-lcore cache (assumed define) */
                       sizeof(struct rte_pktmbuf_pool_private),
                       rte_pktmbuf_pool_init, NULL,
                       rte_pktmbuf_init,      NULL,
                       rte_socket_id(),
                       0);

    if (mbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");

    /* Initialize all ports. */
    for (portid = 0; portid < nb_ports; portid++)
        if (port_init(portid, mbuf_pool) != 0)
            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n",
                    portid);

    /* Chain the slave lcores into a pipeline, linked by rings. */
    struct rte_pkt_dev *in = rte_eth_get_dev(0);
    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
        char name[RTE_RING_NAMESIZE];
        snprintf(name, sizeof(name), "RING_from_%u", lcore_id);
        struct rte_pkt_dev *out = rte_ring_get_dev(
                rte_ring_create(name, 4096, rte_socket_id(), 0));

        p[lcore_id].src = in;
        p[lcore_id].dst = out;
        rte_eal_remote_launch((lcore_function_t *)do_work,
                &p[lcore_id], lcore_id);
        in = out; /* next pipeline stage reads from my output. */
    }

    /* now finish pipeline on master lcore */
    lcore_id = rte_lcore_id();
    p[lcore_id].src = in;
    p[lcore_id].dst = rte_eth_get_dev(1);
    do_work(&p[lcore_id]);

    return 0;
}

Changes to rte_ethdev.[ch]

Most of the changes to rte_ethdev.[ch] were to use the new defines from
rte_pkt.h. All of the references to the globals in ethdev had to be
replaced with a reference to a global structure in ethdev. Moving the
global or private data into a device specific structure seemed reasonable
to reduce name space issues with new devices. The rx_burst/tx_burst
routines were removed as they now exist in the rte_pktdev.c file. If we
use nested structures instead of macros, then more of the code will need to
be converted, or macros used to map the members onto the nested
structure:

#define rx_pkt_burst    dev_data.rx_pkt_burst
#define tx_pkt_burst    dev_data.tx_pkt_burst
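
A small sketch of how such member-mapping defines could behave is below.
The demo_* types are hypothetical; only the two defines follow the form
shown above. Code written against the old flat member names keeps
compiling, because the preprocessor rewrites dev->rx_pkt_burst into
dev->dev_data.rx_pkt_burst.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*demo_burst_t)(void *queue, void **pkts, uint16_t nb);

/*
 * The nested common structure. It must be declared BEFORE the mapping
 * defines below, otherwise its own member names would be rewritten too.
 */
struct demo_common {
    demo_burst_t rx_pkt_burst;
    demo_burst_t tx_pkt_burst;
};

struct demo_eth_dev {
    struct demo_common dev_data;    /* nested common part */
    int link_up;                    /* device-private member */
};

/* Map the old flat member names onto the nested structure. */
#define rx_pkt_burst    dev_data.rx_pkt_burst
#define tx_pkt_burst    dev_data.tx_pkt_burst

/* Toy receive function: pretends every requested packet arrived. */
static uint16_t
demo_rx(void *queue, void **pkts, uint16_t nb)
{
    (void)queue;
    (void)pkts;
    return nb;
}

/* Written as if the members were still flat in struct demo_eth_dev. */
static uint16_t
demo_poll(struct demo_eth_dev *dev, uint16_t nb)
{
    assert(dev->rx_pkt_burst != NULL);
    return dev->rx_pkt_burst(NULL, NULL, nb);
}
```

One thing to watch with this trick: once the defines are active, code
must not also spell out the nested path by hand, since
dev->dev_data.rx_pkt_burst would then expand to
dev->dev_data.dev_data.rx_pkt_burst and fail to compile.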

Impact to Existing Applications

None. The existing APIs should all remain unchanged, only the underlying
library code needs to change. [Obviously changes to apps will be needed to
take advantage of new device classes as we make them available].

The crypto API could be similar to the Open Crypto APIs, which seem
reasonable, but using mbufs to hold data is just an attempt to use that
container type to provide some common structure to the system. Some of the
crypto data will be in the form of packets and some in the form of chunks
of data, which the API should account for in the design.

My goal is to provide a lightweight framework for adding more devices and
not to try to make everything look like an Ethernet device.

++Keith and Bruce


