* [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type
From: Bruce Richardson @ 2015-05-11 16:29 UTC
To: dev

Hi all,

after a small amount of offline discussion with Marc Sune, here is an
alternative proposal for a higher-level interface - aka pktdev - to allow a
common Rx/Tx API across device types handling mbufs [for now, ethdev, ring
and KNI]. The key code is in the first patch of the set - the second is an
example of a trivial use case.

What is different from the previous proposal:
* wrapper class, so no changes to any existing ring or ethdev implementations
* use of function pointers for RX/TX with an API that maps to ethdev
  - this means there is little/no additional overhead for ethdev calls
  - inline special case for rings, to accelerate that. Since we are at a
    higher level, we can special-case some things if appropriate. This
    means the impact to ring ops is one (predictable) branch per burst
* elimination of the queue abstraction. For the ring and KNI, there is no
  concept of queues, so we just wrap the functions directly (no need even for
  wrapper functions, the APIs match so we can call directly). This also
  means:
  - adding per-queue features is far easier, as we don't need to worry about
    having arrays of multiple queues. For example, adding buffering on TX
    (or RX) is easier since again we only have a single queue.
* thread safety is made easier using a wrapper. For an MP ring, we can create
  multiple pktdevs around it, and each thread will then be able to use its
  own copy, with its own buffering etc.

However, at this point, I'm just looking for general feedback on this as an
approach. I think it's quite flexible - even more so than the earlier proposal
we had. It's less prescriptive and doesn't make any demands on any other libs.

Comments/thoughts welcome.

Bruce Richardson (2):
  Add example pktdev implementation
  example app showing pktdevs used in a chain

 config/common_bsdapp           |   5 +
 config/common_linuxapp         |   5 +
 examples/pktdev/Makefile       |  57 +++++++++++
 examples/pktdev/basicfwd.c     | 221 +++++++++++++++++++++++++++++++++++++++++
 lib/Makefile                   |   1 +
 lib/librte_pktdev/Makefile     |  53 ++++++++++
 lib/librte_pktdev/rte_pktdev.h | 200 +++++++++++++++++++++++++++++++++++++
 7 files changed, 542 insertions(+)
 create mode 100644 examples/pktdev/Makefile
 create mode 100644 examples/pktdev/basicfwd.c
 create mode 100644 lib/librte_pktdev/Makefile
 create mode 100644 lib/librte_pktdev/rte_pktdev.h

--
2.1.0
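The wrapper idea in the cover letter (one RX handle and one TX handle per
device, function pointers for the general case, an inlined special case for
rings) can be illustrated without DPDK at all. The following is only a rough
sketch under that assumption: toy_pkt, toy_ring, toy_sink, toy_pkt_dev and
toy_forward are hypothetical stand-ins invented for illustration, not the
types or functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; none of these are DPDK types. */
struct toy_pkt { int id; };

#define TOY_RING_SIZE 8

struct toy_ring {
	struct toy_pkt *slots[TOY_RING_SIZE];
	unsigned head;   /* index of the next packet to dequeue */
	unsigned count;  /* number of packets currently queued */
};

/* Enqueue up to n packets; returns how many actually fit. */
static inline uint16_t
toy_ring_enqueue_burst(struct toy_ring *r, struct toy_pkt **pkts, uint16_t n)
{
	uint16_t i;
	for (i = 0; i < n && r->count < TOY_RING_SIZE; i++) {
		r->slots[(r->head + r->count) % TOY_RING_SIZE] = pkts[i];
		r->count++;
	}
	return i;
}

/* Dequeue up to n packets; returns how many were available. */
static inline uint16_t
toy_ring_dequeue_burst(struct toy_ring *r, struct toy_pkt **pkts, uint16_t n)
{
	uint16_t i;
	for (i = 0; i < n && r->count > 0; i++) {
		pkts[i] = r->slots[r->head];
		r->head = (r->head + 1) % TOY_RING_SIZE;
		r->count--;
	}
	return i;
}

/* A non-ring device: a TX-only sink that counts what it is given. */
struct toy_sink { unsigned received; };

static uint16_t
toy_sink_tx(void *handle, struct toy_pkt **pkts, uint16_t n)
{
	struct toy_sink *s = handle;
	(void)pkts;
	s->received += n;
	return n;
}

typedef uint16_t (*toy_burst_t)(void *handle, struct toy_pkt **pkts,
		uint16_t n);

enum toy_dev_type { TOY_DEV_OTHER = 0, TOY_DEV_RING };

/* The wrapper: one RX handle and one TX handle, no queue index. */
struct toy_pkt_dev {
	enum toy_dev_type type;
	toy_burst_t rx_burst;  /* not called when type == TOY_DEV_RING */
	toy_burst_t tx_burst;  /* not called when type == TOY_DEV_RING */
	void *rx_handle;
	void *tx_handle;
};

static inline uint16_t
toy_pkt_rx_burst(struct toy_pkt_dev *dev, struct toy_pkt **pkts, uint16_t n)
{
	assert(dev != NULL);
	/* one predictable branch lets the ring call be inlined */
	if (dev->type == TOY_DEV_RING)
		return toy_ring_dequeue_burst(dev->rx_handle, pkts, n);
	return dev->rx_burst(dev->rx_handle, pkts, n);
}

static inline uint16_t
toy_pkt_tx_burst(struct toy_pkt_dev *dev, struct toy_pkt **pkts, uint16_t n)
{
	assert(dev != NULL);
	if (dev->type == TOY_DEV_RING)
		return toy_ring_enqueue_burst(dev->tx_handle, pkts, n);
	return dev->tx_burst(dev->tx_handle, pkts, n);
}

/* Move one burst from src to dst without asking what either one is. */
static uint16_t
toy_forward(struct toy_pkt_dev *src, struct toy_pkt_dev *dst)
{
	struct toy_pkt *bufs[4];
	uint16_t nb_rx = toy_pkt_rx_burst(src, bufs, 4);
	return toy_pkt_tx_burst(dst, bufs, nb_rx);
}
```

A caller would fill in one toy_pkt_dev per endpoint (a ring-backed one with
both handles pointing at the same toy_ring, a sink-backed one with tx_burst
set to toy_sink_tx) and then call toy_forward() in its loop. The sink here
deliberately has no RX function, so it is only valid as a destination.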
* [dpdk-dev] [RFC PATCHv2 1/2] Add example pktdev implementation 2015-05-11 16:29 [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson @ 2015-05-11 16:29 ` Bruce Richardson 2015-05-11 16:29 ` [dpdk-dev] [RFC PATCHv2 2/2] example app showing pktdevs used in a chain Bruce Richardson 2015-05-19 11:31 ` [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson 2 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-05-11 16:29 UTC (permalink / raw) To: dev This commit demonstrates what a minimal API for all packet handling might look like. It provides common APIs for RX and TX, by wrapping the types as appropriate. Implementations provided for ring, ethdev and kni. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- config/common_bsdapp | 5 ++ config/common_linuxapp | 5 ++ lib/Makefile | 1 + lib/librte_pktdev/Makefile | 53 +++++++++++ lib/librte_pktdev/rte_pktdev.h | 200 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 264 insertions(+) create mode 100644 lib/librte_pktdev/Makefile create mode 100644 lib/librte_pktdev/rte_pktdev.h diff --git a/config/common_bsdapp b/config/common_bsdapp index c2374c0..64fcdc8 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -132,6 +132,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y CONFIG_RTE_LIBRTE_KVARGS=y # +# Compile generic packet handling device library +# +CONFIG_RTE_LIBRTE_PKTDEV=y + +# # Compile generic ethernet library # CONFIG_RTE_LIBRTE_ETHER=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 0078dc9..399f15d 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -129,6 +129,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y CONFIG_RTE_LIBRTE_KVARGS=y # +# Compile generic packet handling device library +# +CONFIG_RTE_LIBRTE_PKTDEV=y + +# # Compile generic ethernet library # CONFIG_RTE_LIBRTE_ETHER=y diff --git a/lib/Makefile b/lib/Makefile index d94355d..4db5ee0 100644 --- a/lib/Makefile +++ 
b/lib/Makefile @@ -32,6 +32,7 @@ include $(RTE_SDK)/mk/rte.vars.mk DIRS-y += librte_compat +DIRS-$(CONFIG_RTE_LIBRTE_PKTDEV) += librte_pktdev DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring diff --git a/lib/librte_pktdev/Makefile b/lib/librte_pktdev/Makefile new file mode 100644 index 0000000..858d3e3 --- /dev/null +++ b/lib/librte_pktdev/Makefile @@ -0,0 +1,53 @@ +# BSD LICENSE +# +# Copyright(c) 2015 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = libpktdev.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +EXPORT_MAP := rte_pktdev_version.map + +LIBABIVER := 1 + +# +# Export include files +# +SYMLINK-y-include += rte_pktdev.h + +DEPDIRS-y += lib/librte_ring lib/librte_kni lib/librte_ether + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_pktdev/rte_pktdev.h b/lib/librte_pktdev/rte_pktdev.h new file mode 100644 index 0000000..eba7989 --- /dev/null +++ b/lib/librte_pktdev/rte_pktdev.h @@ -0,0 +1,200 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_PKTDEV_H_ +#define _RTE_PKTDEV_H_ + +#include <stdint.h> + +/** + * @file + * + * RTE Packet Processing Device API + */ + +#ifdef __cplusplus +extern "C" { +#endif + +#include <rte_malloc.h> + +#include <rte_ring.h> +#include <rte_ethdev.h> +#include <rte_kni.h> + +/* forward definition of mbuf structure. We don't need full mbuf header here */ +struct rte_mbuf; + +typedef uint16_t (*pkt_rx_burst_t)(void *rx_handle, + struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +/**< @internal Retrieve packets from a queue of a device. */ + +typedef uint16_t (*pkt_tx_burst_t)(void *tx_handle, + struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +/**< @internal Send packets on a queue of a device. */ + +enum rte_pkt_dev_type { + RTE_PKT_DEV_TYPE_NONE = 0, + RTE_PKT_DEV_TYPE_ETHDEV, + RTE_PKT_DEV_TYPE_RING, + RTE_PKT_DEV_TYPE_KNI, + + RTE_PKT_DEV_TYPE_MAX +}; + +struct rte_pkt_dev { + enum rte_pkt_dev_type type; + pkt_rx_burst_t rx_pkt_burst; /**< Pointer to receive function. */ + pkt_tx_burst_t tx_pkt_burst; /**< Pointer to transmit function. */ + void *rx_handle; + void *tx_handle; +}; + +/** + * + * Retrieve a burst of input packets from a receive queue of a + * device. 
The retrieved packets are stored in *rte_mbuf* structures whose + * pointers are supplied in the *rx_pkts* array. + * + * @param dev + * The device to be polled for packets + * @param queue_id + * The index of the receive queue from which to retrieve input packets. + * @param rx_pkts + * The address of an array of pointers to *rte_mbuf* structures that + * must be large enough to store *nb_pkts* pointers in it. + * @param nb_pkts + * The maximum number of packets to retrieve. + * @return + * The number of packets actually retrieved, which is the number + * of pointers to *rte_mbuf* structures effectively supplied to the + * *rx_pkts* array. + */ +static inline uint16_t +rte_pkt_rx_burst(struct rte_pkt_dev *dev, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts) +{ + /* special case ring, so that call gets inlined for performance */ + if (dev->type == RTE_PKT_DEV_TYPE_RING) + return rte_ring_dequeue_burst(dev->rx_handle, (void *)rx_pkts, + nb_pkts); + return (*dev->rx_pkt_burst)(dev->rx_handle, rx_pkts, nb_pkts); +} + +/** + * Send a burst of output packets on a transmit queue of a device. + * + * @param dev + * The device to be given the packets. + * @param queue_id + * The index of the queue through which output packets must be sent. + * @param tx_pkts + * The address of an array of *nb_pkts* pointers to *rte_mbuf* structures + * which contain the output packets. + * @param nb_pkts + * The maximum number of packets to transmit. + * @return + * The number of output packets actually stored in transmit descriptors of + * the transmit ring. The return value can be less than the value of the + * *tx_pkts* parameter when the transmit ring is full or has been filled up. 
+ */ +static inline uint16_t +rte_pkt_tx_burst(struct rte_pkt_dev *dev, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + /* special case ring, so that call gets inlined for performance */ + if (dev->type == RTE_PKT_DEV_TYPE_RING) + return rte_ring_enqueue_burst(dev->tx_handle, (void *)tx_pkts, + nb_pkts); + return (*dev->tx_pkt_burst)(dev->tx_handle, tx_pkts, nb_pkts); +} + +static inline struct rte_pkt_dev * +rte_pkt_dev_from_ring(struct rte_ring *r) +{ + struct rte_pkt_dev *d = rte_zmalloc(NULL, sizeof(*d), 0); + if (d == NULL) + return d; + + d->type = RTE_PKT_DEV_TYPE_RING; + d->rx_pkt_burst = (pkt_rx_burst_t)rte_ring_dequeue_burst; + d->tx_pkt_burst = (pkt_tx_burst_t)rte_ring_enqueue_burst; + d->rx_handle = r; + d->tx_handle = r; + + return d; +} + +static inline struct rte_pkt_dev * +rte_pkt_dev_from_kni(struct rte_kni *k) +{ + struct rte_pkt_dev *d = rte_zmalloc(NULL, sizeof(*d), 0); + if (d == NULL) + return d; + + d->type = RTE_PKT_DEV_TYPE_KNI; + d->rx_pkt_burst = (pkt_rx_burst_t)rte_kni_rx_burst; + d->tx_pkt_burst = (pkt_tx_burst_t)rte_kni_tx_burst; + d->rx_handle = k; + d->tx_handle = k; + + return d; +} + +static inline struct rte_pkt_dev * +rte_pkt_dev_from_ethdev(struct rte_eth_dev *e, uint16_t rxq, uint16_t txq) +{ + struct rte_pkt_dev *d = rte_zmalloc(NULL, sizeof(*d), 0); + if (d == NULL) + return d; + + d->type = RTE_PKT_DEV_TYPE_ETHDEV; + d->rx_pkt_burst = e->rx_pkt_burst; + d->tx_pkt_burst = e->tx_pkt_burst; + d->rx_handle = e->data->rx_queues[rxq]; + d->tx_handle = e->data->tx_queues[txq]; + + return d; +} + +static inline struct rte_pkt_dev * +rte_pkt_dev_from_ethport(uint8_t port_id, uint16_t rxq, uint16_t txq) +{ + return rte_pkt_dev_from_ethdev(&rte_eth_devices[port_id], rxq, txq); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_PKTDEV_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [dpdk-dev] [RFC PATCHv2 2/2] example app showing pktdevs used in a chain 2015-05-11 16:29 [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson 2015-05-11 16:29 ` [dpdk-dev] [RFC PATCHv2 1/2] Add example pktdev implementation Bruce Richardson @ 2015-05-11 16:29 ` Bruce Richardson 2015-05-19 11:31 ` [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson 2 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-05-11 16:29 UTC (permalink / raw) To: dev This is a trivial example showing code which is using ethdevs and rings in a neutral manner, with the same piece of pipeline code passing mbufs along a chain without ever having to query its source or destination type. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- examples/pktdev/Makefile | 57 ++++++++++++ examples/pktdev/basicfwd.c | 221 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 278 insertions(+) create mode 100644 examples/pktdev/Makefile create mode 100644 examples/pktdev/basicfwd.c diff --git a/examples/pktdev/Makefile b/examples/pktdev/Makefile new file mode 100644 index 0000000..4a5d99f --- /dev/null +++ b/examples/pktdev/Makefile @@ -0,0 +1,57 @@ +# BSD LICENSE +# +# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. 
+# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overridden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = basicfwd + +# all source are stored in SRCS-y +SRCS-y := basicfwd.c + +CFLAGS += $(WERROR_FLAGS) + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +EXTRA_CFLAGS += -O3 -g -Wfatal-errors + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/pktdev/basicfwd.c b/examples/pktdev/basicfwd.c new file mode 100644 index 0000000..91c0c3b --- /dev/null +++ b/examples/pktdev/basicfwd.c @@ -0,0 +1,221 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include <stdint.h> +#include <inttypes.h> +#include <rte_eal.h> +#include <rte_ethdev.h> +#include <rte_cycles.h> +#include <rte_lcore.h> +#include <rte_mbuf.h> +#include <rte_pktdev.h> + +#define RX_RING_SIZE 128 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_SIZE (1600 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM) +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +/* basicfwd.c: Basic DPDK skeleton forwarding example. */ + +/* + * Initializes a given port using global settings and with the RX buffers + * coming from the mbuf_pool passed as a parameter. + */ +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port >= rte_eth_dev_count()) + return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. 
*/ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. */ + rte_eth_promiscuous_enable(port); + + return 0; +} + +struct pipeline_params { + struct rte_pkt_dev *src; + struct rte_pkt_dev *dst; +}; + +/* + * The lcore main. This is the main thread that does the work, reading from + * an input port and writing to an output port. + */ +static __attribute__((noreturn)) void +do_work(const struct pipeline_params *p) +{ + printf("\nCore %u forwarding packets. %p -> %p\n", + rte_lcore_id(), + p->src, + p->dst); + + /* Run until the application is quit or killed. */ + for (;;) { + /* + * Receive packets on a src device and forward them on out + * the dst device. + */ + /* Get burst of RX packets, from first port of pair. */ + struct rte_mbuf *bufs[BURST_SIZE]; + const uint16_t nb_rx = rte_pkt_rx_burst(p->src, bufs, BURST_SIZE); + + if (unlikely(nb_rx == 0)) + continue; + + /* Send burst of TX packets, to second port of pair. */ + const uint16_t nb_tx = rte_pkt_tx_burst(p->dst, bufs, nb_rx); + + /* Free any unsent packets. */ + if (unlikely(nb_tx < nb_rx)) { + uint16_t buf; + for (buf = nb_tx; buf < nb_rx; buf++) + rte_pktmbuf_free(bufs[buf]); + } + } +} + +/* + * The main function, which does initialization and calls the per-lcore + * functions. + */ +int +main(int argc, char *argv[]) +{ + struct pipeline_params p[RTE_MAX_LCORE]; + struct rte_mempool *mbuf_pool; + unsigned nb_ports, lcore_id; + uint8_t portid; + + /* Initialize the Environment Abstraction Layer (EAL). 
*/ + int ret = rte_eal_init(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Error with EAL initialization\n"); + + argc -= ret; + argv += ret; + + /* Check that there is an even number of ports to send/receive on. */ + nb_ports = rte_eth_dev_count(); + if (nb_ports < 2 || (nb_ports & 1)) + rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n"); + + /* Creates a new mempool in memory to hold the mbufs. */ + mbuf_pool = rte_mempool_create("MBUF_POOL", + NUM_MBUFS * nb_ports, + MBUF_SIZE, + MBUF_CACHE_SIZE, + sizeof(struct rte_pktmbuf_pool_private), + rte_pktmbuf_pool_init, NULL, + rte_pktmbuf_init, NULL, + rte_socket_id(), + 0); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize all ports. */ + for (portid = 0; portid < nb_ports; portid++) + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + + struct rte_pkt_dev *in = rte_pkt_dev_from_ethport(0, 0, 0); + RTE_LCORE_FOREACH_SLAVE(lcore_id){ + char name[RTE_RING_NAMESIZE]; + snprintf(name, sizeof(name), "RING_from_%u", lcore_id); + struct rte_pkt_dev *out = rte_pkt_dev_from_ring( + rte_ring_create(name, 4096, rte_socket_id(), 0)); + + p[lcore_id].src = in; + p[lcore_id].dst = out; + rte_eal_remote_launch((lcore_function_t *)do_work, + &p[lcore_id], lcore_id); + in = out; // next pipeline stage reads from my output. + } + //now finish pipeline on master lcore + lcore_id = rte_lcore_id(); + p[lcore_id].src = in; + p[lcore_id].dst = rte_pkt_dev_from_ethport(1, 0, 0); + do_work(&p[lcore_id]); + + return 0; +} -- 2.1.0 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type
From: Bruce Richardson @ 2015-05-19 11:31 UTC
To: dev

On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote:
> [...]

Any comments on this RFC before I see about investing further time in it to clean
it up a bit and submit as a non-RFC patchset for merge in 2.1?

/Bruce
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type
From: Wiles, Keith @ 2015-05-20 0:19 UTC
To: Richardson, Bruce, dev

It looks fine to me.

On 5/19/15, 7:31 AM, "Richardson, Bruce" <bruce.richardson@intel.com> wrote:
>On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote:
>> [...]
>
>Any comments on this RFC before I see about investing further time in it
>to clean it up a bit and submit as a non-RFC patchset for merge in 2.1?
>
>/Bruce
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type
From: Thomas Monjalon @ 2015-05-20 8:31 UTC
To: dev, Bruce Richardson

2015-05-19 12:31, Bruce Richardson:
> On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote:
> > [...]
>
> Any comments on this RFC before I see about investing further time in it to clean
> it up a bit and submit as a non-RFC patchset for merge in 2.1?

I would say there are 2 possible approaches for KNI and ring handling:
1/ You Bruce, Marc and Keith are advocating for a layer on top of ethdev,
ring, KNI and possibly other devices, which uses mbuf. The set of functions
is simpler than ethdev but the data structure is mbuf, which is related to
the ethdev layer.
2/ Konstantin and Neil talked about keeping mbuf for the ethdev layer and
related libs only. Ring and KNI could have an ethdev API with a reduced set of
implemented functions. Crypto devices could adopt a specific crypto API and
an ethdev API at the same time.

I feel it's cleaner, more generic and more maintainable to have drivers
implementing one or several stable APIs instead of having some restricted
wrappers to update.

Comments are welcome.
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type 2015-05-20 8:31 ` Thomas Monjalon @ 2015-05-20 10:05 ` Marc Sune 2015-05-20 10:28 ` Neil Horman 0 siblings, 1 reply; 19+ messages in thread From: Marc Sune @ 2015-05-20 10:05 UTC (permalink / raw) To: dev On 20/05/15 10:31, Thomas Monjalon wrote: > 2015-05-19 12:31, Bruce Richardson: >> On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote: >>> Hi all, >>> >>> after a small amount of offline discussion with Marc Sune, here is an >>> alternative proposal for a higher-level interface - aka pktdev - to allow a >>> common Rx/Tx API across device types handling mbufs [for now, ethdev, ring >>> and KNI]. The key code is in the first patch fo the set - the second is an >>> example of a trivial usecase. >>> >>> What is different about this to previously: >>> * wrapper class, so no changes to any existing ring, ethdev implementations >>> * use of function pointers for RX/TX with an API that maps to ethdev >>> - this means there is little/no additional overhead for ethdev calls >>> - inline special case for rings, to accelerate that. Since we are at a >>> higher level, we can special case process some things if appropriate. This >>> means the impact to ring ops is one (predictable) branch per burst >>> * elimination of the queue abstraction. For the ring and KNI, there is no >>> concept of queues, so we just wrap the functions directly (no need even for >>> wrapper functions, the api's match so we can call directly). This also >>> means: >>> - adding in features per-queue, is far easier as we don't need to worry about >>> having arrays of multiple queues. For example: >>> - adding in buffering on TX (or RX) is easier since again we only have a >>> single queue. >>> * thread safety is made easier using a wrapper. For a MP ring, we can create >>> multiple pktdevs around it, and each thread will then be able to use their >>> own copy, with their own buffering etc. 
>>> >>> However, at this point, I'm just looking for general feedback on this as an >>> approach. I think it's quite flexible - even more so than the earlier proposal >>> we had. It's less proscriptive and doesn't make any demands on any other libs. >>> >>> Comments/thoughts welcome. >> Any comments on this RFC before I see about investing further time in it to clean >> it up a bit and submit as a non-RFC patchset for merge in 2.1? > I would say there are 2 possible approaches for KNI and ring handling: > 1/ You Bruce, Marc and Keith are advocating for a layer on top of ethdev, > ring, KNI and possibly other devices, which uses mbuf. The set of functions > is simpler than ethdev but the data structure is mbuf which is related to > ethdev layer. > 2/ Konstantin and Neil talked about keeping mbuf for ethdev layer and related > libs only. Ring and KNI could have an ethdev API with a reduced set of > implemented functions. Crypto devices could adopt a specific crypto API and > an ethdev API at the same time. I don't fully understand which APIs you meant by non-ethdev. This pktdev wrapper proposal abstracts RX and TX functions only, and all of these are using mbufs as the packet buffer abstraction right now anyway (ethdev). This approach does not preclude that different libraries expose other API calls. In fact they will have to; setup the port/device ... It is just a higher level API, so that you don't have to check the type of port in your DPDK application I/O loop, minimizing user's code. Or were you in 2) thinking about creating a different "packet buffer" abstraction, independent from the ethdev, and then map the different port specifics (e.g. mbuf) to this new abstraction? > > I feel it's cleaner, more generic and more maintainable to have drivers > implementing one or several stable APIs instead of having some restricted > wrappers to update. 
This would be a separate library _on top_ of the existing APIs, and it has the advantage of simplifying the DPDK user's application code when an application needs to deal with several types of port, as shown in the example that Bruce provided in PATCH #2. I don't see why this would limit us or make the code less maintainable. Of course this is an RFC patch; appropriate tests are missing (Bruce, I can help you with that). Marc > > Comments are welcome. ^ permalink raw reply [flat|nested] 19+ messages in thread
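Editorial sketch of the wrapper idea discussed above (function pointers for RX/TX, an inline special case for rings, and no port-type check in the application's I/O loop). This is not the code from the RFC patch: `struct pkt`, `struct toy_ring`, `struct pktdev` and every function name below are hypothetical stand-ins (for `struct rte_mbuf`, `rte_ring` and the proposed pktdev type), chosen so that the example compiles on its own without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_mbuf; hypothetical, only here so the sketch builds alone. */
struct pkt { uint32_t len; };

/* Hypothetical burst signature shared by every wrapped device type. */
typedef uint16_t (*pkt_burst_fn)(void *dev, struct pkt **pkts, uint16_t n);

enum pktdev_type { PKTDEV_ETHDEV, PKTDEV_RING, PKTDEV_KNI };

/* Hypothetical wrapper: no queue index, one wrapper per thread. */
struct pktdev {
    enum pktdev_type type;
    void *dev;            /* underlying ring, KNI or ethdev handle */
    pkt_burst_fn rx;
    pkt_burst_fn tx;
};

/* Toy single-threaded ring standing in for rte_ring. */
#define TOY_RING_SIZE 8
struct toy_ring {
    struct pkt *slots[TOY_RING_SIZE];
    uint16_t head;        /* next slot to dequeue */
    uint16_t tail;        /* next slot to enqueue */
};

static uint16_t toy_ring_enqueue(void *dev, struct pkt **pkts, uint16_t n)
{
    struct toy_ring *r = dev;
    uint16_t i;

    /* Stop early when the ring is full; return how many were accepted. */
    for (i = 0; i < n && (uint16_t)(r->tail - r->head) < TOY_RING_SIZE; i++)
        r->slots[r->tail++ % TOY_RING_SIZE] = pkts[i];
    return i;
}

static uint16_t toy_ring_dequeue(void *dev, struct pkt **pkts, uint16_t n)
{
    struct toy_ring *r = dev;
    uint16_t i;

    for (i = 0; i < n && r->head != r->tail; i++)
        pkts[i] = r->slots[r->head++ % TOY_RING_SIZE];
    return i;
}

/* RX through the wrapper: rings take a direct call, everything else a pointer call. */
static inline uint16_t
pktdev_rx_burst(struct pktdev *p, struct pkt **pkts, uint16_t n)
{
    assert(p != NULL);
    if (p->type == PKTDEV_RING)   /* the one branch per burst mentioned in the RFC */
        return toy_ring_dequeue(p->dev, pkts, n);
    assert(p->rx != NULL);
    return p->rx(p->dev, pkts, n);
}

static inline uint16_t
pktdev_tx_burst(struct pktdev *p, struct pkt **pkts, uint16_t n)
{
    assert(p != NULL);
    if (p->type == PKTDEV_RING)
        return toy_ring_enqueue(p->dev, pkts, n);
    assert(p->tx != NULL);
    return p->tx(p->dev, pkts, n);
}

/* Build a wrapper around a ring; each thread could hold its own copy. */
static struct pktdev pktdev_from_ring(struct toy_ring *r)
{
    struct pktdev p = { PKTDEV_RING, r, toy_ring_dequeue, toy_ring_enqueue };
    return p;
}
```

With something of this shape, an application loop would call `pktdev_rx_burst()` and `pktdev_tx_burst()` on whatever wrapper it was handed, and only the setup code would need to know whether the wrapper sits on a ring, a KNI device or an ethdev queue.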
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type 2015-05-20 10:05 ` Marc Sune @ 2015-05-20 10:28 ` Neil Horman 2015-05-20 17:01 ` Marc Sune 0 siblings, 1 reply; 19+ messages in thread From: Neil Horman @ 2015-05-20 10:28 UTC (permalink / raw) To: Marc Sune; +Cc: dev On Wed, May 20, 2015 at 12:05:00PM +0200, Marc Sune wrote: > > > On 20/05/15 10:31, Thomas Monjalon wrote: > >2015-05-19 12:31, Bruce Richardson: > >>On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote: > >>>Hi all, > >>> > >>>after a small amount of offline discussion with Marc Sune, here is an > >>>alternative proposal for a higher-level interface - aka pktdev - to allow a > >>>common Rx/Tx API across device types handling mbufs [for now, ethdev, ring > >>>and KNI]. The key code is in the first patch fo the set - the second is an > >>>example of a trivial usecase. > >>> > >>>What is different about this to previously: > >>>* wrapper class, so no changes to any existing ring, ethdev implementations > >>>* use of function pointers for RX/TX with an API that maps to ethdev > >>> - this means there is little/no additional overhead for ethdev calls > >>> - inline special case for rings, to accelerate that. Since we are at a > >>> higher level, we can special case process some things if appropriate. This > >>> means the impact to ring ops is one (predictable) branch per burst > >>>* elimination of the queue abstraction. For the ring and KNI, there is no > >>> concept of queues, so we just wrap the functions directly (no need even for > >>> wrapper functions, the api's match so we can call directly). This also > >>> means: > >>> - adding in features per-queue, is far easier as we don't need to worry about > >>> having arrays of multiple queues. For example: > >>> - adding in buffering on TX (or RX) is easier since again we only have a > >>> single queue. > >>>* thread safety is made easier using a wrapper. 
For a MP ring, we can create > >>> multiple pktdevs around it, and each thread will then be able to use their > >>> own copy, with their own buffering etc. > >>> > >>>However, at this point, I'm just looking for general feedback on this as an > >>>approach. I think it's quite flexible - even more so than the earlier proposal > >>>we had. It's less proscriptive and doesn't make any demands on any other libs. > >>> > >>>Comments/thoughts welcome. > >>Any comments on this RFC before I see about investing further time in it to clean > >>it up a bit and submit as a non-RFC patchset for merge in 2.1? > >I would say there are 2 possible approaches for KNI and ring handling: > >1/ You Bruce, Marc and Keith are advocating for a layer on top of ethdev, > >ring, KNI and possibly other devices, which uses mbuf. The set of functions > >is simpler than ethdev but the data structure is mbuf which is related to > >ethdev layer. > >2/ Konstantin and Neil talked about keeping mbuf for ethdev layer and related > >libs only. Ring and KNI could have an ethdev API with a reduced set of > >implemented functions. Crypto devices could adopt a specific crypto API and > >an ethdev API at the same time. > > I don't fully understand which APIs you meant by non-ethdev. This pktdev > wrapper proposal abstracts RX and TX functions only, and all of these are > using mbufs as the packet buffer abstraction right now anyway (ethdev). > He's referring to future device classes (like crypto devices), which ostensibly would make use of the pktdev API. My argument (and I think Thomas') is that if a bit of hardware can be made to operate as a packet sending/receiving device, then its just as reasonable to use the existing ethdev api rather than some other restricted version of it (pktdev) > This approach does not preclude that different libraries expose other API > calls. In fact they will have to; setup the port/device ... 
It is just a > higher level API, so that you don't have to check the type of port in your > DPDK application I/O loop, minimizing user's code. > No argument there. But if that's the case (and I agree that it is), an application will implicitly have to know what type of device it is, because it (the application) will need to understand the specific API it is writing to. > Or were you in 2) thinking about creating a different "packet buffer" > abstraction, independent from the ethdev, and then map the different port > specifics (e.g. mbuf) to this new abstraction? > My argument was to just leave the ethdev api alone. If a device class can be made to look like a packet forwarding device, then use the existing ethdev api to implement it. > > > >I feel it's cleaner, more generic and more maintainable to have drivers > >implementing one or several stable APIs instead of having some restricted > >wrappers to update. > > This would be a separate library _on top_ of the existing APIs, and it has > the advantage to simplify the DPDK user's application code when an > application needs to deal with several types of port, as shown in the > example that Bruce provided in PATCH #2. > But that's already the purpose of the ethdev api. Different types of hardware/software can be made to look like the same thing (an ethdev) from an application standpoint. Adding this pktdev layer does nothing but that: add a layer. If you want restricted functionality of an interface, that's ok, ethdev offers that ability. Unimplemented methods in a pmd cause the ethdev api to return EOPNOTSUPP to the calling application, so the application knows when a given ethdev can't do some aspect of what an ethdev is. > I don't see why this could limit us or make it less maintainable. Of course > this is an RFC patch; appropriate tests are missing (Bruce I can help you on > that) > It doesn't limit us, it's just not a useful abstraction, because we already have the abilities it provides.
Neil > Marc > > > > >Comments are welcome. > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type 2015-05-20 10:28 ` Neil Horman @ 2015-05-20 17:01 ` Marc Sune 2015-05-20 18:47 ` Neil Horman 0 siblings, 1 reply; 19+ messages in thread From: Marc Sune @ 2015-05-20 17:01 UTC (permalink / raw) To: Neil Horman; +Cc: dev On 20/05/15 12:28, Neil Horman wrote: > On Wed, May 20, 2015 at 12:05:00PM +0200, Marc Sune wrote: >> >> On 20/05/15 10:31, Thomas Monjalon wrote: >>> 2015-05-19 12:31, Bruce Richardson: >>>> On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote: >>>>> Hi all, >>>>> >>>>> after a small amount of offline discussion with Marc Sune, here is an >>>>> alternative proposal for a higher-level interface - aka pktdev - to allow a >>>>> common Rx/Tx API across device types handling mbufs [for now, ethdev, ring >>>>> and KNI]. The key code is in the first patch fo the set - the second is an >>>>> example of a trivial usecase. >>>>> >>>>> What is different about this to previously: >>>>> * wrapper class, so no changes to any existing ring, ethdev implementations >>>>> * use of function pointers for RX/TX with an API that maps to ethdev >>>>> - this means there is little/no additional overhead for ethdev calls >>>>> - inline special case for rings, to accelerate that. Since we are at a >>>>> higher level, we can special case process some things if appropriate. This >>>>> means the impact to ring ops is one (predictable) branch per burst >>>>> * elimination of the queue abstraction. For the ring and KNI, there is no >>>>> concept of queues, so we just wrap the functions directly (no need even for >>>>> wrapper functions, the api's match so we can call directly). This also >>>>> means: >>>>> - adding in features per-queue, is far easier as we don't need to worry about >>>>> having arrays of multiple queues. For example: >>>>> - adding in buffering on TX (or RX) is easier since again we only have a >>>>> single queue. >>>>> * thread safety is made easier using a wrapper. 
For a MP ring, we can create >>>>> multiple pktdevs around it, and each thread will then be able to use their >>>>> own copy, with their own buffering etc. >>>>> >>>>> However, at this point, I'm just looking for general feedback on this as an >>>>> approach. I think it's quite flexible - even more so than the earlier proposal >>>>> we had. It's less proscriptive and doesn't make any demands on any other libs. >>>>> >>>>> Comments/thoughts welcome. >>>> Any comments on this RFC before I see about investing further time in it to clean >>>> it up a bit and submit as a non-RFC patchset for merge in 2.1? >>> I would say there are 2 possible approaches for KNI and ring handling: >>> 1/ You Bruce, Marc and Keith are advocating for a layer on top of ethdev, >>> ring, KNI and possibly other devices, which uses mbuf. The set of functions >>> is simpler than ethdev but the data structure is mbuf which is related to >>> ethdev layer. >>> 2/ Konstantin and Neil talked about keeping mbuf for ethdev layer and related >>> libs only. Ring and KNI could have an ethdev API with a reduced set of >>> implemented functions. Crypto devices could adopt a specific crypto API and >>> an ethdev API at the same time. >> I don't fully understand which APIs you meant by non-ethdev. This pktdev >> wrapper proposal abstracts RX and TX functions only, and all of these are >> using mbufs as the packet buffer abstraction right now anyway (ethdev). >> > He's referring to future device classes (like crypto devices), which ostensibly > would make use of the pktdev API. My argument (and I think Thomas') is that if > a bit of hardware can be made to operate as a packet sending/receiving device, > then its just as reasonable to use the existing ethdev api rather than some > other restricted version of it (pktdev) > >> This approach does not preclude that different libraries expose other API >> calls. In fact they will have to; setup the port/device ... 
It is just a >> higher level API, so that you don't have to check the type of port in your >> DPDK application I/O loop, minimizing user's code. >> > No argument there. But if thats the case (and I agree that it is), an > application will implicitly have to know what what type of device it is, because > it (the application) will need to understand the specific API it is writing to. > >> Or were you in 2) thinking about creating a different "packet buffer" >> abstraction, independent from the ethdev, and then map the different port >> specifics (e.g. mbuf) to this new abstraction? >> > My argument was to just leave the ethdev api alone. If a device class can be > made to look like a packet forwarding device, then use the existing ethdev api > to implement it. > >>> I feel it's cleaner, more generic and more maintainable to have drivers >>> implementing one or several stable APIs instead of having some restricted >>> wrappers to update. >> This would be a separate library _on top_ of the existing APIs, and it has >> the advantage to simplify the DPDK user's application code when an >> application needs to deal with several types of port, as shown in the >> example that Bruce provided in PATCH #2. >> > But thats already the purpose of the ethdev api. Different types of > hardware/software can be made to look like the same thing (an ethdev) from an > application standpoint. Adding this pktdev layer does nothing but that, add a > layer. If you want restricted functionality of an interface, thats ok, ethdev > offers that ability. unimplemented methods in a pmd cause the ethdev api to > return EOPNOTSUP to the calling application, so the application knows when a > given ethdev can't do some aspect of what an ethdev is. Hi Neil, Thanks for the clarifications. Now I understand the concern Thomas expressed. 
Using the ethdev API (port-ids) was actually my first suggestion here: http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545 And to be honest, it is what I was expecting when I first read DPDK's APIs. It is indeed an option. However, if we take a look at the API: http://www.dpdk.org/doc/api/rte__ethdev_8h.html none of the API calls, except the burst RX/TX and, perhaps, the callbacks, would be used by devices other than NICs. Using it seems to go a bit too far, but it is of course possible.
In essence, rte_ether (rte_ethdev.h) right now has:
i) NIC setup: general configuration, queue config, fdir, offloads, hw stuff like LEDs...
ii) RX/TX routines and callbacks
iii) Stats and queue stats
iv) other utils for Ethernet stuff (rte_ether.h)
i) is clearly HW specific, and only applies to NICs/ASICs (e.g. FM10k), while ii) and iii) are things that could be abstracted beyond NICs, like KNI, rte_ring, crypto... (iv) could be moved into some utils/protocol parsing libraries). Perhaps these two groups could be split into two different libraries; then ii) and iii) together would be something like ~ rte_pktdev (stats are missing from the proposed patch), while i) would be rte_ether, or rte_nic if we think it is a better name. In any case, and I think we all agree here, one way or another this should be abstracted so that it simplifies (and reduces) the code of DPDK applications a bit. Marc > >> I don't see why this could limit us or make it less maintainable. Of course >> this is an RFC patch; appropriate tests are missing (Bruce I can help you on >> that) >> > It doesn't limit us, its just not a useful abstraction, because we already have > the abilities it provides. > > Neil >> Marc >> >>> Comments are welcome. >> ^ permalink raw reply [flat|nested] 19+ messages in thread
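Editorial sketch of the split Marc suggests above, with groups ii) and iii) (packet I/O and stats) in a generic part and group i) (NIC setup) in a hardware-only part. Nothing here exists in DPDK or in the RFC patch: `struct pktdev_ops`, `struct nic_ops`, `struct nic`, `total_rx()` and the counter backend are hypothetical names used only to show the shape of such a layering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Groups ii) + iii): generic packet I/O and statistics (hypothetical "pktdev" part). */
struct io_stats { uint64_t rx_pkts; uint64_t tx_pkts; };

struct pktdev_ops {
    uint16_t (*rx_burst)(void *priv, void **pkts, uint16_t n);
    uint16_t (*tx_burst)(void *priv, void **pkts, uint16_t n);
    void (*stats_get)(void *priv, struct io_stats *out);
};

struct pktdev {
    const struct pktdev_ops *ops;
    void *priv;
};

/* Group i): hardware-only setup (hypothetical "rte_nic" part). */
struct nic_ops {
    int (*configure)(void *priv, uint16_t nb_rxq, uint16_t nb_txq);
    int (*led_set)(void *priv, int on);
};

/* A NIC embeds the generic part and adds its hardware-specific ops. */
struct nic {
    struct pktdev io;
    const struct nic_ops *ops;
};

/* Application code written against the generic part only. */
static uint64_t total_rx(struct pktdev *const devs[], size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        struct io_stats s = {0, 0};

        assert(devs[i] != NULL && devs[i]->ops->stats_get != NULL);
        devs[i]->ops->stats_get(devs[i]->priv, &s);
        sum += s.rx_pkts;
    }
    return sum;
}

/* Toy backend: the private data is simply a struct io_stats. */
static void counter_stats_get(void *priv, struct io_stats *out)
{
    *out = *(struct io_stats *)priv;
}

static const struct pktdev_ops counter_ops = { .stats_get = counter_stats_get };
```

In this layout a ring or KNI device would expose only a `struct pktdev`, while a NIC would expose a `struct nic` whose `io` member can be passed to the same generic code.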
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type 2015-05-20 17:01 ` Marc Sune @ 2015-05-20 18:47 ` Neil Horman 2015-05-21 12:12 ` Richardson, Bruce 0 siblings, 1 reply; 19+ messages in thread From: Neil Horman @ 2015-05-20 18:47 UTC (permalink / raw) To: Marc Sune; +Cc: dev On Wed, May 20, 2015 at 07:01:02PM +0200, Marc Sune wrote: > > > On 20/05/15 12:28, Neil Horman wrote: > >On Wed, May 20, 2015 at 12:05:00PM +0200, Marc Sune wrote: > >> > >>On 20/05/15 10:31, Thomas Monjalon wrote: > >>>2015-05-19 12:31, Bruce Richardson: > >>>>On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote: > >>>>>Hi all, > >>>>> > >>>>>after a small amount of offline discussion with Marc Sune, here is an > >>>>>alternative proposal for a higher-level interface - aka pktdev - to allow a > >>>>>common Rx/Tx API across device types handling mbufs [for now, ethdev, ring > >>>>>and KNI]. The key code is in the first patch fo the set - the second is an > >>>>>example of a trivial usecase. > >>>>> > >>>>>What is different about this to previously: > >>>>>* wrapper class, so no changes to any existing ring, ethdev implementations > >>>>>* use of function pointers for RX/TX with an API that maps to ethdev > >>>>> - this means there is little/no additional overhead for ethdev calls > >>>>> - inline special case for rings, to accelerate that. Since we are at a > >>>>> higher level, we can special case process some things if appropriate. This > >>>>> means the impact to ring ops is one (predictable) branch per burst > >>>>>* elimination of the queue abstraction. For the ring and KNI, there is no > >>>>> concept of queues, so we just wrap the functions directly (no need even for > >>>>> wrapper functions, the api's match so we can call directly). This also > >>>>> means: > >>>>> - adding in features per-queue, is far easier as we don't need to worry about > >>>>> having arrays of multiple queues. 
For example: > >>>>> - adding in buffering on TX (or RX) is easier since again we only have a > >>>>> single queue. > >>>>>* thread safety is made easier using a wrapper. For a MP ring, we can create > >>>>> multiple pktdevs around it, and each thread will then be able to use their > >>>>> own copy, with their own buffering etc. > >>>>> > >>>>>However, at this point, I'm just looking for general feedback on this as an > >>>>>approach. I think it's quite flexible - even more so than the earlier proposal > >>>>>we had. It's less proscriptive and doesn't make any demands on any other libs. > >>>>> > >>>>>Comments/thoughts welcome. > >>>>Any comments on this RFC before I see about investing further time in it to clean > >>>>it up a bit and submit as a non-RFC patchset for merge in 2.1? > >>>I would say there are 2 possible approaches for KNI and ring handling: > >>>1/ You Bruce, Marc and Keith are advocating for a layer on top of ethdev, > >>>ring, KNI and possibly other devices, which uses mbuf. The set of functions > >>>is simpler than ethdev but the data structure is mbuf which is related to > >>>ethdev layer. > >>>2/ Konstantin and Neil talked about keeping mbuf for ethdev layer and related > >>>libs only. Ring and KNI could have an ethdev API with a reduced set of > >>>implemented functions. Crypto devices could adopt a specific crypto API and > >>>an ethdev API at the same time. > >>I don't fully understand which APIs you meant by non-ethdev. This pktdev > >>wrapper proposal abstracts RX and TX functions only, and all of these are > >>using mbufs as the packet buffer abstraction right now anyway (ethdev). > >> > >He's referring to future device classes (like crypto devices), which ostensibly > >would make use of the pktdev API. 
My argument (and I think Thomas') is that if > >a bit of hardware can be made to operate as a packet sending/receiving device, > >then its just as reasonable to use the existing ethdev api rather than some > >other restricted version of it (pktdev) > > > >>This approach does not preclude that different libraries expose other API > >>calls. In fact they will have to; setup the port/device ... It is just a > >>higher level API, so that you don't have to check the type of port in your > >>DPDK application I/O loop, minimizing user's code. > >> > >No argument there. But if thats the case (and I agree that it is), an > >application will implicitly have to know what what type of device it is, because > >it (the application) will need to understand the specific API it is writing to. > > > >>Or were you in 2) thinking about creating a different "packet buffer" > >>abstraction, independent from the ethdev, and then map the different port > >>specifics (e.g. mbuf) to this new abstraction? > >> > >My argument was to just leave the ethdev api alone. If a device class can be > >made to look like a packet forwarding device, then use the existing ethdev api > >to implement it. > > > >>>I feel it's cleaner, more generic and more maintainable to have drivers > >>>implementing one or several stable APIs instead of having some restricted > >>>wrappers to update. > >>This would be a separate library _on top_ of the existing APIs, and it has > >>the advantage to simplify the DPDK user's application code when an > >>application needs to deal with several types of port, as shown in the > >>example that Bruce provided in PATCH #2. > >> > >But thats already the purpose of the ethdev api. Different types of > >hardware/software can be made to look like the same thing (an ethdev) from an > >application standpoint. Adding this pktdev layer does nothing but that, add a > >layer. If you want restricted functionality of an interface, thats ok, ethdev > >offers that ability. 
unimplemented methods in a pmd cause the ethdev api to > >return EOPNOTSUP to the calling application, so the application knows when a > >given ethdev can't do some aspect of what an ethdev is. > > Hi Neil, > > Thanks for the clarifications. Now I understand the concern Thomas > expressed. Using ethdev API (port-ids) was actually my first suggestion > here: > > http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545 > > And to be honest, what I was expecting when I was reading for the first time > DPDK's APIs. It is indeed an option. However, if we take a look at the API: > > http://www.dpdk.org/doc/api/rte__ethdev_8h.html > > none of the API calls, except the burst RX/TX and, perhaps, the callbacks, > would be used by devices other than NICs. It seems going a bit too far using > it, but ofc possible. > So, I'll make 3 counter-arguments here:
1) To your point about the ethdev api being much larger than what a non-ethernet device could use, I'll tacitly agree, but indicate that it's not relevant. If you want a bit of hardware that isn't a network interface to behave like a network interface, then there are going to be a lot of aspects of a network interface that it just can't do. That's true regardless of how you implement that. In the pktdev model, you prevent those operations from being an option at all, while in the current ethdev model, you simply get a return code of EOPNOTSUPP, and the application does the right thing (which is to say, it understands that this hardware doesn't need that aspect of network card management and goes on with its day). I assert that, because we already have the ethdev api, it's a lower time investment to simply reuse it.
2) To the implication that we aren't working with NICs here, you're correct. As you note in your previous message, the pktdev interface is in no way the end all and be all of device model design. You will need to add other api calls to manage the device.
If that's the case, then don't shoehorn any one particular aspect of the API to fit a device model that the device doesn't conform to. Design the API so that it best reflects the hardware behavior.
3) An addendum to the point about hardware not being a NIC (and you didn't make this point directly above, but I think you may have mentioned it previously), sometimes you want a device to behave like another device for the purposes of using generic code to talk to several device types. While this is true, this is a case for device translation and use, not for carving out parts of an api to make something more generic. The use case I cited previously was an ipsec tunnel. An ipsec tunnel uses cryptography, and crypto device apis, to encrypt and decrypt packet data. The common way to implement this is to design a crypto api that accepts a block of data in a way most conducive to the hardware, and then implement a network driver (that uses whatever ethernet api, in this case the ethdev api), to integrate with the network datapath. With this model, the ipsec tunnel uses the full range of the ethdev api (or a good deal more of it), and the crypto api is optimized to work with crypto acceleration hardware. > In essence, rte_ether(rte_ethdev.h) right now has: i) NIC setup; general > configuration, queue config, fdir, offloads, hw stuff like leds... ii) RX/TX > routines and callbacks iii) Stats and queue stats iv) other utils for > ethernet stuff (rte_ether.h) > The key that I'm taking away here is 'right now'. It's already written, so there's no work involved in implementing it for new devices. > i) is clearly HW specific, and does only apply to NICs/ASICs (e.g. FM10k) Ok, so it only applies to NICs, that's fine. If you want to write a driver that leaves those methods for the pmd set to NULL, the ethdev library will correctly return EOPNOTSUPP to the calling applications. > while ii) and iii) are things that could be abstracted beyond NICs, like > KNI, rte_ring, crypto...
(iv could be moved into some utils/protocol parsing > libraries). > Right again, so let those device types implement the appropriate portions of the pmd driver structure that match what they support. Everything else is handled by the ethdev library automatically. > Perhaps these two groups could be split into two different libraries and > then ii) and iii) together would be something like ~ rte_pktdev (stats are > missing on the proposed patch), while i) would be rte_ether, or rte_nic if > we think it is a better name. > The point I'm trying to get to is, why split at all? There's just no need that I can see. The example I would cite here is the dummy driver in Linux. It's a net device that only serves to act as a sink for network packets. It still uses the network driver interface, but of the 65-ish methods that the netdevice model in Linux offers, it implements 8 (or approximately 12%). The other methods are just that, unused, and that's ok. Applications that try to do things like set flow director options, or speed/duplex options, get a return code that effectively says "This device can't do that", and that's ok. That's what we need to be doing here. Instead of finding a way to codify the subset of functionality that other devices might be able to implement, for those cases where we want other hardware to act like a netdevice, let's just let those devices pick and choose what to implement, and the interface we already have will communicate with applications appropriately. Regards Neil > In any case, and I think we all agree here, I just think that one way or > another this should be abstracted so that it simplifies (and reduces) a bit > the code of DPDK applications. > > Marc > > > > >>I don't see why this could limit us or make it less maintainable.
Of course > >>this is an RFC patch; appropriate tests are missing (Bruce I can help you on > >>that) > >> > >It doesn't limit us, its just not a useful abstraction, because we already have > >the abilities it provides. > > > >Neil > >>Marc > >> > >>>Comments are welcome. > >> > > ^ permalink raw reply [flat|nested] 19+ messages in thread
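Editorial sketch of the mechanism Neil relies on above: a driver leaves the methods it cannot support set to NULL, and a thin library layer turns a NULL method into a "not supported" error for the caller. The types and names (`struct toy_dev_ops`, `struct toy_dev`, `toy_dev_set_led()`, the ring-backed driver) are hypothetical and are not the rte_ethdev structures. The sketch returns `-ENOTSUP`; the thread writes the error as EOPNOTSUPP, and on Linux the two names have the same value, which is an assumption about the target platform rather than something the thread states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced driver method table. */
struct toy_dev_ops {
    int (*link_update)(void *priv);
    int (*set_led)(void *priv, int on);
    int (*stats_get)(void *priv, unsigned long *rx_pkts);
};

struct toy_dev {
    const struct toy_dev_ops *ops;
    void *priv;
};

/* Library-level wrapper: a NULL method means "this device can't do that". */
static int toy_dev_set_led(struct toy_dev *dev, int on)
{
    assert(dev != NULL && dev->ops != NULL);
    if (dev->ops->set_led == NULL)
        return -ENOTSUP;
    return dev->ops->set_led(dev->priv, on);
}

static int toy_dev_stats_get(struct toy_dev *dev, unsigned long *rx_pkts)
{
    assert(dev != NULL && dev->ops != NULL && rx_pkts != NULL);
    if (dev->ops->stats_get == NULL)
        return -ENOTSUP;
    return dev->ops->stats_get(dev->priv, rx_pkts);
}

/* A ring-backed "driver" that implements stats and nothing else. */
struct ring_priv { unsigned long rx_pkts; };

static int ring_stats_get(void *priv, unsigned long *rx_pkts)
{
    *rx_pkts = ((struct ring_priv *)priv)->rx_pkts;
    return 0;
}

/* link_update and set_led stay NULL because a ring has no link or LEDs. */
static const struct toy_dev_ops ring_ops = { .stats_get = ring_stats_get };
```

An application using such a layer would call `toy_dev_set_led()` on any device, treat `-ENOTSUP` as "nothing to do here", and carry on, which is the behaviour Neil describes for the Linux dummy driver.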
* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type 2015-05-20 18:47 ` Neil Horman @ 2015-05-21 12:12 ` Richardson, Bruce 0 siblings, 0 replies; 19+ messages in thread From: Richardson, Bruce @ 2015-05-21 12:12 UTC (permalink / raw) To: Neil Horman, Marc Sune; +Cc: dev > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman > Sent: Wednesday, May 20, 2015 7:47 PM > To: Marc Sune > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type > > On Wed, May 20, 2015 at 07:01:02PM +0200, Marc Sune wrote: > > > > > > On 20/05/15 12:28, Neil Horman wrote: > > >On Wed, May 20, 2015 at 12:05:00PM +0200, Marc Sune wrote: > > >> > > >>On 20/05/15 10:31, Thomas Monjalon wrote: > > >>>2015-05-19 12:31, Bruce Richardson: > > >>>>On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote: > > >>>>>Hi all, > > >>>>> > > >>>>>after a small amount of offline discussion with Marc Sune, here > > >>>>>is an alternative proposal for a higher-level interface - aka > > >>>>>pktdev - to allow a common Rx/Tx API across device types handling > > >>>>>mbufs [for now, ethdev, ring and KNI]. The key code is in the > > >>>>>first patch fo the set - the second is an example of a trivial > usecase. > > >>>>> > > >>>>>What is different about this to previously: > > >>>>>* wrapper class, so no changes to any existing ring, ethdev > > >>>>>implementations > > >>>>>* use of function pointers for RX/TX with an API that maps to > ethdev > > >>>>> - this means there is little/no additional overhead for ethdev > calls > > >>>>> - inline special case for rings, to accelerate that. Since we > are at a > > >>>>> higher level, we can special case process some things if > appropriate. This > > >>>>> means the impact to ring ops is one (predictable) branch per > > >>>>>burst > > >>>>>* elimination of the queue abstraction. 
For the ring and KNI, there > is no > > >>>>> concept of queues, so we just wrap the functions directly (no > need even for > > >>>>> wrapper functions, the api's match so we can call directly). > This also > > >>>>> means: > > >>>>> - adding in features per-queue, is far easier as we don't need > to worry about > > >>>>> having arrays of multiple queues. For example: > > >>>>> - adding in buffering on TX (or RX) is easier since again we > only have a > > >>>>> single queue. > > >>>>>* thread safety is made easier using a wrapper. For a MP ring, we > can create > > >>>>> multiple pktdevs around it, and each thread will then be able to > use their > > >>>>> own copy, with their own buffering etc. > > >>>>> > > >>>>>However, at this point, I'm just looking for general feedback on > > >>>>>this as an approach. I think it's quite flexible - even more so > > >>>>>than the earlier proposal we had. It's less proscriptive and > doesn't make any demands on any other libs. > > >>>>> > > >>>>>Comments/thoughts welcome. > > >>>>Any comments on this RFC before I see about investing further time > > >>>>in it to clean it up a bit and submit as a non-RFC patchset for > merge in 2.1? > > >>>I would say there are 2 possible approaches for KNI and ring > handling: > > >>>1/ You Bruce, Marc and Keith are advocating for a layer on top of > > >>>ethdev, ring, KNI and possibly other devices, which uses mbuf. The > > >>>set of functions is simpler than ethdev but the data structure is > > >>>mbuf which is related to ethdev layer. > > >>>2/ Konstantin and Neil talked about keeping mbuf for ethdev layer > > >>>and related libs only. Ring and KNI could have an ethdev API with a > > >>>reduced set of implemented functions. Crypto devices could adopt a > > >>>specific crypto API and an ethdev API at the same time. > > >>I don't fully understand which APIs you meant by non-ethdev. 
This > > >>pktdev wrapper proposal abstracts RX and TX functions only, and all > > >>of these are using mbufs as the packet buffer abstraction right now > anyway (ethdev). > > >> > > >He's referring to future device classes (like crypto devices), which > > >ostensibly would make use of the pktdev API. My argument (and I > > >think Thomas') is that if a bit of hardware can be made to operate as > > >a packet sending/receiving device, then its just as reasonable to use > > >the existing ethdev api rather than some other restricted version of > > >it (pktdev) > > > > > >>This approach does not preclude that different libraries expose > > >>other API calls. In fact they will have to; setup the port/device > > >>... It is just a higher level API, so that you don't have to check > > >>the type of port in your DPDK application I/O loop, minimizing user's > code. > > >> > > >No argument there. But if thats the case (and I agree that it is), > > >an application will implicitly have to know what what type of device > > >it is, because it (the application) will need to understand the > specific API it is writing to. > > > > > >>Or were you in 2) thinking about creating a different "packet buffer" > > >>abstraction, independent from the ethdev, and then map the different > > >>port specifics (e.g. mbuf) to this new abstraction? > > >> > > >My argument was to just leave the ethdev api alone. If a device > > >class can be made to look like a packet forwarding device, then use > > >the existing ethdev api to implement it. > > > > > >>>I feel it's cleaner, more generic and more maintainable to have > > >>>drivers implementing one or several stable APIs instead of having > > >>>some restricted wrappers to update. 
> > >>This would be a separate library _on top_ of the existing APIs, and > > >>it has the advantage to simplify the DPDK user's application code > > >>when an application needs to deal with several types of port, as > > >>shown in the example that Bruce provided in PATCH #2. > > >> > > >But thats already the purpose of the ethdev api. Different types of > > >hardware/software can be made to look like the same thing (an ethdev) > > >from an application standpoint. Adding this pktdev layer does > > >nothing but that, add a layer. If you want restricted functionality > > >of an interface, thats ok, ethdev offers that ability. unimplemented > > >methods in a pmd cause the ethdev api to return EOPNOTSUP to the > > >calling application, so the application knows when a given ethdev can't > do some aspect of what an ethdev is. > > > > Hi Neil, > > > > Thanks for the clarifications. Now I understand the concern Thomas > > expressed. Using ethdev API (port-ids) was actually my first > > suggestion > > here: > > > > http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545 > > > > And to be honest, what I was expecting when I was reading for the > > first time DPDK's APIs. It is indeed an option. However, if we take a > look at the API: > > > > http://www.dpdk.org/doc/api/rte__ethdev_8h.html > > > > none of the API calls, except the burst RX/TX and, perhaps, the > > callbacks, would be used by devices other than NICs. It seems going a > > bit too far using it, but ofc possible. > > > So, I'll make 3 counter-arguments here: > > 1) To your point about the ethdev api being much larger than what a non- > ethernet device could use, I'll tacitly agree, but indicate that its not > relevant. If you want a bit of hardware that isn't a network interface to > behave like a network interface, then there are going to be alot of > aspects of a network interface that it just can't do. Thats true > regardless of how you implement that. 
In the pktdev model, you prevent > those operations from being an option at all, while in the current ethdev > model, you simply get a return code of EOPNOTSUP, and the application does > the right thing (which is to say, it understands that this hardware > doesn't need that aspect of network card mangement and goes on with its > day). I assert that, because we already have the ethdev api, its a lower > time investment to simply reuse it > > 2) To the implication that we aren't working with NICs here, you're > correct. As you note in your previous message, the pktdev interface is in > no way the end all and be all of device model design. You will need to > add other api calls to manage the device. If thats the case, then don't > shoehorn any one particular aspect of the API to fit a device model that > the device doesn't conform to. > Design the API so that it best reflects the hardware behavior. > > > 3) An addendum to the point about hardware not being a NIC (and you didn't > make this point directly above, but I think you may have mentioned it > previously), sometimes you want a device to behave like another device for > the purposes of using generic code to talk to several device types. While > this is true, this is a case for device translation and use, not for > carving out parts of an api to make something more generic. The use case > I cited previously was an ipsec tunnel. An ipsec tunnel uses > cryptography, and crypto device apis to encrypt decrypt packet data. The > common way to implement this is to design a crypto api that accepts a > block of data in a way most condusive to the hardware, and then implement > a network driver (that uses whatever ethernet api, in this case the ethdev > api), to integrate with the network datapath. With this model, the ipsec > tunnel uses the full range of the ethdev api (or a good deal more of it), > and the crypto api is optimized to work with crypo acceleration hardware. 
> > > In essence, rte_ether(rte_ethdev.h) right now has: i) NIC setup; > > general configuration, queue config, fdir, offloads, hw stuff like > > leds... ii) RX/TX routines and callbacks iii) Stats and queue stats > > iv) other utils for ethernet stuff (rte_ether.h) > > > The key that I'm taking away here is 'right now'. Its already written, so > theres no work involved in implementing it for new devices. > > > i) is clearly HW specific, and does only apply to NICs/ASICs (e.g. > > FM10k) > Ok, so it only applies to NIC's, thats fine. If you want to write a > driver that leaves those methods for the pmd set to NULL, the ethdev > library will correctly return EOPNOTSUPP to the calling applications. > > > while ii) and iii) are things that could be abstracted beyond NICs, > > like KNI, rte_ring, crypto... (iv could be moved into some > > utils/protocol parsing libraries). > > > Right again, so let those device types implement the appropriate portions > of the pmd driver structure that match to what they support. EVerything > else is handled by the ethdev library automatically. > > > Perhaps these two groups could be split into two different libraries > > and then ii) and iii) together would be something like ~ rte_pktdev > > (stats are missing on the proposed patch), while i) would be > > rte_ether, or rte_nic if we think it is a better name. > > > The point I'm trying to get to is, why split at all? Theres just no need > that I can see. The example I would set here is the dummy driver in linux. > Its a net device that only serves to act as a sink for network packets. > It still uses the network driver interface, but of the 65-ish methods that > the netdevice model in linux offers, it implements 8 (or approximately > 12%). The other unused method are just that, unused, and thats ok. > Applications that try to do things like set flow director options, or > speed/duplex options gets a return code that effectively says "This device > can't do that", and thats ok. 
Thats what we need to be doing here. > Instead of finding a way to codify the subset of functionality that other > devices might be able to implement, for those cases where we want other > hardware to act like a netdevice, lets just let those devices pick and > choose what to implement, and the interface we already have will > communicate with applications appropriately. > > Regards > Neil >

Hi Neil,

First off, a note on the naming and the basic concept: this proposal is not trying to make everything look like a NIC, rather we are trying to make a bunch of different components appear as generic sources/sinks for pkts or mbufs. From my point of view, it's an important difference.

Be that as it may, I'd like to first deal with the whole idea of the application needing to know about the type of the underlying device. For me, this is a critical point. Applications - such as all our sample apps - have essentially two parts:
* an initialization and control part
* a data-path part.

These two parts are very, very different in what they do. The initialization part - which e.g. in testpmd continues on in the form of the cmdline interface as a control part - does the initial setup of devices/rings/etc. and potentially makes use of the full APIs provided by the ethdev interface. It's also not performance critical, as evidenced by the fact that the APIs used there have additional checks for valid input etc.

The second, data-path part, is entirely the part that this proposal is targeting. This data-path is completely separate in the application, is highly performance sensitive, and rarely, if ever, needs to know or care about the actual source of its data. So the idea behind this library is that you can write the initialization and control parts of your app as now, fully aware of the underlying types involved, and without ever using rx/tx burst.
Then when you have the various devices and DPDK objects set up, you spawn your data-path threads and pass each one the set of inputs and outputs it needs, in the form of your generic packet source objects.

This distinction is also why I'm not particularly interested in the ability to pass in different objects via cmdline, as is done now with pcap/ring PMDs. That's ok when you want the initialization part of your app to be oblivious to the underlying types and see everything under a common abstraction, but when it's only the data path you want to work with generic packet objects that's unnecessary, and the initialization path should be able to convert any of the required input/output sources to a generic type using a single API call. [This doesn't rule out specifying different inputs/outputs on the commandline, it's just you can specify them as their native types, rather than hiding them under a common API at the control-path level].

As for what that abstraction should be, there are a number of issues I see with ethdev, as it is right now, as that common abstraction.

1. The use of port-ids. I think port ids are fine for numbering physical ports, but I think pointers are better for passing around objects to be worked on by the data path. What is more concerning [than my opinion on numbers vs pointers :-)] is the fact that we are limited to 256 port ids. Yes, that can be changed, but the impacts are massive. To change the type, we would break the ABI for every single ethdev API, as well as likely other functions too. Furthermore, increasing the size of the port id would require a change to the internals of the mbuf structure, which would lead to the ABI being broken for any function that uses mbufs. By adopting an API, such as proposed, which uses pointers, we avoid the problem, as port ids would only apply to ethdevs.

2. Simplicity.
While you say that it's fine for an ethdev not to implement all the functions in the ethdev API, to create a proper PMD like you are proposing involves a good deal more work than using the proposed pktdev abstraction. If it's to appear like a proper NIC to the control paths, as well as the init paths - which seems to be what you imply - you really do need to implement additional functions like queue setup, and start and stop. While it's true that the library can return -ENOTSUP on an unsupported function, I don't believe any of our sample apps are set up to check for this on NIC setup, and therefore I would hazard a guess that real-world customer apps aren't set up to handle it either.

3. Performance for rings. While not applicable for all cases, the performance of the rings under an ethdev abstraction would not be the same as here. For example, when polling on an empty ring for packets, the current time taken by our ring rx/tx functions is literally a few cycles (as tested by the rings autotest). If these functions cannot be inlined, that cycle count goes up to 3x what it is now. [I observed this previously when reworking the rings code, and the code-size led to icc no longer doing inlining. In that case, the gcc code for empty polling was indeed 3 times faster than the icc version. Adding forced inlining made things equal again]. This metric of empty polling may seem trivial i.e. "if there are no packets, why does it matter how long it takes?", but is important in real-world cases where you are pulling packets from multiple sources, and your application is only currently dealing with input on one of them. [Often tested to see how an application behaves in a single-flow situation - a metric our customers do look at]. Even in the non-empty situation, for smaller packet bursts, the overhead of the function call may slow things down. [For larger bursts, e.g. 32, the effect should not be noticeable, I suspect].
The only other point I'd make here is that what is proposed is not prescriptive - whatever a future API for handling other device types, such as crypto devices, may look like can be decided separately from this pktdev implementation. Whether one chooses pktdev or ethdev as a common abstraction layer type, the decision of whether or not a particular object type is allowed to be made to look like that common type can be made entirely independently, and based upon whether or not such a type-conversion makes sense.

Regards,
/Bruce

^ permalink raw reply [flat|nested] 19+ messages in thread
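The control-path/data-path split argued for in the reply above can be illustrated with a small stand-alone C sketch. This is only a sketch under stated assumptions, not the code from the patch set: `struct pktdev`, `struct ring`, `struct sink`, `forward_all()` and `demo_forward()` are hypothetical names, the ring is a toy single-threaded one rather than rte_ring, and the stats and TX buffering of the real proposal are left out. It shows the single branch that special-cases rings (so the call can be inlined) next to function-pointer dispatch for every other type.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not DPDK APIs. */
struct pkt { int id; };

/* Toy single-threaded ring standing in for rte_ring. */
#define RING_SIZE 8u /* power of two so the index mask works */
struct ring {
    struct pkt *slots[RING_SIZE];
    uint32_t head; /* next slot to write (free-running) */
    uint32_t tail; /* next slot to read (free-running) */
};

static inline uint16_t ring_enqueue_burst(struct ring *r, struct pkt **pkts, uint16_t n)
{
    uint16_t i;
    for (i = 0; i < n && r->head - r->tail < RING_SIZE; i++)
        r->slots[r->head++ & (RING_SIZE - 1)] = pkts[i];
    return i;
}

static inline uint16_t ring_dequeue_burst(struct ring *r, struct pkt **pkts, uint16_t n)
{
    uint16_t i;
    for (i = 0; i < n && r->head != r->tail; i++)
        pkts[i] = r->slots[r->tail++ & (RING_SIZE - 1)];
    return i;
}

/* A second device type, reached only through a function pointer:
 * a sink that accepts every packet and counts it. */
struct sink { uint64_t seen; };

static uint16_t sink_tx(void *handle, struct pkt **pkts, uint16_t n)
{
    (void)pkts;
    ((struct sink *)handle)->seen += n;
    return n;
}

enum dev_type { DEV_RING, DEV_OTHER };
typedef uint16_t (*burst_fn)(void *handle, struct pkt **pkts, uint16_t n);

/* Generic packet source/sink handed to the data path. */
struct pktdev {
    enum dev_type type;
    burst_fn rx, tx; /* used for every type except DEV_RING */
    void *handle;
};

static inline uint16_t pktdev_rx(struct pktdev *d, struct pkt **pkts, uint16_t n)
{
    if (d->type == DEV_RING) /* direct call, so the compiler may inline it */
        return ring_dequeue_burst(d->handle, pkts, n);
    return d->rx(d->handle, pkts, n);
}

static inline uint16_t pktdev_tx(struct pktdev *d, struct pkt **pkts, uint16_t n)
{
    if (d->type == DEV_RING)
        return ring_enqueue_burst(d->handle, pkts, n);
    return d->tx(d->handle, pkts, n);
}

/* Data path: drains `in` into `out` without knowing either type. */
uint64_t forward_all(struct pktdev *in, struct pktdev *out)
{
    struct pkt *burst[4];
    uint64_t total = 0;
    uint16_t n;

    while ((n = pktdev_rx(in, burst, 4)) > 0)
        total += pktdev_tx(out, burst, n);
    return total;
}

/* Control path: build the concrete objects, wrap them, run the data path. */
uint64_t demo_forward(void)
{
    static struct pkt pkts[5];
    struct pkt *ptrs[5];
    struct ring r = { .head = 0, .tail = 0 };
    struct sink s = { 0 };
    struct pktdev in = { .type = DEV_RING, .handle = &r };
    /* rx is left NULL: the sink is TX-only in this sketch */
    struct pktdev out = { .type = DEV_OTHER, .tx = sink_tx, .handle = &s };
    uint16_t queued;
    int i;

    for (i = 0; i < 5; i++)
        ptrs[i] = &pkts[i];
    queued = ring_enqueue_burst(&r, ptrs, 5);
    assert(queued == 5);
    (void)queued;
    return forward_all(&in, &out);
}
```

Here demo_forward() moves five packets from the ring to the sink. Swapping either end for a different object type would only change the setup in the control path; forward_all() stays as it is.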
* [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update 2015-05-19 11:31 ` [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson 2015-05-20 0:19 ` Wiles, Keith 2015-05-20 8:31 ` Thomas Monjalon @ 2015-06-10 13:07 ` Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 1/6] kni: add function to query the name of a kni object Bruce Richardson ` (6 more replies) 2 siblings, 7 replies; 19+ messages in thread From: Bruce Richardson @ 2015-06-10 13:07 UTC (permalink / raw) To: dev Following on from the feedback received from the community about the pktdev idea, I've decided not to push this approach further for DPDK 2.1. Instead, for future releases, I'll look at taking some of what was investigated in this work and see if it can be applied to the existing ethdev library, which seems to be the favoured point of convergence in the community. Hopefully, we can get ethdev to meet all the requirements I had looked for for pktdev. [If not, I may need to come back to look at this again, but I hope not! :-)] For the sake of completeness, I'm also sending out my latest, and final, draft set of patches for pktdev, in case Marc, or someone else, wishes to take this further right now. As I've said, for the time being, I'm going to switch focus to ethdev. Thanks for all the feedback. 
Regards, /Bruce Bruce Richardson (5): kni: add function to query the name of a kni object pktdev: Add pktdev implementation library example app showing pktdevs used in a chain new pktdev l2fwd sample test: add pktdev performance tests Marc Sune (1): pktdev: adding app test app/test/Makefile | 2 + app/test/test_pktdev.c | 440 +++++++++++++++++++++++++ app/test/test_pktdev_perf.c | 260 +++++++++++++++ config/common_bsdapp | 5 + config/common_linuxapp | 5 + examples/pktdev-chain/Makefile | 57 ++++ examples/pktdev-chain/basicfwd.c | 221 +++++++++++++ examples/pktdev-l2fwd/Makefile | 50 +++ examples/pktdev-l2fwd/main.c | 530 +++++++++++++++++++++++++++++++ lib/Makefile | 1 + lib/librte_kni/rte_kni.c | 6 + lib/librte_kni/rte_kni.h | 10 + lib/librte_kni/rte_kni_version.map | 1 + lib/librte_pktdev/Makefile | 56 ++++ lib/librte_pktdev/rte_pktdev.c | 188 +++++++++++ lib/librte_pktdev/rte_pktdev.h | 400 +++++++++++++++++++++++ lib/librte_pktdev/rte_pktdev_version.map | 11 + mk/rte.app.mk | 1 + 18 files changed, 2244 insertions(+) create mode 100644 app/test/test_pktdev.c create mode 100644 app/test/test_pktdev_perf.c create mode 100644 examples/pktdev-chain/Makefile create mode 100644 examples/pktdev-chain/basicfwd.c create mode 100644 examples/pktdev-l2fwd/Makefile create mode 100644 examples/pktdev-l2fwd/main.c create mode 100644 lib/librte_pktdev/Makefile create mode 100644 lib/librte_pktdev/rte_pktdev.c create mode 100644 lib/librte_pktdev/rte_pktdev.h create mode 100644 lib/librte_pktdev/rte_pktdev_version.map -- 2.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [dpdk-dev] [RFC-PATCH-v3 1/6] kni: add function to query the name of a kni object 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Bruce Richardson @ 2015-06-10 13:07 ` Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 2/6] pktdev: Add pktdev implementation library Bruce Richardson ` (5 subsequent siblings) 6 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-06-10 13:07 UTC (permalink / raw) To: dev When a KNI object is created, a name is assigned to it which is stored internally. There is also an API function to look up a KNI object by name, but there is no API to query the current name of an existing KNI object. This patch adds just such an API. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- lib/librte_kni/rte_kni.c | 6 ++++++ lib/librte_kni/rte_kni.h | 10 ++++++++++ lib/librte_kni/rte_kni_version.map | 1 + 3 files changed, 17 insertions(+) diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index 4e70fa0..c5a0089 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -674,6 +674,12 @@ rte_kni_get(const char *name) return NULL; } +const char * +rte_kni_get_name(const struct rte_kni *kni) +{ + return kni->name; +} + /* * It is deprecated and just for backward compatibility. */ diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h index 44240fe..0c74251 100644 --- a/lib/librte_kni/rte_kni.h +++ b/lib/librte_kni/rte_kni.h @@ -248,6 +248,16 @@ extern uint8_t rte_kni_get_port_id(struct rte_kni *kni) \ extern struct rte_kni *rte_kni_get(const char *name); /** + * Get the name given to a KNI device + * + * @param kni + * The KNI instance to query + * @return + * The pointer to the KNI name + */ +extern const char *rte_kni_get_name(const struct rte_kni *kni); + +/** * Get the KNI context of the specific port. * * Note: It is deprecated and just for backward compatibility. 
diff --git a/lib/librte_kni/rte_kni_version.map b/lib/librte_kni/rte_kni_version.map index b0bbf4d..e5e4e1b 100644 --- a/lib/librte_kni/rte_kni_version.map +++ b/lib/librte_kni/rte_kni_version.map @@ -6,6 +6,7 @@ DPDK_2.0 { rte_kni_create; rte_kni_get; rte_kni_get_port_id; + rte_kni_get_name; rte_kni_handle_request; rte_kni_info_get; rte_kni_init; -- 2.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [dpdk-dev] [RFC-PATCH-v3 2/6] pktdev: Add pktdev implementation library 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 1/6] kni: add function to query the name of a kni object Bruce Richardson @ 2015-06-10 13:07 ` Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 3/6] example app showing pktdevs used in a chain Bruce Richardson ` (4 subsequent siblings) 6 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-06-10 13:07 UTC (permalink / raw) To: dev The pktdev API is a minimal API designed for runtime packet processing use. It works by providing a common API for RX and TX across a range of device and object types: namely SW rings, ethernet ports and KNI ports. By using the pktdev abstraction, packet data-path code can be written to accept packets from a source and pass them to a destination without having to be concerned about what the underlying types of those sources and destinations are. Pktdev does not provide any specific APIs for manipulating the underlying packet sources - that should be handled by the application initialization, or control code, not the data path code which uses the pktdev abstraction. 
Signed-off-by: Marc Sune <marc.sune@bisdn.de> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- config/common_bsdapp | 5 + config/common_linuxapp | 5 + lib/Makefile | 1 + lib/librte_pktdev/Makefile | 56 +++++ lib/librte_pktdev/rte_pktdev.c | 188 +++++++++++++++ lib/librte_pktdev/rte_pktdev.h | 400 +++++++++++++++++++++++++++++++ lib/librte_pktdev/rte_pktdev_version.map | 11 + mk/rte.app.mk | 1 + 8 files changed, 667 insertions(+) create mode 100644 lib/librte_pktdev/Makefile create mode 100644 lib/librte_pktdev/rte_pktdev.c create mode 100644 lib/librte_pktdev/rte_pktdev.h create mode 100644 lib/librte_pktdev/rte_pktdev_version.map diff --git a/config/common_bsdapp b/config/common_bsdapp index 0b169c8..ad073f2 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -132,6 +132,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y CONFIG_RTE_LIBRTE_KVARGS=y # +# Compile generic packet handling device library +# +CONFIG_RTE_LIBRTE_PKTDEV=y + +# # Compile generic ethernet library # CONFIG_RTE_LIBRTE_ETHER=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 5deb55a..ffffb58 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -129,6 +129,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y CONFIG_RTE_LIBRTE_KVARGS=y # +# Compile generic packet handling device library +# +CONFIG_RTE_LIBRTE_PKTDEV=y + +# # Compile generic ethernet library # CONFIG_RTE_LIBRTE_ETHER=y diff --git a/lib/Makefile b/lib/Makefile index 5f480f9..c7980d4 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -32,6 +32,7 @@ include $(RTE_SDK)/mk/rte.vars.mk DIRS-y += librte_compat +DIRS-$(CONFIG_RTE_LIBRTE_PKTDEV) += librte_pktdev DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring diff --git a/lib/librte_pktdev/Makefile b/lib/librte_pktdev/Makefile new file mode 100644 index 0000000..e7c681c --- /dev/null +++ b/lib/librte_pktdev/Makefile @@ -0,0 +1,56 @@ +# BSD 
LICENSE +# +# Copyright(c) 2015 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pktdev.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +EXPORT_MAP := rte_pktdev_version.map + +LIBABIVER := 1 + +# all source are stored in SRCS-y +SRCS-y := rte_pktdev.c + +# +# Export include files +# +SYMLINK-y-include += rte_pktdev.h + +DEPDIRS-y += lib/librte_ring lib/librte_kni lib/librte_ether + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_pktdev/rte_pktdev.c b/lib/librte_pktdev/rte_pktdev.c new file mode 100644 index 0000000..6e0c979 --- /dev/null +++ b/lib/librte_pktdev/rte_pktdev.c @@ -0,0 +1,188 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <rte_kni.h> +#include <rte_ring.h> +#include <rte_errno.h> +#include <rte_malloc.h> +#include <rte_ethdev.h> +#include "rte_pktdev.h" + +struct rte_pktdev * +rte_pktdev_from_ring(struct rte_ring *r) +{ + struct rte_pktdev *d; + + if (r == NULL) + return NULL; + + d = rte_zmalloc(NULL, sizeof(*d), 0); + if (d == NULL) { + rte_errno = ENOMEM; + return d; + } + + d->type = RTE_PKT_DEV_TYPE_RING; + d->base_object = r; + d->rx_pkt_burst = (pkt_rx_burst_t)rte_ring_dequeue_burst; + d->tx_pkt_burst = (pkt_tx_burst_t)rte_ring_enqueue_burst; + d->rx_handle = r; + d->tx_handle = r; + + return d; +} + +struct rte_pktdev * +rte_pktdev_from_kni(struct rte_kni *k) +{ + struct rte_pktdev *d; + + if (k == NULL) + return NULL; + + d = rte_zmalloc(NULL, sizeof(*d), 0); + if (d == NULL) { + rte_errno = ENOMEM; + return d; + } + + d->type = RTE_PKT_DEV_TYPE_KNI; + d->base_object = k; + d->rx_pkt_burst = (pkt_rx_burst_t)rte_kni_rx_burst; + d->tx_pkt_burst = (pkt_tx_burst_t)rte_kni_tx_burst; + d->rx_handle = k; + d->tx_handle = k; + + return d; +} + +static struct rte_pktdev * +rte_pktdev_from_ethdev(struct rte_eth_dev *e, uint16_t rxq, uint16_t txq) +{ + struct rte_pktdev *d; + + if (e == NULL) + return NULL; + + d = rte_zmalloc(NULL, sizeof(*d), 0); + if (d == NULL) { + rte_errno = ENOMEM; + return d; + } + + d->type = RTE_PKT_DEV_TYPE_ETHDEV; + d->base_object = e; + d->rx_pkt_burst = e->rx_pkt_burst; + d->tx_pkt_burst = e->tx_pkt_burst; 
+ d->rx_handle = e->data->rx_queues[rxq]; + d->tx_handle = e->data->tx_queues[txq]; + + return d; +} + +struct rte_pktdev * +rte_pktdev_from_ethport(uint8_t port_id, uint16_t rxq, uint16_t txq) +{ + if(port_id >= rte_eth_dev_count()) + return NULL; + + return rte_pktdev_from_ethdev(&rte_eth_devices[port_id], rxq, txq); +} + +const char * +rte_pktdev_get_name(const struct rte_pktdev *dev, char *buffer, unsigned buf_len) +{ + if (buf_len == 0) + goto out; + + switch (dev->type) { + case RTE_PKT_DEV_TYPE_ETHDEV: { + const struct rte_eth_dev *e = dev->base_object; + snprintf(buffer, buf_len, "Port_%u", e->data->port_id); + break; + } + case RTE_PKT_DEV_TYPE_RING: { + const struct rte_ring *r = dev->base_object; + snprintf(buffer, buf_len, "%s", r->name); + break; + } + case RTE_PKT_DEV_TYPE_KNI: { + const struct rte_kni *k = dev->base_object; + snprintf(buffer, buf_len, "KNI_%s", rte_kni_get_name(k)); + break; + } + /* + * explicitly list all values, so compiler can detect missing values + * in future if new values are added to the enum. [Using "default", this + * error detection won't occur.] 
+ */ + case RTE_PKT_DEV_TYPE_NONE: + case RTE_PKT_DEV_TYPE_MAX: + buffer[0] = '\0'; /* guarantee a valid null-terminated string */ + break; + } +out: + return buffer; +} + +void +rte_pktdev_set_tx_buf_err_callback(struct rte_pktdev *dev, + pkt_burst_error_t fn, void *param) +{ + dev->tx_error = fn; + dev->tx_error_param = param; +} + +uint64_t +rte_pktdev_get_rx_count(const struct rte_pktdev *dev) +{ + return dev->rx_count; +} + +uint64_t +rte_pktdev_get_tx_count(const struct rte_pktdev *dev) +{ + return dev->tx_count; +} + +uint64_t +rte_pktdev_get_tx_buffer_count(const struct rte_pktdev *dev) +{ + return dev->tx_buf_count; +} + +uint64_t +rte_pktdev_get_tx_drop_count(const struct rte_pktdev *dev) +{ + return dev->tx_drop_count; +} diff --git a/lib/librte_pktdev/rte_pktdev.h b/lib/librte_pktdev/rte_pktdev.h new file mode 100644 index 0000000..3acbc0d --- /dev/null +++ b/lib/librte_pktdev/rte_pktdev.h @@ -0,0 +1,400 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_PKTDEV_H_ +#define _RTE_PKTDEV_H_ + +/** + * @file + * + * RTE Packet Processing Device API + */ + +#ifdef __cplusplus +extern "C" { +#endif + +#include <stdint.h> +#include <rte_ring.h> +#include <rte_branch_prediction.h> + +/* Buffered TX works in bursts of 32 */ +#define TX_BUFFER_SIZE 32 + +/* + * forward definition of data structures. + * We don't need full mbuf/kni/ethdev headers here + */ +struct rte_mbuf; +struct rte_kni; +struct rte_eth_dev; + +/* forward declaration of structure declared here. Needed for function typedef */ +struct rte_pktdev; + +/** @internal Retrieve packets from a queue of a device. */ +typedef uint16_t (*pkt_rx_burst_t)(void *rx_handle, + struct rte_mbuf **rx_pkts, uint16_t nb_pkts); + +/** @internal Send packets on a queue of a device. */ +typedef uint16_t (*pkt_tx_burst_t)(void *tx_handle, + struct rte_mbuf **tx_pkts, uint16_t nb_pkts); + +/** + * Callback that applications can configure to be called when a burst of pkts + * fail to send on TX buffer flush. 
+ */ +typedef void (*pkt_burst_error_t)(struct rte_pktdev *dev, + struct rte_mbuf **failed_pkts, uint16_t nb_failed, void *param); + +/** + * @internal Enum listing all the underlying types that pktdev can be built upon + */ +enum rte_pktdev_type { + RTE_PKT_DEV_TYPE_NONE = 0, + RTE_PKT_DEV_TYPE_ETHDEV, + RTE_PKT_DEV_TYPE_RING, + RTE_PKT_DEV_TYPE_KNI, + + RTE_PKT_DEV_TYPE_MAX +}; + +/** + * @internal The pktdev structure + */ +struct rte_pktdev { + /* basic functionality support - type info, rx + tx */ + enum rte_pktdev_type type; /**< The type of object being used as a pktdev */ + void *base_object; /**< The underlying ethdev/ring/KNI object*/ + pkt_rx_burst_t rx_pkt_burst; /**< Pointer to receive function. */ + pkt_tx_burst_t tx_pkt_burst; /**< Pointer to transmit function. */ + void *rx_handle; /**< Handle passed to RX function */ + void *tx_handle; /**< Handle passed to TX function */ + + /* support for buffered TX */ + struct rte_mbuf *tx_buf[32]; /**< Store packets to allow buffered TX */ + uint16_t tx_buf_count; /**< Tracks tx_buffer occupation */ + pkt_burst_error_t tx_error; /**< callback for err sending buffered pkts */ + void *tx_error_param; /**< extra parameter passed to err callback */ + + /* basic stats tracking, packets in/out */ + uint64_t rx_count; /**< Received packets count */ + uint64_t tx_count; /**< Packets sent successfully, excluding + * any currently buffered packets */ + uint64_t tx_drop_count; /**< Packets dropped when flushing + * the tx_buffer **/ + +} __rte_cache_aligned; + +/** + * + * Retrieve a burst of input packets from a receive queue of a + * device. The retrieved packets are stored in *rte_mbuf* structures whose + * pointers are supplied in the *rx_pkts* array. + * + * @param dev + * The device to be polled for packets + * @param queue_id + * The index of the receive queue from which to retrieve input packets. 
+ * @param rx_pkts + * The address of an array of pointers to *rte_mbuf* structures that + * must be large enough to store *nb_pkts* pointers in it. + * @param nb_pkts + * The maximum number of packets to retrieve. + * @return + * The number of packets actually retrieved, which is the number + * of pointers to *rte_mbuf* structures effectively supplied to the + * *rx_pkts* array. + */ +static inline uint16_t +rte_pkt_rx_burst(struct rte_pktdev *dev, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts) +{ + uint16_t nb_rx; + /* special case ring, so that call gets inlined for performance */ + if (dev->type == RTE_PKT_DEV_TYPE_RING) + nb_rx = rte_ring_dequeue_burst(dev->rx_handle, (void *)rx_pkts, + nb_pkts); + else + nb_rx = (*dev->rx_pkt_burst)(dev->rx_handle, rx_pkts, nb_pkts); + dev->rx_count += nb_rx; + return nb_rx; +} + +/** + * Send a burst of output packets on a transmit queue of a device. + * + * @param dev + * The device to be given the packets. + * @param queue_id + * The index of the queue through which output packets must be sent. + * @param tx_pkts + * The address of an array of *nb_pkts* pointers to *rte_mbuf* structures + * which contain the output packets. + * @param nb_pkts + * The maximum number of packets to transmit. + * @return + * The number of output packets actually stored in transmit descriptors of + * the transmit ring. The return value can be less than the value of the + * *nb_pkts* parameter when the transmit ring is full or has been filled up. 
+ */ +static inline uint16_t +rte_pkt_tx_burst(struct rte_pktdev *dev, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + uint16_t nb_tx; + /* special case ring, so that call gets inlined for performance */ + if (dev->type == RTE_PKT_DEV_TYPE_RING) + nb_tx = rte_ring_enqueue_burst(dev->tx_handle, (void *)tx_pkts, + nb_pkts); + else + nb_tx = (*dev->tx_pkt_burst)(dev->tx_handle, tx_pkts, nb_pkts); + dev->tx_count += nb_tx; + return nb_tx; +} + +/** + * @internal Unconditionally flush the tx buffer + */ +static inline void +_pktdev_tx_buffer_flush(struct rte_pktdev *dev) +{ + const uint16_t burst_size = dev->tx_buf_count; + const uint16_t nb_tx = rte_pkt_tx_burst(dev, dev->tx_buf, burst_size); + dev->tx_buf_count = 0; + if (likely(nb_tx == burst_size)) + return; + + dev->tx_drop_count += (burst_size - nb_tx); + if (dev->tx_error) + dev->tx_error(dev, &dev->tx_buf[nb_tx], burst_size - nb_tx, + dev->tx_error_param); + else { + uint16_t i; + for (i = nb_tx; i < burst_size; i++) + rte_pktmbuf_free(dev->tx_buf[i]); + } +} + +/** + * Flush any buffered packets on a pktdev port + * + * Transmits any currently buffered packets on the pktdev port. Does nothing if + * no packets are buffered. If not all packets are sent, + * - the dev drop-count is updated appropriately + * - any configured error callbacks are called, with the failing packets passed + * in as parameter. + * NOTE: The callback is responsible for freeing any unsent packets to ensure + * that memory is not leaked. + * - if no callback is configured, the packets are just dropped + * + * @param dev + * The pktdev which has TX buffers to be flushed + */ +static inline void +rte_pkt_tx_buffer_flush(struct rte_pktdev *dev) +{ + if (dev->tx_buf_count != 0) + _pktdev_tx_buffer_flush(dev); +} + + +/** + * Buffer a packet for future transmission + * + * This function takes a single packet and queues it up on a pktdev instance for + * sending at some point in the future. 
The packet is actually sent once one of + * two conditions is encountered: + * - The number of buffered packets hits TX_BUFFER_SIZE + * - The application calls "rte_pkt_tx_buffer_flush" API to force a manual + * flush. + * + * @param dev + * The device on which the packet is to be sent + * @param buf + * The packet to be sent on the device. + */ +static inline void +rte_pkt_tx_buffer(struct rte_pktdev *dev, struct rte_mbuf *buf) +{ + dev->tx_buf[dev->tx_buf_count++] = buf; + + if (dev->tx_buf_count == TX_BUFFER_SIZE) + _pktdev_tx_buffer_flush(dev); +} + +/** + * Set the callback to be called when the tx buffer flushing fails + * + * @param dev + * The pktdev device + * @param fn + * Callback function to be called when the flushing of the tx buffer does not + * send all packets. The function should be of "void" type and take as + * parameters: + * - struct rte_pktdev *dev --> the dev parameter as passed in here + * - struct rte_mbuf **failed_pkts --> the array of failed packets (only) + * - uint16_t nb_failed --> the number of failed packets in the array + * - void *param --> user supplied extra parameter, as passed in here + * @param param + * The user-supplied extra parameter to be passed to the callback function + */ +void +rte_pktdev_set_tx_buf_err_callback(struct rte_pktdev *dev, + pkt_burst_error_t fn, void *param); + +/** + * Returns a printable string to use to identify a pktdev, for example, in logs. + * + * The name returned is based on the underlying object for the pktdev. For + * example, for a ring, it would be the name parameter of the ring object, for + * an ethernet port, the port number and RX/TX queue values. Example outputs: + * + * - RG_my_ring_name = a pktdev based on ring "RG_my_ring_name" + * - Port_1 = a pktdev based on port 1 + * + * @param dev + * The pktdev device, for which the name is to be retrieved + * @param buffer + * The output buffer in which the name is to be stored. + * @param buf_len + * The length of the output buffer. 
+ * @return + * The pointer to the buffer. If buf_len > 0, the output buffer will + * always be null-terminated. + */ +const char * +rte_pktdev_get_name(const struct rte_pktdev *dev, char *buffer, unsigned buf_len); + +/** + * Returns the number of packets received by this pktdev + * + * @param dev + * The pktdev device to be queried + */ +uint64_t +rte_pktdev_get_rx_count(const struct rte_pktdev *dev); + +/** + * Returns the number of packets transmitted by this pktdev. + * + * The transmission count only includes packets actually sent, not any packets + * currently buffered for transmission + * + * @param dev + * The pktdev device to be queried + */ +uint64_t +rte_pktdev_get_tx_count(const struct rte_pktdev *dev); + +/** + * Returns the number of packets currently buffered for transmission + * + * @param dev + * The pktdev device to be queried + */ +uint64_t +rte_pktdev_get_tx_buffer_count(const struct rte_pktdev *dev); + +/** + * Returns the number of packets buffered for transmission, but for which sending + * failed. + * + * This does not count failures from calls to rte_pkt_tx_burst directly, only + * for buffered packets + * This does not count any packets currently buffered for which transmission + * has not yet been attempted. + * If a user-callback is provided for the handling of tx errors, this counter + * is updated before that callback is called, so it will always count failures + * even if the callback attempts a retransmission. + * + * @param dev + * The pktdev device to be queried + */ +uint64_t +rte_pktdev_get_tx_drop_count(const struct rte_pktdev *dev); + +/** + * Take an rte_ring and return it as a pktdev object + * + * @param r + * The rte_ring to be used. If a single-producer, single-consumer ring is used + * this parameter should be used only by a single pktdev. A multi-producer, + * multi-consumer ring can be shared between multiple pktdev instances. + * @return + * The pktdev object created. 
+ * NULL on error, with rte_errno set appropriately: + * ENOMEM = memory allocation failed for pktdev structure + */ +struct rte_pktdev * +rte_pktdev_from_ring(struct rte_ring *r); + +/** + * Take a kni instance and return it as a pktdev object + * + * @param k + * The kni object to be used + * @return + * The pktdev object created. + * NULL on error, with rte_errno set appropriately: + * ENOMEM = memory allocation failed for pktdev structure + */ +struct rte_pktdev * +rte_pktdev_from_kni(struct rte_kni *k); + +/** + * Take an ethdev port RX and TX queue and return it as a pktdev object + * + * @param port_id + * The port number of the ethdev port to be used. + * @param rxq + * The receive queue on the ethdev port to be used for packet reception. Each + * queue should only be used by a single pktdev object, as the ethdev queues + * are not multi-thread safe. + * @param txq + * The transmit queue on the ethdev port to be used for packet transmission. Each + * queue should only be used by a single pktdev object, as the ethdev queues + * are not multi-thread safe. 
+ * @return + * The pktdev object created + * NULL on error, with rte_errno set appropriately: + * ENOMEM = memory allocation failed for pktdev structure + */ +struct rte_pktdev * +rte_pktdev_from_ethport(uint8_t port_id, uint16_t rxq, uint16_t txq); + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_PKTDEV_H_ */ diff --git a/lib/librte_pktdev/rte_pktdev_version.map b/lib/librte_pktdev/rte_pktdev_version.map new file mode 100644 index 0000000..8c8671a --- /dev/null +++ b/lib/librte_pktdev/rte_pktdev_version.map @@ -0,0 +1,11 @@ +DPDK_2.1 { + global: + + rte_pkt_rx_burst; + rte_pkt_tx_burst; + rte_pktdev_from_ring; + rte_pktdev_from_kni; + rte_pktdev_from_ethport; + + local: *; +}; diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 1a2043a..85d8732 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -112,6 +112,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING) += -lrte_ring _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL) += -lrte_eal _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE) += -lrte_cmdline _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE) += -lrte_cfgfile +_LDLIBS-$(CONFIG_RTE_LIBRTE_PKTDEV) += -lrte_pktdev _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += -lrte_pmd_bond _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += -lrte_pmd_xenvirt -- 2.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
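The buffered-TX behaviour documented in rte_pktdev.h above (buffer until TX_BUFFER_SIZE packets are queued, then flush; update the drop count before invoking the error callback) can be sketched without DPDK as follows. This is only an illustrative model, not code from the patch: every `demo_` name is hypothetical, plain ints stand in for mbuf pointers, and the mock device simply accepts at most `capacity` packets per burst.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TX_BUFFER_SIZE 32	/* mirrors TX_BUFFER_SIZE in the patch */

struct demo_dev;

/* Hypothetical stand-in for pkt_burst_error_t; ints replace mbuf pointers. */
typedef void (*demo_burst_error_t)(struct demo_dev *dev, int *failed_pkts,
		uint16_t nb_failed, void *param);

/* Hypothetical, cut-down model of struct rte_pktdev (TX side only). */
struct demo_dev {
	uint16_t capacity;		/* packets the mock link accepts per burst */
	int tx_buf[DEMO_TX_BUFFER_SIZE];
	uint16_t tx_buf_count;
	uint64_t tx_count;
	uint64_t tx_drop_count;
	demo_burst_error_t tx_error;
	void *tx_error_param;
};

/* Mock burst TX: sends at most dev->capacity packets and counts them. */
static uint16_t
demo_tx_burst(struct demo_dev *dev, const int *pkts, uint16_t nb_pkts)
{
	(void)pkts;
	const uint16_t nb_tx = nb_pkts < dev->capacity ? nb_pkts : dev->capacity;
	dev->tx_count += nb_tx;
	return nb_tx;
}

/* Flush buffered packets; does nothing when the buffer is empty. */
static void
demo_tx_buffer_flush(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (dev->tx_buf_count == 0)
		return;

	const uint16_t burst_size = dev->tx_buf_count;
	const uint16_t nb_tx = demo_tx_burst(dev, dev->tx_buf, burst_size);
	dev->tx_buf_count = 0;
	if (nb_tx == burst_size)
		return;

	/* The drop count is updated before the callback runs, as documented. */
	dev->tx_drop_count += (uint64_t)(burst_size - nb_tx);
	if (dev->tx_error != NULL)
		dev->tx_error(dev, &dev->tx_buf[nb_tx],
				(uint16_t)(burst_size - nb_tx), dev->tx_error_param);
	/* With real mbufs and no callback, the unsent packets would be freed. */
}

/* Queue one packet; flush automatically once the buffer is full. */
static void
demo_tx_buffer(struct demo_dev *dev, int pkt)
{
	assert(dev != NULL);
	dev->tx_buf[dev->tx_buf_count++] = pkt;
	if (dev->tx_buf_count == DEMO_TX_BUFFER_SIZE)
		demo_tx_buffer_flush(dev);
}

/* Example error callback: adds the number of failed packets to *param. */
static void
demo_count_failed(struct demo_dev *dev, int *failed_pkts, uint16_t nb_failed,
		void *param)
{
	(void)dev;
	(void)failed_pkts;
	*(uint64_t *)param += nb_failed;
}
```

In this model, queueing 32 packets on a device with `capacity = 20` triggers one automatic flush, after which `tx_count` is 20 and both `tx_drop_count` and the callback's counter are 12; a real callback would additionally have to free or retransmit the failed mbufs.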
* [dpdk-dev] [RFC-PATCH-v3 3/6] example app showing pktdevs used in a chain 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 1/6] kni: add function to query the name of a kni object Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 2/6] pktdev: Add pktdev implementation library Bruce Richardson @ 2015-06-10 13:07 ` Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 4/6] new pktdev l2fwd sample Bruce Richardson ` (3 subsequent siblings) 6 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-06-10 13:07 UTC (permalink / raw) To: dev This is a trivial example showing code which is using ethdevs and rings in a neutral manner, with the same piece of pipeline code passing mbufs along a chain without ever having to query its source or destination type. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- examples/pktdev-chain/Makefile | 57 ++++++++++ examples/pktdev-chain/basicfwd.c | 221 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 278 insertions(+) create mode 100644 examples/pktdev-chain/Makefile create mode 100644 examples/pktdev-chain/basicfwd.c diff --git a/examples/pktdev-chain/Makefile b/examples/pktdev-chain/Makefile new file mode 100644 index 0000000..4a5d99f --- /dev/null +++ b/examples/pktdev-chain/Makefile @@ -0,0 +1,57 @@ +# BSD LICENSE +# +# Copyright(c) 2015 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. 
+# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overridden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = basicfwd + +# all source are stored in SRCS-y +SRCS-y := basicfwd.c + +CFLAGS += $(WERROR_FLAGS) + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +EXTRA_CFLAGS += -O3 -g -Wfatal-errors + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/pktdev-chain/basicfwd.c b/examples/pktdev-chain/basicfwd.c new file mode 100644 index 0000000..be2b3f1 --- /dev/null +++ b/examples/pktdev-chain/basicfwd.c @@ -0,0 +1,221 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <stdint.h> +#include <inttypes.h> +#include <rte_eal.h> +#include <rte_ethdev.h> +#include <rte_cycles.h> +#include <rte_lcore.h> +#include <rte_mbuf.h> +#include <rte_pktdev.h> + +#define RX_RING_SIZE 128 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_SIZE (1600 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM) +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +/* basicfwd.c: Basic DPDK skeleton forwarding example. */ + +/* + * Initializes a given port using global settings and with the RX buffers + * coming from the mbuf_pool passed as a parameter. + */ +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port >= rte_eth_dev_count()) + return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. 
*/ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. */ + rte_eth_promiscuous_enable(port); + + return 0; +} + +struct pipeline_params { + struct rte_pktdev *src; + struct rte_pktdev *dst; +}; + +/* + * The lcore main. This is the main thread that does the work, reading from + * an input port and writing to an output port. + */ +static __attribute__((noreturn)) void +do_work(const struct pipeline_params *p) +{ + printf("\nCore %u forwarding packets. %p -> %p\n", + rte_lcore_id(), + p->src, + p->dst); + + /* Run until the application is quit or killed. */ + for (;;) { + /* + * Receive packets on a src device and forward them on out + * the dst device. + */ + /* Get burst of RX packets, from first port of pair. */ + struct rte_mbuf *bufs[BURST_SIZE]; + const uint16_t nb_rx = rte_pkt_rx_burst(p->src, bufs, BURST_SIZE); + + if (unlikely(nb_rx == 0)) + continue; + + /* Send burst of TX packets, to second port of pair. */ + const uint16_t nb_tx = rte_pkt_tx_burst(p->dst, bufs, nb_rx); + + /* Free any unsent packets. */ + if (unlikely(nb_tx < nb_rx)) { + uint16_t buf; + for (buf = nb_tx; buf < nb_rx; buf++) + rte_pktmbuf_free(bufs[buf]); + } + } +} + +/* + * The main function, which does initialization and calls the per-lcore + * functions. 
+ */ +int +main(int argc, char *argv[]) +{ + struct pipeline_params p[RTE_MAX_LCORE]; + struct rte_mempool *mbuf_pool; + unsigned nb_ports, lcore_id; + uint8_t portid; + + /* Initialize the Environment Abstraction Layer (EAL). */ + int ret = rte_eal_init(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Error with EAL initialization\n"); + + argc -= ret; + argv += ret; + + /* Check that there is an even number of ports to send/receive on. */ + nb_ports = rte_eth_dev_count(); + if (nb_ports < 2 || (nb_ports & 1)) + rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n"); + + /* Creates a new mempool in memory to hold the mbufs. */ + mbuf_pool = rte_mempool_create("MBUF_POOL", + NUM_MBUFS * nb_ports, + MBUF_SIZE, + MBUF_CACHE_SIZE, + sizeof(struct rte_pktmbuf_pool_private), + rte_pktmbuf_pool_init, NULL, + rte_pktmbuf_init, NULL, + rte_socket_id(), + 0); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize all ports. */ + for (portid = 0; portid < nb_ports; portid++) + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + + struct rte_pktdev *in = rte_pktdev_from_ethport(0, 0, 0); + RTE_LCORE_FOREACH_SLAVE(lcore_id){ + char name[RTE_RING_NAMESIZE]; + snprintf(name, sizeof(name), "RING_from_%u", lcore_id); + struct rte_pktdev *out = rte_pktdev_from_ring( + rte_ring_create(name, 4096, rte_socket_id(), 0)); + + p[lcore_id].src = in; + p[lcore_id].dst = out; + rte_eal_remote_launch((lcore_function_t *)do_work, + &p[lcore_id], lcore_id); + in = out; // next pipeline stage reads from my output. + } + //now finish pipeline on master lcore + lcore_id = rte_lcore_id(); + p[lcore_id].src = in; + p[lcore_id].dst = rte_pktdev_from_ethport(1, 0, 0); + do_work(&p[lcore_id]); + + return 0; +} -- 2.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
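The idea behind the chain example above, one forwarding routine that never inspects the type of its source or destination, can be modelled in a few lines of plain C. The sketch below is an assumption-laden illustration rather than code from the patch: all `demo_` names are hypothetical, a small FIFO stands in for rte_ring, a counter stands in for an ethdev TX queue, and ints replace mbuf pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 64	/* power of two, so uint16_t wrap-around is safe */
#define DEMO_BURST 32

/* Hypothetical fixed-size FIFO standing in for an rte_ring. */
struct demo_ring {
	int slots[DEMO_RING_SIZE];
	uint16_t head, tail;	/* free-running; head - tail = occupancy */
};

/* Hypothetical sink standing in for an ethdev TX queue. */
struct demo_port {
	uint64_t sent;
};

typedef uint16_t (*demo_burst_t)(void *handle, int *pkts, uint16_t n);

/* Cut-down model of the pktdev wrapper: two function pointers + handles. */
struct demo_pktdev {
	demo_burst_t rx;
	demo_burst_t tx;
	void *rx_handle;
	void *tx_handle;
};

static uint16_t
demo_ring_enqueue(void *handle, int *pkts, uint16_t n)
{
	struct demo_ring *r = handle;
	uint16_t i;
	for (i = 0; i < n && (uint16_t)(r->head - r->tail) < DEMO_RING_SIZE; i++)
		r->slots[r->head++ % DEMO_RING_SIZE] = pkts[i];
	return i;
}

static uint16_t
demo_ring_dequeue(void *handle, int *pkts, uint16_t n)
{
	struct demo_ring *r = handle;
	uint16_t i;
	for (i = 0; i < n && r->tail != r->head; i++)
		pkts[i] = r->slots[r->tail++ % DEMO_RING_SIZE];
	return i;
}

/* The mock port never produces input and accepts every output packet. */
static uint16_t
demo_port_rx(void *handle, int *pkts, uint16_t n)
{
	(void)handle; (void)pkts; (void)n;
	return 0;
}

static uint16_t
demo_port_tx(void *handle, int *pkts, uint16_t n)
{
	struct demo_port *p = handle;
	(void)pkts;
	p->sent += n;
	return n;
}

static struct demo_pktdev
demo_from_ring(struct demo_ring *r)
{
	struct demo_pktdev d = { demo_ring_dequeue, demo_ring_enqueue, r, r };
	return d;
}

static struct demo_pktdev
demo_from_port(struct demo_port *p)
{
	struct demo_pktdev d = { demo_port_rx, demo_port_tx, p, p };
	return d;
}

/* One iteration of a do_work()-style loop: type-neutral RX then TX. */
static uint16_t
demo_forward_once(struct demo_pktdev *src, struct demo_pktdev *dst)
{
	int bufs[DEMO_BURST];
	assert(src != NULL && dst != NULL);
	const uint16_t nb_rx = src->rx(src->rx_handle, bufs, DEMO_BURST);
	if (nb_rx == 0)
		return 0;
	/* With real mbufs, packets beyond the returned count would be freed. */
	return dst->tx(dst->tx_handle, bufs, nb_rx);
}
```

Chaining two stages then amounts to wrapping the same ring twice, once as the destination of the first stage and once as the source of the second, which is what the `in = out` assignment in the example's main() does with real pktdevs.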
* [dpdk-dev] [RFC-PATCH-v3 4/6] new pktdev l2fwd sample 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Bruce Richardson ` (2 preceding siblings ...) 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 3/6] example app showing pktdevs used in a chain Bruce Richardson @ 2015-06-10 13:07 ` Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 5/6] pktdev: adding app test Bruce Richardson ` (2 subsequent siblings) 6 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-06-10 13:07 UTC (permalink / raw) To: dev This patch takes the existing l2fwd example application and modifies it to use pktdevs instead of ethdevs directly. The code is shorter, as we are able to take advantage of the TX buffering provided by pktdev. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- examples/pktdev-l2fwd/Makefile | 50 ++++ examples/pktdev-l2fwd/main.c | 530 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 580 insertions(+) create mode 100644 examples/pktdev-l2fwd/Makefile create mode 100644 examples/pktdev-l2fwd/main.c diff --git a/examples/pktdev-l2fwd/Makefile b/examples/pktdev-l2fwd/Makefile new file mode 100644 index 0000000..78feeeb --- /dev/null +++ b/examples/pktdev-l2fwd/Makefile @@ -0,0 +1,50 @@ +# BSD LICENSE +# +# Copyright(c) 2015 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. 
+# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overridden by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = l2fwd + +# all source are stored in SRCS-y +SRCS-y := main.c + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/pktdev-l2fwd/main.c b/examples/pktdev-l2fwd/main.c new file mode 100644 index 0000000..851922f --- /dev/null +++ b/examples/pktdev-l2fwd/main.c @@ -0,0 +1,530 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include <stdint.h> +#include <getopt.h> + +#include <rte_ether.h> +#include <rte_ethdev.h> +#include <rte_cycles.h> +#include <rte_lcore.h> +#include <rte_pktdev.h> + +#define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1 + +#define NB_MBUF 8192 +#define MAX_PKT_BURST 32 + +/* + * Configurable number of RX/TX ring descriptors + */ +#define RTE_TEST_RX_DESC_DEFAULT 128 +#define RTE_TEST_TX_DESC_DEFAULT 512 +static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT; +static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT; + +/* ethernet addresses of ports */ +static struct ether_addr l2fwd_ports_eth_addr[RTE_MAX_ETHPORTS]; + +/* mask of enabled ports */ +static uint32_t l2fwd_enabled_port_mask = 0; + +/* The actual port devices to be used */ +static struct rte_pktdev *devs[RTE_MAX_ETHPORTS]; +static uint8_t nb_devs; + +static unsigned int l2fwd_rx_queue_per_lcore = 1; + +static const struct rte_eth_conf port_conf = { + .txmode = { + .mq_mode = ETH_MQ_TX_NONE, + }, +}; + +struct rte_mempool * l2fwd_pktmbuf_pool = NULL; + +#define MAX_RX_QUEUE_PER_LCORE 16 + +struct lcore_queue_conf { + unsigned n_rx_port; + unsigned inputs[MAX_RX_QUEUE_PER_LCORE]; + unsigned outputs[MAX_RX_QUEUE_PER_LCORE]; +} __rte_cache_aligned; +struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE]; + +/* A tsc-based timer responsible for triggering statistics printout */ +#define MAX_TIMER_PERIOD 86400 /* 1 day max */ +static int64_t timer_period = 10; /* default of 10 seconds */ + +/* Print out statistics on packets dropped */ +static void +print_stats(void) +{ + uint64_t total_packets_dropped, total_packets_tx, total_packets_rx; + unsigned portid; + + total_packets_dropped = 0; + total_packets_tx = 0; + total_packets_rx = 0; + + const char clr[] = { 27, '[', '2', 'J', '\0' }; + const char topLeft[] = { 27, '[', '1', ';', '1', 'H','\0' }; + + /* Clear screen and move to top left */ + printf("%s%s", clr, topLeft); + + printf("\nPort statistics ===================================="); + + for (portid = 0; 
portid < nb_devs; portid++) { + char name[64]; + struct rte_pktdev *dev = devs[portid]; + printf("\nStatistics for %s ------------------------------" + "\nPackets sent: %24"PRIu64 + "\nPackets received: %20"PRIu64 + "\nPackets dropped: %21"PRIu64, + rte_pktdev_get_name(dev, name, sizeof(name)), + rte_pktdev_get_tx_count(dev), + rte_pktdev_get_rx_count(dev), + rte_pktdev_get_tx_drop_count(dev)); + + total_packets_dropped += rte_pktdev_get_tx_drop_count(dev); + total_packets_tx += rte_pktdev_get_tx_count(dev); + total_packets_rx += rte_pktdev_get_rx_count(dev); + } + printf("\nAggregate statistics ===============================" + "\nTotal packets sent: %18"PRIu64 + "\nTotal packets received: %14"PRIu64 + "\nTotal packets dropped: %15"PRIu64, + total_packets_tx, + total_packets_rx, + total_packets_dropped); + printf("\n====================================================\n"); +} + +static void +l2fwd_simple_forward(struct rte_mbuf *m, unsigned dst_port) +{ + struct ether_hdr *eth; + void *tmp; + + eth = rte_pktmbuf_mtod(m, struct ether_hdr *); + + /* 02:00:00:00:00:xx */ + tmp = &eth->d_addr.addr_bytes[0]; + *((uint64_t *)tmp) = 0x000000000002 + ((uint64_t)dst_port << 40); + + /* src addr */ + ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr); + + rte_pkt_tx_buffer(devs[dst_port], m); +} + +/* main processing loop */ +static void +l2fwd_main_loop(void) +{ + unsigned i, j; + uint64_t cur_tsc, prev_tsc = 0; + const unsigned lcore_id = rte_lcore_id(); + const struct lcore_queue_conf *qconf = &lcore_queue_conf[lcore_id]; + const int do_print = (lcore_id == rte_get_master_lcore() && + timer_period > 0); + + if (qconf->n_rx_port == 0) { + RTE_LOG(INFO, L2FWD, "lcore %u has nothing to do\n", lcore_id); + return; + } + + RTE_LOG(INFO, L2FWD, "entering main loop on lcore %u\n", lcore_id); + + for (i = 0; i < qconf->n_rx_port; i++) { + char buffer[64]; + rte_pktdev_get_name(devs[qconf->inputs[i]], + buffer, sizeof(buffer)); + RTE_LOG(INFO, L2FWD, " -- lcoreid=%u %s\n", 
lcore_id, + buffer); + } + + for (;;) { + if (do_print) { + cur_tsc = rte_rdtsc(); + if ((cur_tsc - prev_tsc) >= (uint64_t)timer_period) { + print_stats(); + prev_tsc = cur_tsc; + } + } + for (i = 0; i < qconf->n_rx_port; i++) { + struct rte_mbuf *pkts_burst[MAX_PKT_BURST]; + const unsigned portid = qconf->inputs[i]; + const uint16_t nb_rx = rte_pkt_rx_burst(devs[portid], + pkts_burst, MAX_PKT_BURST); + + if (nb_rx == 0) { + rte_pkt_tx_buffer_flush(devs[portid]); + continue; + } + + for (j = 0; j < nb_rx; j++) { + struct rte_mbuf *m = pkts_burst[j]; + rte_prefetch0(rte_pktmbuf_mtod(m, void *)); + l2fwd_simple_forward(m, qconf->outputs[i]); + } + } + } +} + +static int +l2fwd_launch_one_lcore(__attribute__((unused)) void *dummy) +{ + l2fwd_main_loop(); + return 0; +} + +/* display usage */ +static void +l2fwd_usage(const char *prgname) +{ + printf("%s [EAL options] -- -p PORTMASK [-q NQ]\n" + " -p PORTMASK: hexadecimal bitmask of ports to configure\n" + " -q NQ: number of queues (=ports) per lcore (default is 1)\n" + " -T PERIOD: statistics will be refreshed each PERIOD seconds (0 to disable, 10 default, 86400 maximum)\n", + prgname); +} + +static int +l2fwd_parse_portmask(const char *portmask) +{ + char *end = NULL; + unsigned long pm; + + /* parse hexadecimal string */ + pm = strtoul(portmask, &end, 16); + if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0')) + return -1; + + if (pm == 0) + return -1; + + return pm; +} + +static unsigned int +l2fwd_parse_nqueue(const char *q_arg) +{ + char *end = NULL; + unsigned long n; + + /* parse decimal string */ + n = strtoul(q_arg, &end, 10); + if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0')) + return 0; + if (n == 0) + return 0; + if (n >= MAX_RX_QUEUE_PER_LCORE) + return 0; + + return n; +} + +static int +l2fwd_parse_timer_period(const char *q_arg) +{ + char *end = NULL; + int n; + + /* parse number string */ + n = strtol(q_arg, &end, 10); + if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0')) + 
return -1; + if (n >= MAX_TIMER_PERIOD) + return -1; + + return n; +} + +/* Parse the argument given in the command line of the application */ +static int +l2fwd_parse_args(int argc, char **argv) +{ + int opt, ret; + char **argvopt; + int option_index; + char *prgname = argv[0]; + static struct option lgopts[] = { + {NULL, 0, 0, 0} + }; + + argvopt = argv; + + while ((opt = getopt_long(argc, argvopt, "p:q:T:", + lgopts, &option_index)) != EOF) { + + switch (opt) { + /* portmask */ + case 'p': + l2fwd_enabled_port_mask = l2fwd_parse_portmask(optarg); + if (l2fwd_enabled_port_mask == 0) { + printf("invalid portmask\n"); + l2fwd_usage(prgname); + return -1; + } + break; + + /* nqueue */ + case 'q': + l2fwd_rx_queue_per_lcore = l2fwd_parse_nqueue(optarg); + if (l2fwd_rx_queue_per_lcore == 0) { + printf("invalid queue number\n"); + l2fwd_usage(prgname); + return -1; + } + break; + + /* timer period */ + case 'T': + timer_period = l2fwd_parse_timer_period(optarg); + if (timer_period < 0) { + printf("invalid timer period\n"); + l2fwd_usage(prgname); + return -1; + } + break; + + /* long options */ + case 0: + l2fwd_usage(prgname); + return -1; + + default: + l2fwd_usage(prgname); + return -1; + } + } + + if (optind >= 0) + argv[optind-1] = prgname; + + ret = optind-1; + optind = 0; /* reset getopt lib */ + return ret; +} + +/* Check the link status of all ports in up to 9s, and print them finally */ +static void +check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) +{ +#define CHECK_INTERVAL 100 /* 100ms */ +#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */ + uint8_t portid, count, all_ports_up, print_flag = 0; + struct rte_eth_link link; + + printf("\nChecking link status"); + fflush(stdout); + for (count = 0; count <= MAX_CHECK_TIME; count++) { + all_ports_up = 1; + for (portid = 0; portid < port_num; portid++) { + if ((port_mask & (1 << portid)) == 0) + continue; + memset(&link, 0, sizeof(link)); + rte_eth_link_get_nowait(portid, &link); + /* print 
link status if flag set */ + if (print_flag == 1) { + if (link.link_status) + printf("Port %d Link Up - speed %u " + "Mbps - %s\n", (uint8_t)portid, + (unsigned)link.link_speed, + (link.link_duplex == ETH_LINK_FULL_DUPLEX) ? + ("full-duplex") : ("half-duplex\n")); + else + printf("Port %d Link Down\n", + (uint8_t)portid); + continue; + } + /* clear all_ports_up flag if any link down */ + if (link.link_status == 0) { + all_ports_up = 0; + break; + } + } + /* after finally printing all link status, get out */ + if (print_flag == 1) + break; + + if (all_ports_up == 0) { + printf("."); + fflush(stdout); + rte_delay_ms(CHECK_INTERVAL); + } + + /* set the print_flag if all ports up or timeout */ + if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) { + print_flag = 1; + printf("done\n"); + } + } +} + +int +main(int argc, char **argv) +{ + int ret; + uint8_t nb_ports; + uint8_t portid; + unsigned lcore_id; + unsigned rx_lcore_id = 0; + struct lcore_queue_conf *qconf = NULL; + + /* init EAL */ + ret = rte_eal_init(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n"); + argc -= ret; + argv += ret; + + /* parse application arguments (after the EAL ones) */ + ret = l2fwd_parse_args(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid L2FWD arguments\n"); + + /* create the mbuf pool */ + l2fwd_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", NB_MBUF, 32, + 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id()); + if (l2fwd_pktmbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n"); + + nb_ports = rte_eth_dev_count(); + if (nb_ports == 0) + rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n"); + + if (nb_ports > RTE_MAX_ETHPORTS) + nb_ports = RTE_MAX_ETHPORTS; + + /* Initialise each port */ + for (portid = 0; portid < nb_ports; portid++) { + /* skip ports that are not enabled */ + if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) { + printf("Skipping disabled port %u\n", (unsigned) portid); + continue; + } + + /* init port */ + 
printf("Initializing port %u... ", (unsigned) portid); + fflush(stdout); + ret = rte_eth_dev_configure(portid, 1, 1, &port_conf); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Cannot configure device: err=%d, port=%u\n", + ret, (unsigned) portid); + + rte_eth_macaddr_get(portid,&l2fwd_ports_eth_addr[portid]); + + /* init one RX queue */ + fflush(stdout); + ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd, + rte_eth_dev_socket_id(portid), + NULL, + l2fwd_pktmbuf_pool); + if (ret < 0) + rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n", + ret, (unsigned) portid); + + /* init one TX queue on each port */ + fflush(stdout); + ret = rte_eth_tx_queue_setup(portid, 0, nb_txd, + rte_eth_dev_socket_id(portid), + NULL); + if (ret < 0) + rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n", + ret, (unsigned) portid); + + /* Start device */ + ret = rte_eth_dev_start(portid); + if (ret < 0) + rte_exit(EXIT_FAILURE, "rte_eth_dev_start:err=%d, port=%u\n", + ret, (unsigned) portid); + + printf("done: \n"); + + rte_eth_promiscuous_enable(portid); + + printf("Port %u, MAC address: %02X:%02X:%02X:%02X:%02X:%02X\n\n", + (unsigned) portid, + l2fwd_ports_eth_addr[portid].addr_bytes[0], + l2fwd_ports_eth_addr[portid].addr_bytes[1], + l2fwd_ports_eth_addr[portid].addr_bytes[2], + l2fwd_ports_eth_addr[portid].addr_bytes[3], + l2fwd_ports_eth_addr[portid].addr_bytes[4], + l2fwd_ports_eth_addr[portid].addr_bytes[5]); + + devs[nb_devs] = rte_pktdev_from_ethport(portid, 0, 0); + if (devs[nb_devs] != NULL) + nb_devs++; + } + + if (!nb_devs) { + rte_exit(EXIT_FAILURE, + "All available ports are disabled. 
Please set portmask.\n"); + } + + check_all_ports_link_status(nb_ports, l2fwd_enabled_port_mask); + + /* convert seconds to ticks */ + timer_period *= rte_get_tsc_hz(); + + /* Initialize the port/queue configuration of each logical core */ + for (portid = 0; portid < nb_devs; portid++) { + /* get the lcore_id for this port */ + while (rte_lcore_is_enabled(rx_lcore_id) == 0 || + lcore_queue_conf[rx_lcore_id].n_rx_port == + l2fwd_rx_queue_per_lcore) { + rx_lcore_id++; + if (rx_lcore_id >= RTE_MAX_LCORE) + rte_exit(EXIT_FAILURE, "Not enough cores\n"); + } + qconf = &lcore_queue_conf[rx_lcore_id]; + qconf->inputs[qconf->n_rx_port] = portid; + qconf->outputs[qconf->n_rx_port] = portid ^ 1; + qconf->n_rx_port++; + } + if (nb_devs % 2 == 1) { + printf("Notice: odd number of ports in portmask.\n"); + qconf->outputs[qconf->n_rx_port - 1] = portid; + } + + /* launch per-lcore init on every lcore */ + rte_eal_mp_remote_launch(l2fwd_launch_one_lcore, NULL, CALL_MASTER); + RTE_LCORE_FOREACH_SLAVE(lcore_id) { + if (rte_eal_wait_lcore(lcore_id) < 0) + return -1; + } + + return 0;} -- 2.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
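A note on the option parsing in the sample above: l2fwd_parse_portmask() signals failure with -1, while the caller in l2fwd_parse_args() tests the stored mask against 0. If l2fwd_enabled_port_mask is an unsigned type (its declaration is not part of the quoted code, so this is an assumption), a malformed mask would pass that check. One way to avoid mixing the value and the status is sketched below. parse_portmask() is a hypothetical helper, not part of the patch, and it uses only libc so it can be compiled on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): parse a hexadecimal port mask.
 * Returns 0 and stores the mask in *mask on success; returns -1 for empty,
 * malformed, zero or out-of-range input, leaving *mask untouched.
 */
static int
parse_portmask(const char *arg, uint32_t *mask)
{
	char *end = NULL;
	unsigned long pm;

	assert(mask != NULL);

	/* strtoul() would accept leading blanks and a sign; reject them here */
	if (arg == NULL || !isxdigit((unsigned char)arg[0]))
		return -1;

	errno = 0;
	pm = strtoul(arg, &end, 16);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;

	/* a zero mask enables no ports; anything above 32 bits cannot be stored */
	if (pm == 0 || pm > UINT32_MAX)
		return -1;

	*mask = (uint32_t)pm;
	return 0;
}
```

With this shape the caller checks the return value for a negative status and only then uses the mask, so the error value can never be mistaken for a valid port set.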
* [dpdk-dev] [RFC-PATCH-v3 5/6] pktdev: adding app test 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Bruce Richardson ` (3 preceding siblings ...) 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 4/6] new pktdev l2fwd sample Bruce Richardson @ 2015-06-10 13:07 ` Bruce Richardson 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 6/6] test: add pktdev performance tests Bruce Richardson 2015-06-10 13:26 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Thomas Monjalon 6 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-06-10 13:07 UTC (permalink / raw) To: dev From: Marc Sune <marc.sune@bisdn.de> Add a basic test for pktdev non-buffered API for pktdev types: * ethdev * ring * kni Signed-off-by: Marc Sune <marc.sune@bisdn.de> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- app/test/Makefile | 4 + app/test/test_pktdev.c | 440 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 444 insertions(+) create mode 100644 app/test/test_pktdev.c diff --git a/app/test/Makefile b/app/test/Makefile index 3c777bf..77e48c1 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -58,6 +58,10 @@ SRCS-y += test_ring.c SRCS-y += test_ring_perf.c SRCS-y += test_pmd_perf.c +ifeq ($(CONFIG_RTE_LIBRTE_PKTDEV),y) +SRCS-y += test_pktdev.c +endif + ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y) SRCS-y += test_table.c SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test_table_pipeline.c diff --git a/app/test/test_pktdev.c b/app/test/test_pktdev.c new file mode 100644 index 0000000..e24fd78 --- /dev/null +++ b/app/test/test_pktdev.c @@ -0,0 +1,440 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + + +#include <stdio.h> +#include <stdbool.h> +#include <inttypes.h> +#include <signal.h> +#include <unistd.h> +#include <rte_cycles.h> +#include <rte_errno.h> +#include <rte_ethdev.h> +#include <rte_ether.h> +#include <rte_kni.h> +#include <rte_byteorder.h> +#include <rte_atomic.h> +#include <rte_malloc.h> +#include <rte_pktdev.h> + +#include "packet_burst_generator.h" +#include "test.h" + +/* General test constants */ +#define MAX_PKTDEVS 4 +#define NB_MBUF 8192 +#define SOCKET 0 +#define MAX_PACKET_SZ 2048 +#define MBUF_DATA_SZ (MAX_PACKET_SZ + RTE_PKTMBUF_HEADROOM) +#define PKT_BURST_SZ 32 +#define MEMPOOL_CACHE_SZ PKT_BURST_SZ +#define RING_SIZE 128 + +#define RING_NAME "test-pktdev" +#define KNI_NAME "kni-pktdev" +#define MEMPOOL_NAME "pkdev-mempool" + +/* Shared mempool */ +struct rte_mempool* mp; + +/* We use port_id 0 */ +static uint8_t port_id = 0; + +/* Device specific contexts*/ +static struct rte_ring* r = NULL; +#ifdef RTE_LIBRTE_KNI +static struct rte_kni* kni = NULL; +#endif + +/* pktdev handles */ +static struct rte_pktdev* ring_dev = NULL; +#ifdef RTE_LIBRTE_KNI +static struct rte_pktdev* kni_dev = NULL; +#endif +static struct rte_pktdev* eth_dev = NULL; + +static int +setup_mempool(void) +{ + mp = rte_mempool_lookup(MEMPOOL_NAME); + if (!mp) + mp = rte_pktmbuf_pool_create(MEMPOOL_NAME, + NB_MBUF, + MEMPOOL_CACHE_SZ, 0, MBUF_DATA_SZ, + SOCKET); + + if (!mp){ + printf( "Could not create mempool!\n"); + return -1; + } + + return 0; +} + +static int +setup_ring(void) +{ + r = rte_ring_lookup(RING_NAME); + + if (r == NULL) + r = rte_ring_create(RING_NAME, RING_SIZE, SOCKET_ID_ANY, 0); + + if (r == NULL) { + printf( "ERROR: unable to create rte_ring '" RING_NAME "' required for the pktdev device!\n"); + return -1; + } + + /* Check NULL ring */ + if (rte_pktdev_from_ring(NULL) != NULL) { + printf( "ERROR: invalid behaviour of rte_pktdev_from_ring() for NULL rings!\n"); + return -1; + } + + /* Create the pktdev device */ + ring_dev = 
rte_pktdev_from_ring(r); + + if (ring_dev == NULL) { + printf( "ERROR: could not create pktdev from rte_ring '" RING_NAME "'!\n"); + return -1; + } + + return 0; +} + +#ifdef RTE_LIBRTE_KNI +static int +setup_kni(void) +{ + struct rte_kni_conf conf; + struct rte_kni_ops ops; + + memset(&conf, 0, sizeof(conf)); + memset(&ops, 0, sizeof(ops)); + sprintf(conf.name,"%s", KNI_NAME); + conf.mbuf_size = MAX_PACKET_SZ; + + /* Initialize KNI subsystem */ + rte_kni_init(1); + + /* Allocate KNI interface */ + kni = rte_kni_get(KNI_NAME); + + if (kni == NULL) + kni = rte_kni_alloc(mp, &conf, &ops); + + if (kni == NULL) { + printf( "ERROR: could not allocate KNI interface '" KNI_NAME "'!\n"); + return -1; + } + + /* Check NULL CTX */ + if (rte_pktdev_from_kni(NULL) != NULL) { + printf( "ERROR: invalid behaviour of rte_pktdev_from_kni() for NULL KNI context!\n"); + return -1; + } + + kni_dev = rte_pktdev_from_kni(kni); + + if (kni_dev == NULL) { + printf( "ERROR: could not create pktdev from KNI interface '" KNI_NAME "'\n"); + return -1; + } + + return 0; +} +#endif /* RTE_LIBRTE_KNI */ + + +static struct rte_eth_conf port_conf = { + .rxmode = { + .mq_mode = ETH_MQ_RX_NONE, + .max_rx_pkt_len = MAX_PACKET_SZ, + .split_hdr_size = 0, + .header_split = 0, /**< Header Split disabled */ + .hw_ip_checksum = 0, /**< IP checksum offload enabled */ + .hw_vlan_filter = 0, /**< VLAN filtering disabled */ + .hw_vlan_strip = 0, /**< VLAN strip enabled. */ + .hw_vlan_extend = 0, /**< Extended VLAN disabled. 
*/ + .jumbo_frame = 0, /**< Jumbo Frame Support disabled */ + .hw_strip_crc = 0, /**< CRC stripped by hardware */ + .enable_scatter = 0, /**< scatter rx disabled */ + }, + .txmode = { + .mq_mode = ETH_MQ_TX_NONE, + }, + .lpbk_mode = 1, /* enable loopback */ +}; + +static int +setup_ethdev(void) +{ + uint16_t nb_rx_queue = 1, nb_tx_queue = 1; + int ret; + + if (rte_eth_dev_count() == 0 ) { + printf( "ERROR: pktdev test requires at least one ethdev!\n"); + return -1; + } + + /* Check NULL CTX */ + if (rte_pktdev_from_ethport(255, 0, 0) != NULL) { + printf( "ERROR: invalid behaviour of rte_pktdev_from_ethport() for invalid port id!\n"); + return -1; + } + + /* port configure */ + ret = rte_eth_dev_configure(port_id, nb_rx_queue, nb_tx_queue, + &port_conf); + if (ret < 0) { + printf( "ERROR: unable to configure port 0 '%s'.\n", + rte_strerror(ret)); + return -1; + } + + /* tx queue setup */ + ret = rte_eth_tx_queue_setup(port_id, 0, PKT_BURST_SZ, 0, NULL); + if (ret < 0) { + printf( "ERROR: unable to setup TX queue '%s'.\n", + rte_strerror(ret)); + return -1; + } + + /* rx queue steup */ + ret = rte_eth_rx_queue_setup(port_id, 0, PKT_BURST_SZ, 0, NULL, mp); + if (ret < 0) { + printf( "ERROR: unable to setup RX queue '%s'.\n", + rte_strerror(ret)); + return -1; + } + + /* Start device */ + ret = rte_eth_dev_start(port_id); + if (ret < 0) { + printf( "ERROR: unable to start eth_dev '%s'.\n", + rte_strerror(ret)); + return -1; + } + + /* Enable promiscuous */ + rte_eth_promiscuous_enable(port_id); + + /* Create eth_dev */ + eth_dev = rte_pktdev_from_ethport(0, 0, 0); + if (eth_dev == NULL) { + printf( "ERROR: could not create pktdev from ethdev'\n"); + return -1; + } + + return 0; +} + +static void +stop_ethdev(void) +{ + rte_eth_dev_stop(port_id); +} + +/* I/O loop stop flag */ +static bool keep_running = true; + +/* Intialize as an ARP pkt */ +static void +init_pkt(struct rte_mbuf *mbuf) +{ + struct ether_hdr *eth; + + /* Set length */ + rte_pktmbuf_reset(mbuf); + 
rte_ctrlmbuf_len(mbuf) = ETHER_MIN_LEN; + eth = (struct ether_hdr*)rte_ctrlmbuf_data(mbuf); + + /* Set mac addresses & ethtype (ethtype in network byte order) */ + memset(eth, 0, sizeof(*eth)); + eth->d_addr.addr_bytes[0] = 0x2; + eth->s_addr.addr_bytes[0] = 0x1; + eth->ether_type = rte_cpu_to_be_16(ETHER_TYPE_ARP); +} + +/* + * Injects a pkt through the rte_ring pkt dev, and the packet + * has to travel back and forth through the chain (eth_dev in loopback) + */ +static int +io_loop(void* not_used) +{ + (void)not_used; + unsigned int len; + struct rte_mbuf *burst[PKT_BURST_SZ]; + struct rte_mbuf *mbuf = NULL; + + /* Get an mbuf */ + rte_mempool_get(mp, (void**)&mbuf); + if (mbuf == NULL) { + printf("Unable to allocate an mbuf\n"); + return -1; + } + + /* Prepare packet */ + init_pkt(mbuf); + + /* Test ring dev */ + printf("Testing ring pktdev...\n"); + burst[0] = mbuf; + len = rte_pkt_tx_burst(ring_dev, burst, 1); + + if (len != 1) { + printf("TX through ring pktdev failed (len:%u)\n", len); + return -1; + } + if (ring_dev->tx_count != 1) { + printf("TX through ring pktdev stats check failed (len:%u)\n", len); + return -1; + } + + len = rte_pkt_rx_burst(ring_dev, burst, PKT_BURST_SZ); + if (len != 1) { + printf("RX through ring pktdev failed (len:%u), ring count: %u\n", + len, + rte_ring_count(r)); + return -1; + } + if (ring_dev->rx_count != 1) { + printf("RX through ring pktdev stats check failed (len:%u)\n", len); + return -1; + } + + + +#ifdef RTE_LIBRTE_KNI + /* Test KNI dev. 
TODO: check stats and RX */ + printf("Testing KNI pktdev...\n"); + rte_mempool_get(mp, (void**)&mbuf); + if (mbuf == NULL) { + printf("Unable to allocate an mbuf (len:%u)\n", len); + return -1; + } + + init_pkt(mbuf); + burst[0] = mbuf; + + len = rte_pkt_tx_burst(kni_dev, burst, 1); + if (len != 1) { + printf("TX through kni pktdev failed (len:%u)\n", len); + return -1; + } + if (kni_dev->tx_count != 1) { + printf("TX through KNI pktdev stats check failed (len:%u)\n", len); + return -1; + } +#endif /* RTE_LIBRTE_KNI */ + + /* Test eth dev. TODO: check RX */ + printf("Testing ethdev pktdev...\n"); + rte_mempool_get(mp, (void**)&mbuf); + if (mbuf == NULL) { + printf("Unable to allocate an mbuf (len:%u)\n", len); + return -1; + } + + init_pkt(mbuf); + burst[0] = mbuf; + + len = rte_pkt_tx_burst(eth_dev, burst, 1); + if (len != 1) { + printf("TX through eth pktdev failed\n"); + return -1; + } + if (eth_dev->tx_count != 1) { + printf("TX through eth pktdev stats check failed (len:%u)\n", len); + return -1; + } + + return 0; +} + +static int +test_pktdev(void) +{ + unsigned int lcore_id; + + /* Get lcore id */ + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + if (lcore_id == rte_get_master_lcore()) + continue; + if ( rte_lcore_is_enabled(lcore_id) == 1) + break; + } + + if (lcore_id == RTE_MAX_LCORE) { + printf("No available lcores to run I/O loop. 
This test requires at least 2 lcores\n"); + return -1; + } + + printf("Initializing devices...\n"); + + if(setup_mempool() < 0) + return -1; + if(setup_ring() < 0) + return -1; +#ifdef RTE_LIBRTE_KNI + if(setup_kni() < 0) + return -1; +#endif /* RTE_LIBRTE_KNI */ + + if(setup_ethdev() < 0) + return -1; + + printf("Launching I/O core and testing devs...\n"); + rte_eal_remote_launch(io_loop, NULL, lcore_id); + + /* Wait */ + sleep(1); + + /* Stop lcore */ + keep_running = false; + rte_eal_wait_lcore(lcore_id); + + printf("Cleaning the house...\n"); + + stop_ethdev(); + + return 0; +} + + +static struct test_command pktdev_cmd = { + .command = "pktdev_autotest", + .callback = test_pktdev, +}; + +REGISTER_TEST_COMMAND(pktdev_cmd); -- 2.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
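For reference, here is a standalone sketch of the kind of Ethernet header that init_pkt() in the test above prepares: separate destination and source addresses, and an EtherType stored in network byte order. It uses only libc so it builds without DPDK. struct sketch_ether_hdr, sketch_init_eth_hdr() and the address values are illustrative assumptions, not code from the patch; in DPDK code the byte swap would normally be rte_cpu_to_be_16() from rte_byteorder.h rather than htons().

```c
#include <arpa/inet.h> /* htons(), POSIX */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETHERTYPE_ARP 0x0806 /* same value as DPDK's ETHER_TYPE_ARP */

/* Hypothetical stand-in for DPDK's struct ether_hdr (14 bytes, no padding). */
struct sketch_ether_hdr {
	uint8_t dst[6];
	uint8_t src[6];
	uint16_t ether_type; /* stored in network byte order */
};

/* Fill in a minimal header; the addresses are arbitrary example values. */
static void
sketch_init_eth_hdr(struct sketch_ether_hdr *eth)
{
	assert(eth != NULL);
	memset(eth, 0, sizeof(*eth));

	/* 0x02 in the first octet: locally administered, unicast */
	eth->dst[0] = 0x02;
	eth->dst[5] = 0x02;
	eth->src[0] = 0x02;
	eth->src[5] = 0x01;

	/* on the wire the EtherType is big-endian, whatever the host order is */
	eth->ether_type = htons(SKETCH_ETHERTYPE_ARP);
}
```

Checking the raw bytes rather than the uint16_t field is what makes a test of this helper independent of host endianness.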
* [dpdk-dev] [RFC-PATCH-v3 6/6] test: add pktdev performance tests 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Bruce Richardson ` (4 preceding siblings ...) 2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 5/6] pktdev: adding app test Bruce Richardson @ 2015-06-10 13:07 ` Bruce Richardson 2015-06-10 13:26 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Thomas Monjalon 6 siblings, 0 replies; 19+ messages in thread From: Bruce Richardson @ 2015-06-10 13:07 UTC (permalink / raw) To: dev Add in some performance testing for the pktdev library. Looking at cycles count for a ring-based implementation, based off the ring performance tests. Compares ring performance: * native ring calls * calls through pktdev to the ring * calls through ring pmd wrapper to the ring * calls through pktdev to the pmd wrapper to the ring. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> --- app/test/Makefile | 4 +- app/test/test_pktdev_perf.c | 260 +++++++++++++++++++++++++++++++++++++++++ lib/librte_pktdev/rte_pktdev.h | 8 +- 3 files changed, 265 insertions(+), 7 deletions(-) create mode 100644 app/test/test_pktdev_perf.c diff --git a/app/test/Makefile b/app/test/Makefile index 77e48c1..8697893 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -58,9 +58,7 @@ SRCS-y += test_ring.c SRCS-y += test_ring_perf.c SRCS-y += test_pmd_perf.c -ifeq ($(CONFIG_RTE_LIBRTE_PKTDEV),y) -SRCS-y += test_pktdev.c -endif +SRCS-$(CONFIG_RTE_LIBRTE_PKTDEV) += test_pktdev.c test_pktdev_perf.c ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y) SRCS-y += test_table.c diff --git a/app/test/test_pktdev_perf.c b/app/test/test_pktdev_perf.c new file mode 100644 index 0000000..6a94e4d --- /dev/null +++ b/app/test/test_pktdev_perf.c @@ -0,0 +1,260 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + + +#include <stdio.h> +#include <inttypes.h> +#include <rte_ring.h> +#include <rte_cycles.h> +#include <rte_launch.h> +#include <rte_pktdev.h> +#include <rte_ethdev.h> +#include <rte_eth_ring.h> + +#include "test.h" + +/* + * Ring + * ==== + * + * Measures performance of various operations using rdtsc + * * Empty ring dequeue + * * Enqueue/dequeue of bursts in 1 threads + * * Enqueue/dequeue of bursts in 2 threads + */ + +#define RING_NAME "RING_PERF" +#define RING_SIZE 4096 +#define MAX_BURST 32 + +/* + * the sizes to enqueue and dequeue in testing + * (marked volatile so they won't be seen as compile-time constants) + */ +static const volatile unsigned bulk_sizes[] = { 1, 8, 32 }; + +/* The ring structure used for tests */ +static struct rte_ring *r; +static struct rte_pktdev *r_pdev; +static uint8_t ring_ethdev_port; +static struct rte_pktdev *re_pdev; + +/* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */ +static void +test_empty_dequeue(void) +{ + const unsigned iter_shift = 26; + const unsigned iterations = 1<<iter_shift; + unsigned i = 0; + void *burst[MAX_BURST]; + + const uint64_t sc_start = rte_rdtsc(); + for (i = 0; i < iterations; i++) + rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]); + const uint64_t sc_end = rte_rdtsc(); + + const uint64_t mc_start = rte_rdtsc(); + for (i = 0; i < iterations; i++) + rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0]); + const uint64_t mc_end = rte_rdtsc(); + + printf("SC empty dequeue: %.2F\n", + (double)(sc_end-sc_start) / iterations); + printf("MC empty dequeue: %.2F\n", + (double)(mc_end-mc_start) / iterations); +} + +/* + * Test function that determines how long an enqueue + dequeue of a single item + * takes on a single lcore. Result is for comparison with the bulk enq+deq. 
+ */ +static void +test_single_enqueue_dequeue(void) +{ + const unsigned iter_shift = 24; + const unsigned iterations = 1<<iter_shift; + unsigned i = 0; + void *burst = NULL; + struct rte_mbuf *mburst[1] = { NULL }; + + const uint64_t sc_start = rte_rdtsc_precise(); + rte_compiler_barrier(); + for (i = 0; i < iterations; i++) { + rte_ring_enqueue_bulk(r, &burst, 1); + rte_ring_dequeue_bulk(r, &burst, 1); + } + const uint64_t sc_end = rte_rdtsc_precise(); + rte_compiler_barrier(); + + const uint64_t pd_start = rte_rdtsc_precise(); + rte_compiler_barrier(); + for (i = 0; i < iterations; i++) { + rte_pkt_tx_burst(r_pdev, mburst, 1); + rte_pkt_rx_burst(r_pdev, mburst, 1); + } + const uint64_t pd_end = rte_rdtsc_precise(); + rte_compiler_barrier(); + + const uint64_t eth_start = rte_rdtsc_precise(); + rte_compiler_barrier(); + for (i = 0; i < iterations; i++) { + rte_eth_tx_burst(ring_ethdev_port, 0, mburst, 1); + rte_eth_rx_burst(ring_ethdev_port, 0, mburst, 1); + } + const uint64_t eth_end = rte_rdtsc_precise(); + rte_compiler_barrier(); + + const uint64_t pd_eth_start = rte_rdtsc_precise(); + rte_compiler_barrier(); + for (i = 0; i < iterations; i++) { + rte_pkt_tx_burst(re_pdev, mburst, 1); + rte_pkt_rx_burst(re_pdev, mburst, 1); + } + const uint64_t pd_eth_end = rte_rdtsc_precise(); + rte_compiler_barrier(); + + printf("Ring single enq/dequeue : %"PRIu64"\n", + (sc_end-sc_start) >> iter_shift); + printf("Pktdev(ring) single enq/deq : %"PRIu64"\n", + (pd_end-pd_start) >> iter_shift); + printf("Ethdev single enq/dequeue : %"PRIu64"\n", + (eth_end-eth_start) >> iter_shift); + printf("Pktdev(ethdev) single enq/deq: %"PRIu64"\n", + (pd_eth_end-pd_eth_start) >> iter_shift); +} + +/* Times enqueue and dequeue on a single lcore */ +static void +test_bulk_enqueue_dequeue(void) +{ + const unsigned iter_shift = 23; + const unsigned iterations = 1<<iter_shift; + unsigned sz, i = 0; + struct rte_mbuf *burst[MAX_BURST] = {0}; + + for (sz = 0; sz < 
sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) { + const uint64_t sc_start = rte_rdtsc(); + for (i = 0; i < iterations; i++) { + rte_ring_sp_enqueue_bulk(r, (void *)burst, bulk_sizes[sz]); + rte_ring_sc_dequeue_bulk(r, (void *)burst, bulk_sizes[sz]); + } + const uint64_t sc_end = rte_rdtsc(); + + const uint64_t pd_start = rte_rdtsc_precise(); + rte_compiler_barrier(); + for (i = 0; i < iterations; i++) { + rte_pkt_tx_burst(r_pdev, burst, bulk_sizes[sz]); + rte_pkt_rx_burst(r_pdev, burst, bulk_sizes[sz]); + } + const uint64_t pd_end = rte_rdtsc_precise(); + rte_compiler_barrier(); + + const uint64_t eth_start = rte_rdtsc_precise(); + rte_compiler_barrier(); + for (i = 0; i < iterations; i++) { + rte_eth_tx_burst(ring_ethdev_port, 0, burst, bulk_sizes[sz]); + rte_eth_rx_burst(ring_ethdev_port, 0, burst, bulk_sizes[sz]); + } + const uint64_t eth_end = rte_rdtsc_precise(); + rte_compiler_barrier(); + + const uint64_t pd_eth_start = rte_rdtsc_precise(); + rte_compiler_barrier(); + for (i = 0; i < iterations; i++) { + rte_pkt_tx_burst(re_pdev, burst, bulk_sizes[sz]); + rte_pkt_rx_burst(re_pdev, burst, bulk_sizes[sz]); + } + const uint64_t pd_eth_end = rte_rdtsc_precise(); + rte_compiler_barrier(); + + double sc_avg = ((double)(sc_end-sc_start) / + (iterations * bulk_sizes[sz])); + double pd_avg = ((double)(pd_end-pd_start) / + (iterations * bulk_sizes[sz])); + double eth_avg = ((double)(eth_end-eth_start) / + (iterations * bulk_sizes[sz])); + double pd_eth_avg = ((double)(pd_eth_end-pd_eth_start) / + (iterations * bulk_sizes[sz])); + + printf("ring bulk enq/dequeue (size: %u): %.1F\n", bulk_sizes[sz], + sc_avg); + printf("pktdev(ring) bulk enq/deq (%u) : %.1F\n", bulk_sizes[sz], + pd_avg); + printf("ethdev bulk enq/dequeue (%u) : %.1F\n", bulk_sizes[sz], + eth_avg); + printf("pktdev(ethdev) bulk enq/deq (%u): %.1F\n", bulk_sizes[sz], + pd_eth_avg); + + printf("\n"); + } +} + +static int +test_pktdev_perf(void) +{ + const struct rte_eth_conf port_conf_default = {0}; + 
struct rte_mempool *p; + + r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), + RING_F_SP_ENQ|RING_F_SC_DEQ); + if (r == NULL && (r = rte_ring_lookup(RING_NAME)) == NULL) + return -1; + + r_pdev = rte_pktdev_from_ring(r); + ring_ethdev_port = rte_eth_from_rings("TEST_RING", + &r, 1, &r, 1, /* one RX ring, one TX ring */ + rte_socket_id()); + rte_eth_dev_configure(ring_ethdev_port, 1, 1, &port_conf_default); + p = rte_pktmbuf_pool_create("Test pool", 1023, 32, 0, 2048, rte_socket_id()); + rte_eth_rx_queue_setup(ring_ethdev_port, 0, 128, rte_socket_id(), NULL, p); + rte_eth_tx_queue_setup(ring_ethdev_port, 0, 128, rte_socket_id(), NULL); + + re_pdev = rte_pktdev_from_ethport(ring_ethdev_port, 0, 0); + + printf("### Testing single element and burst enq/deq ###\n"); + test_single_enqueue_dequeue(); + + printf("\n### Testing empty dequeue ###\n"); + test_empty_dequeue(); + + printf("\n### Testing using a single lcore ###\n"); + test_bulk_enqueue_dequeue(); + + return 0; +} + +static struct test_command ring_perf_cmd = { + .command = "pktdev_perf_autotest", + .callback = test_pktdev_perf, +}; +REGISTER_TEST_COMMAND(ring_perf_cmd); diff --git a/lib/librte_pktdev/rte_pktdev.h b/lib/librte_pktdev/rte_pktdev.h index 3acbc0d..4740c67 100644 --- a/lib/librte_pktdev/rte_pktdev.h +++ b/lib/librte_pktdev/rte_pktdev.h @@ -46,6 +46,7 @@ extern "C" { #include <stdint.h> #include <rte_ring.h> +#include <rte_mbuf.h> #include <rte_branch_prediction.h> /* Buffered TX works in bursts of 32 */ @@ -53,9 +54,8 @@ extern "C" { /* * forward definition of data structures. - * We don't need full mbuf/kni/ethdev headers here + * We don't need full kni/ethdev headers here */ -struct rte_mbuf; struct rte_kni; struct rte_eth_dev; @@ -136,7 +136,7 @@ struct rte_pktdev { * of pointers to *rte_mbuf* structures effectively supplied to the * *rx_pkts* array. 
*/ -static inline uint16_t +static inline uint16_t __attribute__((always_inline)) rte_pkt_rx_burst(struct rte_pktdev *dev, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) { @@ -168,7 +168,7 @@ rte_pkt_rx_burst(struct rte_pktdev *dev, struct rte_mbuf **rx_pkts, * the transmit ring. The return value can be less than the value of the * *tx_pkts* parameter when the transmit ring is full or has been filled up. */ -static inline uint16_t +static inline uint16_t __attribute__((always_inline)) rte_pkt_tx_burst(struct rte_pktdev *dev, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { -- 2.4.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
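The four cases timed above differ mainly in how many layers of dispatch sit between the caller and the ring. As a rough illustration of the wrapper idea described in the cover letter (function pointers for RX/TX, plus an inline special case for rings that costs one predictable branch per burst), here is a minimal standalone sketch. None of it is taken from rte_pktdev.h: struct sketch_pktdev, the type tag and the toy ring are hypothetical, and only the rx_count/tx_count names echo the counters the tests read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_SLOTS 8u

/* Toy single-threaded FIFO standing in for an rte_ring (assumption). */
struct sketch_ring {
	void *slot[SKETCH_RING_SLOTS];
	unsigned int head;  /* index of the next slot to read */
	unsigned int count; /* number of occupied slots */
};

static uint16_t
sketch_ring_enqueue(struct sketch_ring *r, void **objs, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n && r->count < SKETCH_RING_SLOTS; i++) {
		r->slot[(r->head + r->count) % SKETCH_RING_SLOTS] = objs[i];
		r->count++;
	}
	return i; /* may be less than n when the ring fills up */
}

static uint16_t
sketch_ring_dequeue(struct sketch_ring *r, void **objs, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n && r->count > 0; i++) {
		objs[i] = r->slot[r->head];
		r->head = (r->head + 1) % SKETCH_RING_SLOTS;
		r->count--;
	}
	return i;
}

typedef uint16_t (*sketch_burst_fn)(void *ctx, void **pkts, uint16_t n);

enum sketch_dev_type { SKETCH_DEV_RING, SKETCH_DEV_OTHER };

/* Hypothetical wrapper: one handle per underlying device or ring. */
struct sketch_pktdev {
	enum sketch_dev_type type;
	void *ctx;          /* ring pointer, or opaque driver context */
	sketch_burst_fn rx; /* only used when type != SKETCH_DEV_RING */
	sketch_burst_fn tx;
	uint64_t rx_count;
	uint64_t tx_count;
};

static inline uint16_t
sketch_pkt_tx_burst(struct sketch_pktdev *dev, void **pkts, uint16_t n)
{
	uint16_t sent;

	/* one branch per burst: rings take the direct path, no pointer call */
	if (dev->type == SKETCH_DEV_RING)
		sent = sketch_ring_enqueue((struct sketch_ring *)dev->ctx, pkts, n);
	else
		sent = dev->tx(dev->ctx, pkts, n);
	dev->tx_count += sent;
	return sent;
}

static inline uint16_t
sketch_pkt_rx_burst(struct sketch_pktdev *dev, void **pkts, uint16_t n)
{
	uint16_t got;

	if (dev->type == SKETCH_DEV_RING)
		got = sketch_ring_dequeue((struct sketch_ring *)dev->ctx, pkts, n);
	else
		got = dev->rx(dev->ctx, pkts, n);
	dev->rx_count += got;
	return got;
}
```

Because the counters live in the wrapper rather than in the ring, several wrappers could be created around one shared ring and each thread would keep its own statistics, which is the thread-safety argument made in the cover letter. The toy ring here is not itself thread safe.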
* Re: [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update
From: Thomas Monjalon @ 2015-06-10 13:26 UTC
To: Bruce Richardson; +Cc: dev

2015-06-10 14:07, Bruce Richardson:
> Following on from the feedback received from the community about the pktdev idea,
> I've decided not to push this approach further for DPDK 2.1.
>
> Instead, for future releases, I'll look at taking some of what was investigated in
> this work and see if it can be applied to the existing ethdev library, which seems
> to be the favoured point of convergence in the community. Hopefully, we can get
> ethdev to meet all the requirements I had looked for for pktdev. [If not, I may
> need to come back to look at this again, but I hope not! :-)]

Maybe we should start removing some things from ethdev.
There are some functions or structures which are specific to some devices only.
Why not include a header from the driver when we want to use a specific feature of this driver?
Examples: ixgbe bypass mode, ixgbe queue stats mapping, Rx/Tx thresholds tuning...
Thread overview: 19+ messages (end of thread ~2015-06-10 13:27 UTC)
2015-05-11 16:29 [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson
2015-05-11 16:29 ` [dpdk-dev] [RFC PATCHv2 1/2] Add example pktdev implementation Bruce Richardson
2015-05-11 16:29 ` [dpdk-dev] [RFC PATCHv2 2/2] example app showing pktdevs used in a chain Bruce Richardson
2015-05-19 11:31 ` [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson
2015-05-20 0:19 ` Wiles, Keith
2015-05-20 8:31 ` Thomas Monjalon
2015-05-20 10:05 ` Marc Sune
2015-05-20 10:28 ` Neil Horman
2015-05-20 17:01 ` Marc Sune
2015-05-20 18:47 ` Neil Horman
2015-05-21 12:12 ` Richardson, Bruce
2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Bruce Richardson
2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 1/6] kni: add function to query the name of a kni object Bruce Richardson
2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 2/6] pktdev: Add pktdev implementation library Bruce Richardson
2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 3/6] example app showing pktdevs used in a chain Bruce Richardson
2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 4/6] new pktdev l2fwd sample Bruce Richardson
2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 5/6] pktdev: adding app test Bruce Richardson
2015-06-10 13:07 ` [dpdk-dev] [RFC-PATCH-v3 6/6] test: add pktdev performance tests Bruce Richardson
2015-06-10 13:26 ` [dpdk-dev] [RFC-PATCH-v3 0/6] pktdev update Thomas Monjalon