From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id AD26F5A31 for ; Tue, 21 Apr 2015 11:23:41 +0200 (CEST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP; 21 Apr 2015 02:23:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.11,614,1422950400"; d="scan'208";a="683273691" Received: from bricha3-mobl3.ger.corp.intel.com ([10.237.220.106]) by orsmga001.jf.intel.com with SMTP; 21 Apr 2015 02:23:38 -0700 Received: by (sSMTP sendmail emulation); Tue, 21 Apr 2015 10:23:38 +0025 Date: Tue, 21 Apr 2015 10:23:37 +0100 From: Bruce Richardson To: "Ananyev, Konstantin" Message-ID: <20150421092337.GA5360@bricha3-MOBL3> References: <1428954274-26944-1-git-send-email-keith.wiles@intel.com> <1429283804-28087-1-git-send-email-bruce.richardson@intel.com> <1429283804-28087-2-git-send-email-bruce.richardson@intel.com> <2601191342CEEE43887BDE71AB97725821416F4D@irsmsx105.ger.corp.intel.com> <20150420150236.GA9656@bricha3-MOBL3> <2601191342CEEE43887BDE71AB9772582141B5D1@irsmsx105.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB9772582141B5D1@irsmsx105.ger.corp.intel.com> Organization: Intel Shannon Ltd. User-Agent: Mutt/1.5.23 (2014-03-12) Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [RFC PATCH 1/4] Add example pktdev implementation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Apr 2015 09:23:42 -0000 On Tue, Apr 21, 2015 at 09:40:05AM +0100, Ananyev, Konstantin wrote: > > -----Original Message----- > > From: Richardson, Bruce > > Sent: Monday, April 20, 2015 4:03 PM > > To: Ananyev, Konstantin > > Cc: dev@dpdk.org; Wiles, Keith > > Subject: Re: [dpdk-dev] [RFC PATCH 1/4] Add example pktdev implementation > > > > On Mon, Apr 20, 2015 at 12:26:43PM +0100, Ananyev, Konstantin wrote: > > > > > > > > > > -----Original Message----- > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson > > > > Sent: Friday, April 17, 2015 4:17 PM > > > > To: dev@dpdk.org; Wiles, Keith > > > > Subject: [dpdk-dev] [RFC PATCH 1/4] Add example pktdev implementation > > > > > > > > This commit demonstrates what a minimal API for all packet handling > > > > types would look like. It simply provides the necessary parts for > > > > receiving and transmiting packets, and is based off the ethdev > > > > implementation. > > > > --- > > > > config/common_bsdapp | 5 ++ > > > > config/common_linuxapp | 5 ++ > > > > lib/Makefile | 1 + > > > > lib/librte_pktdev/Makefile | 56 ++++++++++++++++ > > > > lib/librte_pktdev/rte_pktdev.c | 35 ++++++++++ > > > > lib/librte_pktdev/rte_pktdev.h | 144 +++++++++++++++++++++++++++++++++++++++++ > > > > 6 files changed, 246 insertions(+) > > > > create mode 100644 lib/librte_pktdev/Makefile > > > > create mode 100644 lib/librte_pktdev/rte_pktdev.c > > > > create mode 100644 lib/librte_pktdev/rte_pktdev.h > > > > > > > > diff --git a/config/common_bsdapp b/config/common_bsdapp > > > > index 8ff4dc2..d2b932c 100644 > > > > --- a/config/common_bsdapp > > > > +++ b/config/common_bsdapp > > > > @@ -132,6 +132,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y > > > > CONFIG_RTE_LIBRTE_KVARGS=y > > > > > > > > # > > > > +# Compile generic packet handling device library > > > > +# > > > > +CONFIG_RTE_LIBRTE_PKTDEV=y > > > > + > > > > +# > > > > # Compile generic ethernet library > > > > # > > > > CONFIG_RTE_LIBRTE_ETHER=y > > > > diff --git a/config/common_linuxapp b/config/common_linuxapp > > > > index 09a58ac..5bda416 100644 > > > > --- a/config/common_linuxapp > > > > +++ b/config/common_linuxapp > > > > @@ -129,6 +129,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y > > > > CONFIG_RTE_LIBRTE_KVARGS=y > > > > > > > > # > > > > +# Compile generic packet handling device library > > > > +# > > > > +CONFIG_RTE_LIBRTE_PKTDEV=y > > > > + > > > > +# > > > > # Compile generic ethernet library > > > > # > > > > CONFIG_RTE_LIBRTE_ETHER=y > > > > diff --git a/lib/Makefile b/lib/Makefile > > > > index d94355d..4db5ee0 100644 > > > > --- a/lib/Makefile > > > > +++ b/lib/Makefile > > > > @@ -32,6 +32,7 @@ > > > > include $(RTE_SDK)/mk/rte.vars.mk > > > > > > > > DIRS-y += librte_compat > > > > +DIRS-$(CONFIG_RTE_LIBRTE_PKTDEV) += librte_pktdev > > > > DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal > > > > DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc > > > > DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring > > > > diff --git a/lib/librte_pktdev/Makefile b/lib/librte_pktdev/Makefile > > > > new file mode 100644 > > > > index 0000000..2d3b3a1 > > > > --- /dev/null > > > > +++ b/lib/librte_pktdev/Makefile > > > > @@ -0,0 +1,56 @@ > > > > +# BSD LICENSE > > > > +# > > > > +# Copyright(c) 2015 Intel Corporation. All rights reserved. > > > > +# All rights reserved. > > > > +# > > > > +# Redistribution and use in source and binary forms, with or without > > > > +# modification, are permitted provided that the following conditions > > > > +# are met: > > > > +# > > > > +# * Redistributions of source code must retain the above copyright > > > > +# notice, this list of conditions and the following disclaimer. > > > > +# * Redistributions in binary form must reproduce the above copyright > > > > +# notice, this list of conditions and the following disclaimer in > > > > +# the documentation and/or other materials provided with the > > > > +# distribution. > > > > +# * Neither the name of Intel Corporation nor the names of its > > > > +# contributors may be used to endorse or promote products derived > > > > +# from this software without specific prior written permission. > > > > +# > > > > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > > > > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > > > > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > > > > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > > > > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > > > > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > > > > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > > > > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > > > > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > > > > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > > > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > > > + > > > > +include $(RTE_SDK)/mk/rte.vars.mk > > > > + > > > > +# > > > > +# library name > > > > +# > > > > +LIB = libpktdev.a > > > > + > > > > +CFLAGS += -O3 > > > > +CFLAGS += $(WERROR_FLAGS) > > > > + > > > > +EXPORT_MAP := rte_pktdev_version.map > > > > + > > > > +LIBABIVER := 1 > > > > + > > > > +SRCS-y += rte_pktdev.c > > > > + > > > > +# > > > > +# Export include files > > > > +# > > > > +SYMLINK-y-include += rte_pktdev.h > > > > + > > > > +# this lib depends upon no others: > > > > +DEPDIRS-y += > > > > + > > > > +include $(RTE_SDK)/mk/rte.lib.mk > > > > diff --git a/lib/librte_pktdev/rte_pktdev.c b/lib/librte_pktdev/rte_pktdev.c > > > > new file mode 100644 > > > > index 0000000..4c32d86 > > > > --- /dev/null > > > > +++ b/lib/librte_pktdev/rte_pktdev.c > > > > @@ -0,0 +1,36 @@ > > > > +/*- > > > > + * BSD LICENSE > > > > + * > > > > + * Copyright(c) 2015 Intel Corporation. All rights reserved. > > > > + * All rights reserved. > > > > + * > > > > + * Redistribution and use in source and binary forms, with or without > > > > + * modification, are permitted provided that the following conditions > > > > + * are met: > > > > + * > > > > + * * Redistributions of source code must retain the above copyright > > > > + * notice, this list of conditions and the following disclaimer. > > > > + * * Redistributions in binary form must reproduce the above copyright > > > > + * notice, this list of conditions and the following disclaimer in > > > > + * the documentation and/or other materials provided with the > > > > + * distribution. > > > > + * * Neither the name of Intel Corporation nor the names of its > > > > + * contributors may be used to endorse or promote products derived > > > > + * from this software without specific prior written permission. > > > > + * > > > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > > > > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > > > > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > > > > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > > > > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > > > > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > > > > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > > > > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > > > > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > > > > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > > > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > > > + */ > > > > + > > > > +#include "rte_pktdev.h" > > > > + > > > > +/* For future use */ > > > > diff --git a/lib/librte_pktdev/rte_pktdev.h b/lib/librte_pktdev/rte_pktdev.h > > > > new file mode 100644 > > > > index 0000000..8a5699a > > > > --- /dev/null > > > > +++ b/lib/librte_pktdev/rte_pktdev.h > > > > @@ -0,0 +1,144 @@ > > > > +/*- > > > > + * BSD LICENSE > > > > + * > > > > + * Copyright(c) 2015 Intel Corporation. All rights reserved. > > > > + * All rights reserved. > > > > + * > > > > + * Redistribution and use in source and binary forms, with or without > > > > + * modification, are permitted provided that the following conditions > > > > + * are met: > > > > + * > > > > + * * Redistributions of source code must retain the above copyright > > > > + * notice, this list of conditions and the following disclaimer. > > > > + * * Redistributions in binary form must reproduce the above copyright > > > > + * notice, this list of conditions and the following disclaimer in > > > > + * the documentation and/or other materials provided with the > > > > + * distribution. > > > > + * * Neither the name of Intel Corporation nor the names of its > > > > + * contributors may be used to endorse or promote products derived > > > > + * from this software without specific prior written permission. > > > > + * > > > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > > > > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > > > > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > > > > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > > > > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > > > > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > > > > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > > > > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > > > > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > > > > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > > > > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > > > > + */ > > > > + > > > > +#ifndef _RTE_PKTDEV_H_ > > > > +#define _RTE_PKTDEV_H_ > > > > + > > > > +#include > > > > + > > > > +/** > > > > + * @file > > > > + * > > > > + * RTE Packet Processing Device API > > > > + */ > > > > + > > > > +#ifdef __cplusplus > > > > +extern "C" { > > > > +#endif > > > > + > > > > +/* forward definition of mbuf structure. We don't need full mbuf header here */ > > > > +struct rte_mbuf; > > > > + > > > > +#define RTE_PKT_NAME_MAX_LEN (32) > > > > + > > > > +typedef uint16_t (*pkt_rx_burst_t)(void *rxq, > > > > + struct rte_mbuf **rx_pkts, > > > > + uint16_t nb_pkts); > > > > +/**< @internal Retrieve packets from a queue of a device. */ > > > > + > > > > +typedef uint16_t (*pkt_tx_burst_t)(void *txq, > > > > + struct rte_mbuf **tx_pkts, > > > > + uint16_t nb_pkts); > > > > +/**< @internal Send packets on a queue of a device. */ > > > > + > > > > +#define RTE_PKT_DEV_HDR(structname) struct { \ > > > > + pkt_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ \ > > > > + pkt_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */ \ > > > > + struct structname ## _data *data; /**< Pointer to device data */ \ > > > > +} > > > > + > > > > +#define RTE_PKT_DEV_DATA_HDR struct { \ > > > > + char name[RTE_PKT_NAME_MAX_LEN]; /**< Unique identifier name */ \ > > > > +\ > > > > + void **rx_queues; /**< Array of pointers to RX queues. */ \ > > > > + void **tx_queues; /**< Array of pointers to TX queues. */ \ > > > > + uint16_t nb_rx_queues; /**< Number of RX queues. */ \ > > > > + uint16_t nb_tx_queues; /**< Number of TX queues. */ \ > > > > +} > > > > + > > > > +struct rte_pkt_dev { > > > > + RTE_PKT_DEV_HDR(rte_pkt_dev); > > > > +}; > > > > + > > > > +struct rte_pkt_dev_data { > > > > + RTE_PKT_DEV_DATA_HDR; > > > > +}; > > > > + > > > > +/** > > > > + * > > > > + * Retrieve a burst of input packets from a receive queue of a > > > > + * device. The retrieved packets are stored in *rte_mbuf* structures whose > > > > + * pointers are supplied in the *rx_pkts* array. > > > > + * > > > > + * @param dev > > > > + * The device to be polled for packets > > > > + * @param queue_id > > > > + * The index of the receive queue from which to retrieve input packets. > > > > + * @param rx_pkts > > > > + * The address of an array of pointers to *rte_mbuf* structures that > > > > + * must be large enough to store *nb_pkts* pointers in it. > > > > + * @param nb_pkts > > > > + * The maximum number of packets to retrieve. > > > > + * @return > > > > + * The number of packets actually retrieved, which is the number > > > > + * of pointers to *rte_mbuf* structures effectively supplied to the > > > > + * *rx_pkts* array. > > > > + */ > > > > +static inline uint16_t > > > > +rte_pkt_rx_burst(struct rte_pkt_dev *dev, uint16_t queue_id, > > > > + struct rte_mbuf **rx_pkts, uint16_t nb_pkts) > > > > +{ > > > > + return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id], > > > > + rx_pkts, nb_pkts); > > > > +} > > > > + > > > > +/** > > > > + * Send a burst of output packets on a transmit queue of a device. > > > > + * > > > > + * @param dev > > > > + * The device to be given the packets. > > > > + * @param queue_id > > > > + * The index of the queue through which output packets must be sent. > > > > + * @param tx_pkts > > > > + * The address of an array of *nb_pkts* pointers to *rte_mbuf* structures > > > > + * which contain the output packets. > > > > + * @param nb_pkts > > > > + * The maximum number of packets to transmit. > > > > + * @return > > > > + * The number of output packets actually stored in transmit descriptors of > > > > + * the transmit ring. The return value can be less than the value of the > > > > + * *tx_pkts* parameter when the transmit ring is full or has been filled up. > > > > + */ > > > > +static inline uint16_t > > > > +rte_pkt_tx_burst(struct rte_pkt_dev *dev, uint16_t queue_id, > > > > + struct rte_mbuf **tx_pkts, uint16_t nb_pkts) > > > > +{ > > > > + return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts); > > > > +} > > > > > > That one looks much more lightweight, then Keith one :) > > > I have a question here: > > > Why are you guys so confident, that all foreseeable devices would fit into current eth_dev rx_burst/tx_burst API? > > > As I understand, QAT devices have HW request/response ring pairs, so the idea probably is to make tx_burst() > > > populate an request ring and rx_burst() to read from response ring, right? > > > Though right now rte_mbuf contains a lot of flags and data fields that are specific for ethdev, and probably have no sense for crypto dev. > > > From other side, for QAT devices, I suppose you'll need crypto specific flags and data: > > > encrypt/decrypt, source and destination buffer addresses, some cipher specific data, etc. > > > Wonder do you plan to fit all that into current rte_mbuf structure, or do you plan to have some superset structure, > > > that would consist of rte_mbuf plus some extra stuff, or ... ? > > > Konstantin > > > > > > > > > > From my point of view, the purpose of a pktdev API to to define a common API > > for devices that read/write (i.e. RX/TX) mbufs. If other devices don't fit that > > model, then they don't use it. However, we already have a number of different > > device types that already read/write mbufs, and there are likely more in future, > > so it would be nice to have a common API that can be used across those types so > > the data path can be source-neutral when processing mbufs. So far, we have > > three proposals on how that might be achieved. :-) > > Ok, probably I misunderstood previous RFCs about pktdev. > So pktdev is not going to be a generic abstraction for any foreseeable device types: eth, crypto, compress, dpi, etc? > It would represent only subset of dev types: real eth devices, plus SW emulated ones > that can mimic rx_burst/tx_burst, but don't support any other eth specific ops (kni, pmd_ring)? > Basically what we support right now? > And for future device types to support (crypto, etc) there would be something else? > Not necessarily. I would expect future device types to also use this model - but I wouldn't want to be prescriptive, if it really doesn't make sense. I was really hoping to get multiple-benefit from a pktdev abstraction: * commonality of interface across existing pkt-handling types * framework for any new types to actually use. > > > > As for the crypto, there is still a lot of investigation work to be done here, > > and Keith is probably better placed than me to comment on it. But if you look at any > > application using DPDK for packet processing, the mbuf is still going to be at > > the heart of it, so whatever the actual crypto API is like, it's going to have > > to have to fit into an "mbuf world" anyway. > > If the crypto-dev APIs don't work > > with mbufs, then it's pushing work onto the app (or higher level libs) to extract > > the meaningful fields from the mbuf to pass to the dev. > > As I understand application (or some upper layer lib) would have to do that work anyway: > RX packet, find and read related to this packet crypto data, and pass all that info to the crypto dev, right? > I can hardly imagine, that in reality you always would be able to do: > > n = rx_burst(eth_dev, pkts, n); > n = tx_burst(crypto_dev, pkts, n); > > without any processing of incoming packets in between. > The difference is only how that information will be passed into crypto device: > Would you need to to update an mbuf, or fill some other structure. Possibly not. However, I would hope we may be able to get to something like: n = rx_burst(eth_dev, pkts, n); for (i = 0; i < n; i++) bufs[i]->crypto_context = lookup_sa(bufs[i]); n = tx_burst(crypto_dev, pkts, n); > > > Therefore, in a DPDK > > library, I would think it better to have the dev work with mbufs directly, as > > the rest of DPDK does. > > I wouldn't say that all DPDK libs accepts data as mbufs only. > Only those where it makes sense: PMDs, ip_frag/reassembly, pipeline fw. > Let say rte_hash and rte_acl accept input data in a neutral way. > You wouldn't expect rte_hash_lookup() or rte_acl_classify() to accept pointer(s) to the mbuf. > if you'll have a SW implemented encrypt()/decrypt() routines, you probably wouldn't require an mbuf as input parameter either. > Wouldn't it be more plausible to create a new struct rte_crypto_buf (or something) to pass information to/from crypto devices. > Then we can provide for it an ability to attach/detach to rte_mbuf(s) buf_addr(s). > That way we can probably make crypto_buf to attach to 2 mbufs at once (source and destination). Maybe what you suggest is the way to go. However, on the hash side, I actually think it would be very useful to have our hash library have APIs to work directly off mbufs. For a single lookup pulling out a key is ok, but when working off a full burst of traffic, doing optimized code to extract keys from packets to pass as an array into a hash is awkward for the app, and also prevents full code pipelining in the lookup. [Ideally, you want your key extraction to be running in parallel with lookups of previously extracted keys. Something a bit similar to what is done for lookups in lib/librte_table/rte_table_hash_key16.c] /Bruce