From: "Wiles, Keith" <keith.wiles@intel.com>
To: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC PATCH 0/4] pktdev
Date: Mon, 20 Apr 2015 13:30:53 +0000 [thread overview]
Message-ID: <D15A68B1.1D677%keith.wiles@intel.com> (raw)
In-Reply-To: <D15A663F.1D665%keith.wiles@intel.com>
On 4/20/15, 8:19 AM, "Wiles, Keith" <keith.wiles@intel.com> wrote:
>
>
>From: Marc Sune <marc.sune@bisdn.de<mailto:marc.sune@bisdn.de>>
>Date: Monday, April 20, 2015 at 1:51 AM
>To: Keith Wiles <keith.wiles@intel.com<mailto:keith.wiles@intel.com>>,
>"dev@dpdk.org<mailto:dev@dpdk.org>" <dev@dpdk.org<mailto:dev@dpdk.org>>
>Subject: Re: [dpdk-dev] [RFC PATCH 0/4] pktdev
>
>
>
>On 17/04/15 21:50, Wiles, Keith wrote:
>
>Hi Marc and Bruce,
>
>Hi Keith, Bruce,
>
>
>On 4/17/15, 1:49 PM, "Marc Sune"
><marc.sune@bisdn.de><mailto:marc.sune@bisdn.de> wrote:
>
>
>
>On 17/04/15 17:16, Bruce Richardson wrote:
>
>
>Hi all,
>
>to continue this discussion a bit more, here is my, slightly different,
>slant
>on what a pktdev abstraction may look like.
>
>The primary objective I had in mind when drafting this is to provide the
>minimal abstraction that can be *easily* used as a common device
>abstraction for
>existing (and future) device types to be passed to dataplane code. The
>patchset
>demonstrates this by defining a minimal interface for pktdev - since I
>firmly
>believe the interface should be as small as possible - and then showing
>how that
>common interface can be used to unify rings and ethdevs under a common
>API for the
>datapath. I believe any attempt to unify things much beyond this to the
>control
>plane or setup phase is not worth doing - at least not initially - as at
>init time the code always needs to be aware of the underlying resource
>type in
>order to configure it properly for dataplane use.
>
>The overall objective I look to achieve is illustrated by the final
>patch in
>the series, which is a sample app where the same code is used for all
>cores,
>irrespective of the underlying device type.
>
>To get to that point, patch 1 defines the minimal API - just RX and TX.
>The .c
>file in the library is empty for simplicity, though I would see some
>functionality moving there when/if it makes sense e.g. the callback
>support
>from ethdev, as is done in Keith's patchset.
>Patch 2 then makes very minimal changes to ethdev to allow ethdevs to
>be used
>as pktdevs, and to make use of the pktdev functions when appropriate
>Patch 3 was, for me, the key test for this implementation - how hard
>was it to
>make an rte_ring usable as a pktdev too. Two single-line functions for
>RX/TX
>and a separate "converter" function proved to be all that was necessary
>here -
>and I believe simpler solutions may be possible too, as the extra
>structures
>allocated on conversion could be merged into the rte_ring structure
>itself and
>initialized on ring creation if we prefer that option. It is
>hoped/presumed that
>wrapping other structures, such as KNI, may prove to be just as easily
>done.
>[Not attempted yet - left as an exercise for the reader :-)].
>
>Now, in terms of pktdev vs ethdev, there is nothing in this proposal
>that
>cannot also be done using ethdev AFAIK. However, pktdev as outlined here
>should make the process far easier than trying to create a full PMD for
>something.
>All NIC specific functions, including things like stop/start, are
>stripped out,
>as they don't make sense for an rte_ring or other software objects.
>Also, the other thing this provides is that we can move away from just
>using
>port ids. Instead in the same way as we now reference
>rings/mempools/KNIs etc
>via pointer, we can do the same with ethernet ports as pktdevs on the
>data path.
>There was discussion previously on moving beyond 8-bit port ids. If we
>look to
>use ethdev as a common abstraction, I feel that change will soon have
>to be made
>causing a large amount of code churn.
>
>
>Hi Richard,
>
>First thank you both for taking the time to look at this. I did not not
>reply to Keith because you Richard summarized most of my concerns.
>
>I had a brief look to this second proposal. It is more aligned to what I
>had in mind. But still I feel it is slightly too complicated. I don't
>like much the necessary (in your approach) MACRO-like pkt_dev_data
>struct. It is also slightly inconvenient that the user has to do:
>
>+ struct rte_pkt_dev *in = rte_eth_get_dev(0);
>
>+ struct rte_pkt_dev *out = rte_ring_get_dev(
>+ rte_ring_create(name, 4096,
>rte_socket_id(), 0));
>
>
>
>What about something like (~pseudo-code):
>
>rte_pkt_dev_data.h:
>
> enum rte_pkt_dev_type{
> RTE_PKT_DEV_ETH,
> RTE_PKT_DEV_RING,
> RTE_PKT_DEV_KNI,
> //Keep adding as more PMDs are supported
> };
>
>
> //This struct may be redundant if there is nothing more
> struct rte_pkt_dev_data{
> enum rte_pkt_dev_type;
> //Placeholder, maybe we need more...
> };
>
> //Make RX/TX pktdev APIs more readable, but not really needed
> typedef void pkt_dev_t;
>
>(In all PMDs and e.g. KNI and RINGs):
>
> struct rte_eth_dev {
> struct rte_pkt_dev_data pkt_dev;//
><++++++++++++++++++++++++++++++++++
> eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function.
>*/
> struct rte_eth_dev_data *data; /**< Pointer to device data */
> /...
>
>rte_pkt_dev.h:
>
> #include <rte_ethdev.h>
> //Include PMD (and non-PMD) TX/RX headers...
>
> static inline uint16_t
> rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id,
> struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> {
> switch (((struct rte_pkt_dev_data*)dev)->type){
> case RTE_PKT_DEV_ETH:
> struct rte_eth_dev* eth_dev = (struct
>rte_eth_dev*)pkt_dev;
> rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts);
> break;
> case RTE_PKT_DEV_RING:
> //...
> }
> }
>
> //...
>
>
>I do not see the functional different from my proposal other then a bit
>more search/replace, which does effect some of the PMD¹s. The changes to
>the PMD is just to use the macros as we now have a nested structure.
>
>I think what I am proposing is different. See below.
>
Start of Keith's comment.
>
>The nesting and movement some structures and code to a common location is
>to provide a better way to add other devices. Moving code into a common
>set of structures and APIs should only improve the code as we get more
>devices added to DPDK. The basic flow and design is the same.
End of Keith’s comment.
>
>In your proposal you are movig TX/RX queues as well as some shared stuff
>and, most importantly, you are using function pointers to execute the
>RX/TX routines.
>
>What I was proposing is to try to add the minimum common shared state in
>order to properly demultiplex the RX/TX call and have a common set of
>abstract calls (the pkt_dev type). In a way, I was proposing to
>deliberately not have a shared struct rte_dev_data because I think the
>internals of the "pkt_dev" can be very different across devices (e.g.
>queues in kni vs eth port vs. crypto?). I treat the pkt_dev as a "black
>box" that conforms to TX/RX API, leaving the developer of that device to
>define its internal structures as it better suites the needs. I only use
>each of the specific device type TX/RX APIs (external to us, pkt_dev
>library) in rte_pkt_dev.h. This also simplifies the refactor required to
>eventually integrate the rte_pkt_dev library and builds it "on top" of
>the existing APIs.
>
>The other important difference with both, Bruce and your approach, and
>mine is the use of function pointers for RX/TX. I don't use them, which
>makes the entire abstracted TX/RX (including the final RX/TX routines
>itself) functions be "inlinable".
Making something inlinable is not based on using a pointer or not, is that
what you mean?
>
>Btw, I forgot to add something basic in the previous pseudo-code. The
>different types have to be conditionally compiled according to
>compiled-in DPDK libs:
>
>rte_pkt_dev.h:
>
> #include <rte_config.h>
>
> //Eth devices
> #ifdef RTE_LIBRTE_ETHER
> #include <rte_ethdev.h>
> #endif
>
> //KNI
> #ifdef RTE_LIBRTE_KNI
> #include <rte_kni.h>
> #endif
>
> //...
> //Include PMD (and non-PMD) TX/RX headers...
>
> static inline uint16_t
> rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id,
> struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> {
> switch (((struct rte_pkt_dev_data*)dev)->type){
> #ifdef RTE_LIBRTE_ETHER
> case RTE_PKT_DEV_ETH:
> struct rte_eth_dev* eth_dev = (struct
>rte_eth_dev*)pkt_dev;
> rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts);
> break;
> #endif
>
> #ifdef RTE_LIBRTE_KNI
> case RTE_PKT_DEV_KNI:
> //...
> break;
> #endif
>
> default:
> //Corrupted type or unsupported (without compiled
>support)
> //Ignore or fail(fatal error)?
> break;
> }
> }
>
> //...
>
>This adds a switch into the performance path, not a great solution IMO.
>
>
>From rte_common_device.h (format screwed with cut/paste and not all of the
>code)
>
>
>struct rte_dev_data {
> char name[RTE_DEV_NAME_MAX_LEN]; /**< Unique identifier name */
>
> void **rx_queues; /**< Array of pointers to RX
>queues. */
> void **tx_queues; /**< Array of pointers to TX
>queues. */
>
> uint16_t nb_rx_queues; /**< Number of RX queues. */
> uint16_t nb_tx_queues; /**< Number of TX queues. */
>
> uint16_t mtu; /**< Maximum Transmission Unit. */
> uint8_t unit_id; /**< Unit/Port ID for this
>instance */
> uint8_t _pad0; /* alignment filler */
>
> void *dev_private; /**< PMD-specific private data */
> uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation
>failures. */
> uint32_t min_rx_buf_size; /**< Common rx buffer size
>handled by all
>queues */
>};
>
>struct rte_dev_info {
> struct rte_pci_device *pci_dev; /**< Device PCI information.
>*/
> const char *driver_name; /**< Device Driver name.
>*/
> unsigned int if_index; /**< Index to bound host
>interface, or 0
>if none. */
> /* Use if_indextoname() to translate into an interface name. */
> unsigned int _pad0;
>};
>
>
>
>struct rte_dev {
> dev_rx_burst_t rx_pkt_burst; /**< Pointer to PMD
>receive
>function. */
> dev_tx_burst_t tx_pkt_burst; /**< Pointer to PMD
>transmit
>function. */
> void *data; /**<
>Pointer to device data */
> struct rte_dev_global *gbl_data; /**< Pointer to device
>global data
>*/
> const struct rte_dev_drv *driver; /**< Driver for this
>device */
> void *dev_ops; /**<
>Functions exported by PMD */
> struct rte_pci_device *pci_dev; /**< PCI info. supplied by
>probing */
> struct rte_dev_info *dev_info; /**<
>Pointer to the device info
>structure */
>
> /** User application callback for interrupts if present */
> struct rte_dev_cb_list link_intr_cbs;
> /**
> * User-supplied functions called from rx_burst to post-process
> * received packets before passing them to the user
> */
> struct rte_dev_rxtx_callback **post_rx_burst_cbs;
> /**
> * User-supplied functions called from tx_burst to pre-process
> * received packets before passing them to the driver for
>transmission.
> */
> struct rte_dev_rxtx_callback **pre_tx_burst_cbs;
>
>What is the indended use of these callbacks and what is the benefit? If
>the user is supplying the callback, why not putting it directly after the
>RX and before the TX call in user's code (anyway)?
Keith’s Comment.
>
>Just moving the callback routines next to the Rx/Tx routine pointers does
>not make a lot of different in this structure, right?
Keith’s comment end.
>
>
>
> enum rte_dev_type dev_type; /**< Flag
>indicating the device type */
> uint8_t attached; /**< Flag indicating the
>port is
>attached */
>};
>
>From rte_ethdev.h
>
>struct rte_eth_dev_info {
> struct rte_dev_info di; /**< Common device information */
>
> /* Rest of structure is for private device data */
> uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
> uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
>pkt. */
> uint16_t max_rx_queues; /**< Maximum number of RX queues. */
> uint16_t max_tx_queues; /**< Maximum number of TX queues. */
> uint32_t max_mac_addrs; /**< Maximum number of MAC addresses. */
> uint32_t max_hash_mac_addrs;
> /** Maximum number of hash MAC addresses for MTA and UTA. */
> uint16_t max_vfs; /**< Maximum number of VFs. */
> uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
> uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
> uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> uint16_t reta_size;
> /**< Device redirection table size, the total number of entries.
>*/
> /** Bit mask of RSS offloads, the bit offset also means flow type
>*/
> uint64_t flow_type_rss_offloads;
> struct rte_eth_rxconf default_rxconf; /**< Default RX
>configuration */
> struct rte_eth_txconf default_txconf; /**< Default TX
>configuration */
> uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */
> uint16_t vmdq_queue_num; /**< Queue number for VMDQ pools. */
> uint16_t vmdq_pool_base; /**< First ID of VMDQ pools. */
>};
>
>struct rte_eth_dev_data {
> struct rte_dev_data dd; /**<
>Common device data */
>
> /* Rest of the structure is private data */
> struct rte_eth_dev_sriov sriov; /**< SRIOV data */
>
> struct rte_eth_link dev_link;
> /**< Link-level information & status */
>
> struct rte_eth_conf dev_conf; /**< Configuration applied to
>device. */
> struct ether_addr* mac_addrs; /**< Device Ethernet Link
>address. */
> uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
> /** bitmap array of associating Ethernet MAC addresses to pools */
> struct ether_addr* hash_mac_addrs;
>
> /** Device Ethernet MAC addresses of hash filtering. */
> uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) /
>OFF(0). */
> scattered_rx : 1, /**< RX of scattered packets is
>ON(1) / OFF(0) */
> all_multicast: 1, /**< RX all multicast mode ON(1) /
>OFF(0). */
> dev_started : 1; /**< Device state: STARTED(1) /
>STOPPED(0). */
>};
>
>In the case of rte_ethdev.h the rte_eth_dev turns into a macro '#define
>rte_eth_dev rte_dev', in other devices they may have a rte_xyz_dev
>structure rte_dev as the first member and then private members. Also later
>if we find something private to ethdev then a structure can be created.
>
>
>Most of the code changes outside of these changes are because using a
>macro to fix the code was easier. Rewriting the functions to use real
>variables instead of casting void pointer could yield without the macros.
>Not sure as the performance impact in rewriting some of these functions to
>eliminate the macros.
>
>This is the heart of what I am proposing and it just so happens to allow
>Bruce¹ ideas as well. I have another patch I have not sent that includes
>Bruce¹s ideas using my suggestions.
>
>Being able to use a single API for all devices does have its benefits IMO,
>but I want to cleanup ethdev to allow for other devices to be added to the
>system.
>
>At some point (when the cryptodev is done) I want to add it to this
>framework, plus add more APIs for crypto similar to Linux Crypto API or
>something we can agree is a reasonable set of APIs for DPDK.
>
>I currently don't see the benefit of abstracting beyond the RX/TX
>functions, and I see more cons than pros. Doing so we will also make
>librte_pkt_dev sort of mandatory, even if you are not using it our
>abstracted RX/TX functions at all.
Keith Comment.
>
>I do not see why we would not pull these common items out. Can you list
>your cons as I do not see your point here.
Keith Comment
>
>
>regards
>marc
>
>
>
>Regards,
>
>++Keith
>
>
>Maybe it is oversimplified? With this approach tough, we would barely
>touch the PMDs and the functionality can be built on top of the existing
>PMDs.
>
>Thoughts?
>Marc
>
>
>
>
>Bruce Richardson (4):
> Add example pktdev implementation
> Make ethdev explicitly a subclass of pktdev
> add support for a ring to be a pktdev
> example app showing pktdevs used in a chain
>
> config/common_bsdapp | 5 +
> config/common_linuxapp | 5 +
> examples/pktdev/Makefile | 57 +++++++++++
> examples/pktdev/basicfwd.c | 222
>+++++++++++++++++++++++++++++++++++++++++
> lib/Makefile | 1 +
> lib/librte_ether/rte_ethdev.h | 26 ++---
> lib/librte_pktdev/Makefile | 56 +++++++++++
> lib/librte_pktdev/rte_pktdev.c | 35 +++++++
> lib/librte_pktdev/rte_pktdev.h | 144 ++++++++++++++++++++++++++
> lib/librte_ring/rte_ring.c | 41 ++++++++
> lib/librte_ring/rte_ring.h | 3 +
> 11 files changed, 582 insertions(+), 13 deletions(-)
> create mode 100644 examples/pktdev/Makefile
> create mode 100644 examples/pktdev/basicfwd.c
> create mode 100644 lib/librte_pktdev/Makefile
> create mode 100644 lib/librte_pktdev/rte_pktdev.c
> create mode 100644 lib/librte_pktdev/rte_pktdev.h
>
>
>
next prev parent reply other threads:[~2015-04-20 13:30 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-13 19:44 [dpdk-dev] [RFC PATCH 0/4 v2] Extending DPDK with multiple device support Keith Wiles
2015-04-13 19:44 ` [dpdk-dev] [RFC PATCH 1/4 v2] Adding the common device files for " Keith Wiles
2015-05-04 13:13 ` Marc Sune
2015-05-04 14:44 ` Wiles, Keith
2015-04-13 19:44 ` [dpdk-dev] [RFC PATCH 2/4 v2] Add the ethdev changes " Keith Wiles
2015-04-13 19:44 ` [dpdk-dev] [RFC PATCH 3/4 v2] Add the test file changes for common " Keith Wiles
2015-04-13 19:44 ` [dpdk-dev] [RFC PATCH 4/4 v2] Update PMD files for new " Keith Wiles
2015-04-17 15:16 ` [dpdk-dev] [RFC PATCH 0/4] pktdev Bruce Richardson
2015-04-17 15:16 ` [dpdk-dev] [RFC PATCH 1/4] Add example pktdev implementation Bruce Richardson
2015-04-20 11:26 ` Ananyev, Konstantin
2015-04-20 15:02 ` Bruce Richardson
2015-04-21 8:40 ` Ananyev, Konstantin
2015-04-21 9:23 ` Bruce Richardson
2015-04-17 15:16 ` [dpdk-dev] [RFC PATCH 2/4] Make ethdev explicitly a subclass of pktdev Bruce Richardson
2015-04-17 15:16 ` [dpdk-dev] [RFC PATCH 3/4] add support for a ring to be a pktdev Bruce Richardson
2015-04-17 17:31 ` Neil Horman
2015-04-18 0:00 ` Ouyang, Changchun
2015-04-20 10:32 ` Ananyev, Konstantin
2015-04-17 15:16 ` [dpdk-dev] [RFC PATCH 4/4] example app showing pktdevs used in a chain Bruce Richardson
2015-04-17 17:28 ` [dpdk-dev] [RFC PATCH 0/4] pktdev Neil Horman
2015-04-17 18:49 ` Marc Sune
2015-04-17 19:50 ` Wiles, Keith
2015-04-20 6:51 ` Marc Sune
2015-04-20 10:43 ` Bruce Richardson
2015-04-20 17:03 ` Marc Sune
2015-04-20 13:19 ` Wiles, Keith
2015-04-20 13:30 ` Wiles, Keith [this message]
2015-05-04 13:13 ` [dpdk-dev] [RFC PATCH 0/4 v2] Extending DPDK with multiple device support Marc Sune
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D15A68B1.1D677%keith.wiles@intel.com \
--to=keith.wiles@intel.com \
--cc=dev@dpdk.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).