From: "Wiles, Keith"
To: Marc
CC: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v12] net/tap: new TUN/TAP device PMD
Date: Mon, 12 Dec 2016 21:09:53 +0000
Message-ID: <0DA78301-3A97-49C3-9041-F96EFD43D5CC@intel.com>
References: <1476396234-44694-1-git-send-email-keith.wiles@intel.com> <20161212143838.37269-1-keith.wiles@intel.com>

> On Dec 12, 2016, at 1:13 PM, Marc wrote:
>
> Keith,
>
> A bit late, but two very high level questions. Do you have performance
> numbers compared to KNI? Did you consider using AF_PACKET PACKET_MMAP which
> could potentially reduce the number of syscalls to 1 for RX and TX of a burst?
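Marc's point about syscalls can be made concrete. In the patch quoted below, `pmd_rx_burst()` issues one `read()` per received frame, plus one final `read()` that fails with `EAGAIN` when fewer than `nb_pkts` frames are queued. The sketch below counts those calls. It is not part of the patch: `count_burst_reads()` is a hypothetical helper, and an `AF_UNIX` datagram socketpair stands in for the TAP file descriptor (an assumption made so the example runs without root or `/dev/net/tun`; it shares the one-frame-per-`read()` behavior that matters here).

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the read() calls needed to drain a burst,
 * mimicking the one-read()-per-frame loop of pmd_rx_burst().
 * Assumption: an AF_UNIX SOCK_DGRAM socketpair stands in for the TAP fd,
 * because, like a TAP fd, it returns exactly one frame per read().
 *
 * nb_frames: frames queued before the burst (keep small, e.g. <= 8,
 *            since the kernel limits queued AF_UNIX datagrams).
 * burst:     maximum frames to receive, like nb_pkts.
 * nb_rx:     out parameter, frames actually received.
 * Returns the number of read() calls made, or -1 on setup failure.
 */
static int
count_burst_reads(int nb_frames, int burst, int *nb_rx)
{
	int sv[2];
	char frame[64];
	int calls = 0;
	int i;

	*nb_rx = 0;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* Non-blocking on both ends, as tun_alloc() does for the TAP fd. */
	if (fcntl(sv[0], F_SETFL, O_NONBLOCK) < 0 ||
	    fcntl(sv[1], F_SETFL, O_NONBLOCK) < 0)
		goto fail;

	/* Queue the frames that the "host" side sends. */
	memset(frame, 0xab, sizeof(frame));
	for (i = 0; i < nb_frames; i++)
		if (write(sv[0], frame, sizeof(frame)) != (ssize_t)sizeof(frame))
			goto fail;

	/* Drain like pmd_rx_burst(): one read() per frame, plus a final
	 * read() that fails (EAGAIN) if fewer than burst frames were queued.
	 */
	while (*nb_rx < burst) {
		ssize_t len = read(sv[1], frame, sizeof(frame));

		calls++;
		if (len <= 0)
			break;
		(*nb_rx)++;
	}

	close(sv[0]);
	close(sv[1]);
	return calls;

fail:
	close(sv[0]);
	close(sv[1]);
	return -1;
}
```

Called with 4 queued frames and a burst of 32, it makes 5 `read()` calls and receives 4 frames; with a burst of 2 it stops after 2 calls. Marc's suggestion is that a PACKET_MMAP ring could bring this down to roughly one syscall per RX or TX burst.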
Hi Marc,

I was not trying to create a high performance interface, just a Tap interface
that lets standard applications and calls send/receive traffic to the DPDK
application. I did not expect anything other than some management-like
interface in the application to use the Tap PMD.

>
> Marc
>
> On 12 December 2016 at 15:38, Keith Wiles wrote:
> The PMD allows for DPDK and the host to communicate using a raw
> device interface on the host and in the DPDK application. The device
> created is a Tap device with an L2 packet header.
>
> v12- Fixup minor changes for driver_name and version number
> v11- Add the tap.rst to the nic/index.rst file
> v10- Change the string name used to allow for multiple devices.
> v9 - Fix up the docs to use correct syntax
> v8 - Fix issue with tap_tx_queue_setup() not returning zero on success.
> v7 - Reword the comment in common_base and fix the data->name issue
> v6 - fixed the checkpatch issues
> v5 - merge in changes from list review see related emails
>      fixed many minor edits
> v4 - merge with latest driver changes
> v3 - fix includes by removing ifdef for other type besides Linux
>      Fix the copyright notice in the Makefile
> v2 - merge all of the patches into one patch
>      Fix a typo on naming the tap device
>      Update the maintainers list
>
> Signed-off-by: Keith Wiles
> ---
>  MAINTAINERS                             |   5 +
>  config/common_base                      |   9 +
>  config/common_linuxapp                  |   1 +
>  doc/guides/nics/index.rst               |   1 +
>  doc/guides/nics/tap.rst                 | 136 ++++++
>  drivers/net/Makefile                    |   1 +
>  drivers/net/tap/Makefile                |  57 +++
>  drivers/net/tap/rte_eth_tap.c           | 765 ++++++++++++++++++++++++++++++++
>  drivers/net/tap/rte_pmd_tap_version.map |   4 +
>  mk/rte.app.mk                           |   1 +
>  10 files changed, 980 insertions(+)
>  create mode 100644 doc/guides/nics/tap.rst
>  create mode 100644 drivers/net/tap/Makefile
>  create mode 100644 drivers/net/tap/rte_eth_tap.c
>  create mode 100644 drivers/net/tap/rte_pmd_tap_version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 26d9590..842fb6d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -398,6 +398,11 @@ F: doc/guides/nics/pcap_ring.rst
>  F: app/test/test_pmd_ring.c
>  F: app/test/test_pmd_ring_perf.c
>
> +Tap PMD
> +M: Keith Wiles
> +F: drivers/net/tap
> +F: doc/guides/nics/tap.rst
> +
>  Null Networking PMD
>  M: Tetsuya Mukawa
>  F: drivers/net/null/
> diff --git a/config/common_base b/config/common_base
> index 652a839..eb51cdb 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -590,3 +590,12 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
>  CONFIG_RTE_TEST_PMD=y
>  CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
>  CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
> +
> +#
> +# Compile the TAP PMD
> +#
> +# The TAP PMD is currently only built for Linux and the
> +# option is enabled by default in common_linuxapp file,
> +# set to 'n' in the common_base file.
> +#
> +CONFIG_RTE_LIBRTE_PMD_TAP=n
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 2483dfa..782b503 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -44,3 +44,4 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>  CONFIG_RTE_LIBRTE_POWER=y
>  CONFIG_RTE_VIRTIO_USER=y
> +CONFIG_RTE_LIBRTE_PMD_TAP=y
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 92d56a5..af92529 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -51,6 +51,7 @@ Network Interface Controller Drivers
>      nfp
>      qede
>      szedata2
> +    tap
>      thunderx
>      virtio
>      vhost
> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
> new file mode 100644
> index 0000000..622b9e7
> --- /dev/null
> +++ b/doc/guides/nics/tap.rst
> @@ -0,0 +1,136 @@
> +..  BSD LICENSE
> +    Copyright(c) 2016 Intel Corporation. All rights reserved.
> +    All rights reserved.
> +
> +    Redistribution and use in source and binary forms, with or without
> +    modification, are permitted provided that the following conditions
> +    are met:
> +
> +    * Redistributions of source code must retain the above copyright
> +    notice, this list of conditions and the following disclaimer.
> +    * Redistributions in binary form must reproduce the above copyright
> +    notice, this list of conditions and the following disclaimer in
> +    the documentation and/or other materials provided with the
> +    distribution.
> +    * Neither the name of Intel Corporation nor the names of its
> +    contributors may be used to endorse or promote products derived
> +    from this software without specific prior written permission.
> +
> +    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +Tun/Tap Poll Mode Driver
> +========================
> +
> +The ``rte_eth_tap.c`` PMD creates a device using TUN/TAP interfaces on the
> +local host. The PMD allows for DPDK and the host to communicate using a raw
> +device interface on the host and in the DPDK application.
> +
> +The device created is a TAP device, which sends/receives packets in a raw
> +format with an L2 header. The TAP PMD is intended for connectivity to the
> +local host using a TAP interface. When the TAP PMD is initialized it will
> +create a number of TAP devices on the host, visible via ``ifconfig -a`` or the
> +``ip`` command. These commands can be used to configure and query the virtual
> +device.
> +
> +These TAP interfaces can be used with Wireshark, tcpdump or Pktgen-DPDK,
> +and can also serve as a network connection to the DPDK
> +application. To enable one or more interfaces, use the
> +``--vdev=net_tap`` option on the DPDK application command line. Each
> +``--vdev=net_tap`` option given creates an interface named dtap0, dtap1,
> +and so on.
> +
> +The interface name can be changed by adding the ``iface=foo0`` option, for example::
> +
> +   --vdev=net_tap,iface=foo0 --vdev=net_tap,iface=foo1, ...
> +
> +The reported speed of the interface can also be changed from the 10G default
> +(value in Mbps), but the interface does not enforce that speed, for example::
> +
> +   --vdev=net_tap,iface=foo0,speed=25000
> +
> +After the DPDK application is started you can send and receive packets on the
> +interface using the standard rx_burst/tx_burst APIs in DPDK. From the host's
> +point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen
> +and others to communicate with the DPDK application. The DPDK application may
> +not understand network protocols like IPv4/6, UDP or TCP unless the
> +application has been written to understand these protocols.
> +
> +If you need the interface to act as a real network interface, meaning it is up and has
> +a valid IP address, use the following commands::
> +
> +   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
> +   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
> +
> +Please change the IP addresses as you see fit.
> +
> +If routing is enabled on the host you can also communicate with the DPDK application
> +over the internet via a standard socket layer application as long as you
> +account for the protocol handling in the application.
> +
> +If you have a network stack (or something similar) in your DPDK application, you
> +can use that stack to handle the network protocols. You would also be able
> +to address the interface using an IP address assigned to the internal
> +interface.
> +
> +Example
> +-------
> +
> +The following is a simple example of using the TUN/TAP PMD with the Pktgen
> +packet generator. It requires that the ``socat`` utility is installed on the
> +test system.
> +
> +Build DPDK, then pull down Pktgen and build it using the same DPDK SDK/target
> +that was used to build DPDK.
> +
> +Run pktgen from the pktgen directory in a terminal with a command line like the
> +following::
> +
> +    sudo ./app/app/x86_64-native-linuxapp-gcc/app/pktgen -l 1-5 -n 4        \
> +     --proc-type auto --log-level 8 --socket-mem 512,512 --file-prefix pg   \
> +     --vdev=net_tap --vdev=net_tap -b 05:00.0 -b 05:00.1                    \
> +     -b 04:00.0 -b 04:00.1 -b 04:00.2 -b 04:00.3                            \
> +     -b 81:00.0 -b 81:00.1 -b 81:00.2 -b 81:00.3                            \
> +     -b 82:00.0 -b 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1                   \
> +     -f themes/black-yellow.theme
> +
> +.. note::
> +
> +   Change the ``-b`` options to blacklist all of your physical ports. The
> +   command line above is a single command continued across lines with ``\``.
> +
> +   Also, ``-f themes/black-yellow.theme`` is optional if the default colors
> +   work on your system configuration. See the Pktgen docs for more
> +   information.
> +
> +Verify with the ``ifconfig -a`` command in a different xterm window that the
> +``dtap0`` and ``dtap1`` interfaces have been created.
> +
> +Next, bring the two interfaces up and assign their IP addresses::
> +
> +    sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
> +    sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
> +
> +Then use socat to create a loopback for the two interfaces::
> +
> +    sudo socat interface:dtap0 interface:dtap1
> +
> +Then on the Pktgen command line interface you can start sending packets using
> +the commands ``start 0`` and ``start 1`` or you can start both at the same
> +time with ``start all``. The command ``str`` is an alias for ``start all`` and
> +``stp`` is an alias for ``stop all``.
> +
> +While traffic is running, the 64 byte counters should increase, verifying that the
> +traffic is being looped back. You can use ``set all size XXX`` to change the
> +size of the packets after you stop the traffic. Use the pktgen ``help``
> +command to see a list of all commands. You can also use the ``-f`` option to
> +load commands at startup.
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index bc93230..e366a85 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -51,6 +51,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
>  DIRS-$(CONFIG_RTE_LIBRTE_QEDE_PMD) += qede
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += ring
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap
>  DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
>  DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
> diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
> new file mode 100644
> index 0000000..e18f30c
> --- /dev/null
> +++ b/drivers/net/tap/Makefile
> @@ -0,0 +1,57 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2016 Intel Corporation. All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_tap.a
> +
> +EXPORT_MAP := rte_pmd_tap_version.map
> +
> +LIBABIVER := 1
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += rte_eth_tap.c
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mempool
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_kvargs
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> new file mode 100644
> index 0000000..976f2d9
> --- /dev/null
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -0,0 +1,765 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +/* Linux based path to the TUN device */
> +#define TUN_TAP_DEV_PATH        "/dev/net/tun"
> +#define DEFAULT_TAP_NAME        "dtap"
> +
> +#define ETH_TAP_IFACE_ARG       "iface"
> +#define ETH_TAP_SPEED_ARG       "speed"
> +
> +#define RTE_PMD_TAP_MAX_QUEUES	16
> +
> +static struct rte_vdev_driver pmd_tap_drv;
> +
> +static const char *valid_arguments[] = {
> +	ETH_TAP_IFACE_ARG,
> +	ETH_TAP_SPEED_ARG,
> +	NULL
> +};
> +
> +static int tap_unit;
> +
> +static struct rte_eth_link pmd_link = {
> +	.link_speed = ETH_SPEED_NUM_10G,
> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
> +	.link_status = ETH_LINK_DOWN,
> +	.link_autoneg = ETH_LINK_SPEED_AUTONEG
> +};
> +
> +struct pkt_stats {
> +	uint64_t opackets;		/* Number of output packets */
> +	uint64_t ipackets;		/* Number of input packets */
> +	uint64_t obytes;		/* Number of bytes on output */
> +	uint64_t ibytes;		/* Number of bytes on input */
> +	uint64_t errs;			/* Number of error packets */
> +};
> +
> +struct rx_queue {
> +	struct rte_mempool *mp;		/* Mempool for RX packets */
> +	uint16_t in_port;		/* Port ID */
> +	int fd;
> +
> +	struct pkt_stats stats;		/* Stats for this RX queue */
> +};
> +
> +struct tx_queue {
> +	int fd;
> +	struct pkt_stats stats;		/* Stats for this TX queue */
> +};
> +
> +struct pmd_internals {
> +	char name[RTE_ETH_NAME_MAX_LEN];	/* Internal Tap device name */
> +	uint16_t nb_queues;		/* Number of queues supported */
> +	struct ether_addr eth_addr;	/* Mac address of the device port */
> +
> +	int if_index;			/* IF_INDEX for the port */
> +	int fds[RTE_PMD_TAP_MAX_QUEUES]; /* List of all file descriptors */
> +
> +	struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];	/* List of RX queues */
> +	struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];	/* List of TX queues */
> +};
> +
> +/* Tun/Tap allocation routine
> + *
> + * name is the name of the interface to use, or NULL/empty to take the host
> + * supplied name.
> + */
> +static int
> +tun_alloc(char *name)
> +{
> +	struct ifreq ifr;
> +	unsigned int features;
> +	int fd;
> +
> +	memset(&ifr, 0, sizeof(struct ifreq));
> +
> +	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
> +	if (name && name[0])
> +		strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
> +
> +	fd = open(TUN_TAP_DEV_PATH, O_RDWR);
> +	if (fd < 0) {
> +		RTE_LOG(ERR, PMD, "Unable to create TAP interface\n");
> +		goto error;
> +	}
> +
> +	/* Grab the TUN features to verify we can work */
> +	if (ioctl(fd, TUNGETFEATURES, &features) < 0) {
> +		RTE_LOG(ERR, PMD, "Unable to get TUN/TAP features\n");
> +		goto error;
> +	}
> +	RTE_LOG(DEBUG, PMD, "TUN/TAP Features %08x\n", features);
> +
> +	if (!(features & IFF_MULTI_QUEUE) && (RTE_PMD_TAP_MAX_QUEUES > 1)) {
> +		RTE_LOG(DEBUG, PMD, "TUN/TAP device only one queue\n");
> +		goto error;
> +	} else if ((features & IFF_ONE_QUEUE) &&
> +			(RTE_PMD_TAP_MAX_QUEUES == 1)) {
> +		ifr.ifr_flags |= IFF_ONE_QUEUE;
> +		RTE_LOG(DEBUG, PMD, "Single queue only support\n");
> +	} else {
> +		ifr.ifr_flags |= IFF_MULTI_QUEUE;
> +		RTE_LOG(DEBUG, PMD, "Multi-queue support for %d queues\n",
> +			RTE_PMD_TAP_MAX_QUEUES);
> +	}
> +
> +	/* Set the TUN/TAP configuration and get the name if needed */
> +	if (ioctl(fd, TUNSETIFF, (void *)&ifr) < 0) {
> +		RTE_LOG(ERR, PMD, "Unable to set TUNSETIFF for %s\n",
> +			ifr.ifr_name);
> +		perror("TUNSETIFF");
> +		goto error;
> +	}
> +
> +	/* Always set the file descriptor to non-blocking */
> +	if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {
> +		RTE_LOG(ERR, PMD, "Unable to set to nonblocking\n");
> +		perror("F_SETFL, NONBLOCK");
> +		goto error;
> +	}
> +
> +	/* If the kernel assigned a different name, copy it back to the caller */
> +	if (name && strcmp(name, ifr.ifr_name))
> +		snprintf(name, RTE_ETH_NAME_MAX_LEN - 1, "%s", ifr.ifr_name);
> +
> +	return fd;
> +
> +error:
> +	if (fd >= 0)
> +		close(fd);
> +	return -1;
> +}
> +
> +/* Callback to handle the rx burst of packets to the correct interface and
> + * file descriptor(s) in a multi-queue setup.
> + */
> +static uint16_t
> +pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> +	int len;
> +	struct rte_mbuf *mbuf;
> +	struct rx_queue *rxq = queue;
> +	uint16_t num_rx;
> +	unsigned long num_rx_bytes = 0;
> +
> +	for (num_rx = 0; num_rx < nb_pkts; ) {
> +		/* allocate the next mbuf */
> +		mbuf = rte_pktmbuf_alloc(rxq->mp);
> +		if (unlikely(!mbuf)) {
> +			RTE_LOG(WARNING, PMD, "Unable to allocate mbuf\n");
> +			break;
> +		}
> +
> +		len = read(rxq->fd, rte_pktmbuf_mtod(mbuf, char *),
> +			   rte_pktmbuf_tailroom(mbuf));
> +		if (len <= 0) {
> +			rte_pktmbuf_free(mbuf);
> +			break;
> +		}
> +
> +		mbuf->data_len = len;
> +		mbuf->pkt_len = len;
> +		mbuf->port = rxq->in_port;
> +
> +		/* account for the receive frame */
> +		bufs[num_rx++] = mbuf;
> +		num_rx_bytes += mbuf->pkt_len;
> +	}
> +	rxq->stats.ipackets += num_rx;
> +	rxq->stats.ibytes += num_rx_bytes;
> +
> +	return num_rx;
> +}
> +
> +/* Callback to handle sending packets to the tap interface
> + */
> +static uint16_t
> +pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> +	struct rte_mbuf *mbuf;
> +	struct tx_queue *txq = queue;
> +	struct pollfd pfd;
> +	uint16_t num_tx = 0;
> +	unsigned long num_tx_bytes = 0;
> +	int i, n;
> +
> +	if (unlikely(nb_pkts == 0))
> +		return 0;
> +
> +	pfd.events = POLLOUT;
> +	pfd.fd = txq->fd;
> +	for (i = 0; i < nb_pkts; i++) {
> +		n = poll(&pfd, 1, 0);
> +
> +		if (n <= 0)
> +			break;
> +
> +		if (pfd.revents & POLLOUT) {
> +			/* copy the tx frame data */
> +			mbuf = bufs[num_tx];
> +			n = write(pfd.fd, rte_pktmbuf_mtod(mbuf, void*),
> +				  rte_pktmbuf_pkt_len(mbuf));
> +			if (n <= 0)
> +				break;
> +
> +			num_tx++;
> +			num_tx_bytes += mbuf->pkt_len;
> +			rte_pktmbuf_free(mbuf);
> +		}
> +	}
> +
> +	txq->stats.opackets += num_tx;
> +	txq->stats.errs += nb_pkts - num_tx;
> +	txq->stats.obytes += num_tx_bytes;
> +
> +	return num_tx;
> +}
> +
> +static int
> +tap_dev_start(struct rte_eth_dev *dev)
> +{
> +	/* Force the Link up */
> +	dev->data->dev_link.link_status = ETH_LINK_UP;
> +
> +	return 0;
> +}
> +
> +/* This function gets called when the current port gets stopped.
> + */
> +static void
> +tap_dev_stop(struct rte_eth_dev *dev)
> +{
> +	int i;
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	for (i = 0; i < internals->nb_queues; i++)
> +		if (internals->fds[i] != -1)
> +			close(internals->fds[i]);
> +
> +	dev->data->dev_link.link_status = ETH_LINK_DOWN;
> +}
> +
> +static int
> +tap_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static void
> +tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)ETHER_MAX_VLAN_FRAME_LEN;
> +	dev_info->max_rx_queues = internals->nb_queues;
> +	dev_info->max_tx_queues = internals->nb_queues;
> +	dev_info->min_rx_bufsize = 0;
> +	dev_info->pci_dev = NULL;
> +}
> +
> +static void
> +tap_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *tap_stats)
> +{
> +	unsigned int i, imax;
> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
> +	unsigned long rx_bytes_total = 0, tx_bytes_total = 0;
> +	const struct pmd_internals *pmd = dev->data->dev_private;
> +
> +	imax = (pmd->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
> +		pmd->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
> +
> +	for (i = 0; i < imax; i++) {
> +		tap_stats->q_ipackets[i] = pmd->rxq[i].stats.ipackets;
> +		tap_stats->q_ibytes[i] = pmd->rxq[i].stats.ibytes;
> +		rx_total += tap_stats->q_ipackets[i];
> +		rx_bytes_total += tap_stats->q_ibytes[i];
> +	}
> +
> +	for (i = 0; i < imax; i++) {
> +		tap_stats->q_opackets[i] = pmd->txq[i].stats.opackets;
> +		tap_stats->q_errors[i] = pmd->txq[i].stats.errs;
> +		tap_stats->q_obytes[i] = pmd->txq[i].stats.obytes;
> +		tx_total += tap_stats->q_opackets[i];
> +		tx_err_total += tap_stats->q_errors[i];
> +		tx_bytes_total += tap_stats->q_obytes[i];
> +	}
> +
> +	tap_stats->ipackets = rx_total;
> +	tap_stats->ibytes = rx_bytes_total;
> +	tap_stats->opackets = tx_total;
> +	tap_stats->oerrors = tx_err_total;
> +	tap_stats->obytes = tx_bytes_total;
> +}
> +
> +static void
> +tap_stats_reset(struct rte_eth_dev *dev)
> +{
> +	int i;
> +	struct pmd_internals *pmd = dev->data->dev_private;
> +
> +	for (i = 0; i < pmd->nb_queues; i++) {
> +		pmd->rxq[i].stats.ipackets = 0;
> +		pmd->rxq[i].stats.ibytes = 0;
> +	}
> +
> +	for (i = 0; i < pmd->nb_queues; i++) {
> +		pmd->txq[i].stats.opackets = 0;
> +		pmd->txq[i].stats.errs = 0;
> +		pmd->txq[i].stats.obytes = 0;
> +	}
> +}
> +
> +static void
> +tap_dev_close(struct rte_eth_dev *dev __rte_unused)
> +{
> +}
> +
> +static void
> +tap_rx_queue_release(void *queue)
> +{
> +	struct rx_queue *rxq = queue;
> +
> +	if (rxq && (rxq->fd > 0)) {
> +		close(rxq->fd);
> +		rxq->fd = -1;
> +	}
> +}
> +
> +static void
> +tap_tx_queue_release(void *queue)
> +{
> +	struct tx_queue *txq = queue;
> +
> +	if (txq && (txq->fd > 0)) {
> +		close(txq->fd);
> +		txq->fd = -1;
> +	}
> +}
> +
> +static int
> +tap_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static int
> +tap_setup_queue(struct rte_eth_dev *dev,
> +		struct pmd_internals *internals,
> +		uint16_t qid)
> +{
> +	struct rx_queue *rx = &internals->rxq[qid];
> +	struct tx_queue *tx = &internals->txq[qid];
> +	int fd;
> +
> +	fd = rx->fd;
> +	if (fd < 0) {
> +		fd = tx->fd;
> +		if (fd < 0) {
> +			RTE_LOG(INFO, PMD, "Add queue to TAP %s for qid %d\n",
> +				dev->data->name, qid);
> +			fd = tun_alloc(dev->data->name);
> +			if (fd < 0) {
> +				RTE_LOG(ERR, PMD, "tun_alloc(%s) failed\n",
> +					dev->data->name);
> +				return -1;
> +			}
> +		}
> +	}
> +	dev->data->rx_queues[qid] = rx;
> +	dev->data->tx_queues[qid] = tx;
> +
> +	rx->fd = fd;
> +	tx->fd = fd;
> +
> +	return fd;
> +}
> +
> +static int
> +tap_rx_queue_setup(struct rte_eth_dev *dev,
> +		   uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc __rte_unused,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mp)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	uint16_t buf_size;
> +	int fd;
> +
> +	if ((rx_queue_id >= internals->nb_queues) || !mp) {
> +		RTE_LOG(ERR, PMD, "nb_queues %d mp %p\n",
> +			internals->nb_queues, mp);
> +		return -1;
> +	}
> +
> +	internals->rxq[rx_queue_id].mp = mp;
> +	internals->rxq[rx_queue_id].in_port = dev->data->port_id;
> +
> +	/* Now get the space available for data in the mbuf */
> +	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(mp) -
> +				RTE_PKTMBUF_HEADROOM);
> +
> +	if (buf_size < ETH_FRAME_LEN) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
> +			dev->data->name, ETH_FRAME_LEN, buf_size);
> +		return -ENOMEM;
> +	}
> +
> +	fd = tap_setup_queue(dev, internals, rx_queue_id);
> +	if (fd == -1)
> +		return -1;
> +
> +	internals->fds[rx_queue_id] = fd;
> +	RTE_LOG(INFO, PMD, "RX TAP device name %s, qid %d on fd %d\n",
> +		dev->data->name, rx_queue_id, internals->rxq[rx_queue_id].fd);
> +
> +	return 0;
> +}
> +
> +static int
> +tap_tx_queue_setup(struct rte_eth_dev *dev,
> +		   uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	int ret;
> +
> +	if (tx_queue_id >= internals->nb_queues)
> +		return -1;
> +
> +	ret = tap_setup_queue(dev, internals, tx_queue_id);
> +	if (ret == -1)
> +		return -1;
> +
> +	RTE_LOG(INFO, PMD, "TX TAP device name %s, qid %d on fd %d\n",
> +		dev->data->name, tx_queue_id, internals->txq[tx_queue_id].fd);
> +
> +	return 0;
> +}
> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start              = tap_dev_start,
> +	.dev_stop               = tap_dev_stop,
> +	.dev_close              = tap_dev_close,
> +	.dev_configure          = tap_dev_configure,
> +	.dev_infos_get          = tap_dev_info,
> +	.rx_queue_setup         = tap_rx_queue_setup,
> +	.tx_queue_setup         = tap_tx_queue_setup,
> +	.rx_queue_release       = tap_rx_queue_release,
> +	.tx_queue_release       = tap_tx_queue_release,
> +	.link_update            = tap_link_update,
> +	.stats_get              = tap_stats_get,
> +	.stats_reset            = tap_stats_reset,
> +};
> +
> +static int
> +pmd_mac_address(int fd, struct rte_eth_dev *dev, struct ether_addr *addr)
> +{
> +	struct ifreq ifr;
> +
> +	if ((fd <= 0) || !dev || !addr)
> +		return -1;
> +
> +	memset(&ifr, 0, sizeof(ifr));
> +
> +	if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
> +		RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n",
> +			ifr.ifr_name);
> +		return -1;
> +	}
> +
> +	/* Set the host based MAC address to this special MAC format */
> +	ifr.ifr_hwaddr.sa_data[0] = 'T';
> +	ifr.ifr_hwaddr.sa_data[1] = 'a';
> +	ifr.ifr_hwaddr.sa_data[2] = 'p';
> +	ifr.ifr_hwaddr.sa_data[3] = '-';
> +	ifr.ifr_hwaddr.sa_data[4] = dev->data->port_id;
> +	ifr.ifr_hwaddr.sa_data[5] = dev->data->numa_node;
> +	if (ioctl(fd, SIOCSIFHWADDR, &ifr) == -1) {
> +		RTE_LOG(ERR, PMD, "%s: ioctl failed (SIOCSIFHWADDR) (%s)\n",
> +			dev->data->name, ifr.ifr_name);
> +		return -1;
> +	}
> +
> +	/* Set the local application MAC address, which needs to be different
> +	 * than the host based MAC address.
> +	 */
> +	ifr.ifr_hwaddr.sa_data[0] = 'd';
> +	ifr.ifr_hwaddr.sa_data[1] = 'n';
> +	ifr.ifr_hwaddr.sa_data[2] = 'e';
> +	ifr.ifr_hwaddr.sa_data[3] = 't';
> +	ifr.ifr_hwaddr.sa_data[4] = dev->data->port_id;
> +	ifr.ifr_hwaddr.sa_data[5] = dev->data->numa_node;
> +	rte_memcpy(addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
> +
> +	return 0;
> +}
> +
> +static int
> +eth_dev_tap_create(const char *name, char *tap_name)
> +{
> +	int numa_node = rte_socket_id();
> +	struct rte_eth_dev *dev = NULL;
> +	struct pmd_internals *pmd = NULL;
> +	struct rte_eth_dev_data *data = NULL;
> +	int i, fd = -1;
> +
> +	RTE_LOG(INFO, PMD,
> +		"%s: Create TAP Ethernet device with %d queues on numa %u\n",
> +		 name, RTE_PMD_TAP_MAX_QUEUES, rte_socket_id());
> +
> +	data = rte_zmalloc_socket(tap_name, sizeof(*data), 0, numa_node);
> +	if (!data) {
> +		RTE_LOG(INFO, PMD, "Failed to allocate data\n");
> +		goto error_exit;
> +	}
> +
> +	pmd = rte_zmalloc_socket(tap_name, sizeof(*pmd), 0, numa_node);
> +	if (!pmd) {
> +		RTE_LOG(INFO, PMD, "Unable to allocate internal struct\n");
> +		goto error_exit;
> +	}
> +
> +	/* Use the name and not the tap_name */
> +	dev = rte_eth_dev_allocate(tap_name);
> +	if (!dev) {
> +		RTE_LOG(INFO, PMD, "Unable to allocate device struct\n");
> +		goto error_exit;
> +	}
> +
> +	snprintf(pmd->name, sizeof(pmd->name), "%s", tap_name);
> +
> +	pmd->nb_queues = RTE_PMD_TAP_MAX_QUEUES;
> +
> +	/* Setup some default values */
> +	data->dev_private = pmd;
> +	data->port_id = dev->data->port_id;
> +	data->dev_flags = RTE_ETH_DEV_DETACHABLE;
> +	data->kdrv = RTE_KDRV_NONE;
> +	data->drv_name = pmd_tap_drv.driver.name;
> +	data->numa_node = numa_node;
> +
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = &pmd->eth_addr;
> +	data->nb_rx_queues = pmd->nb_queues;
> +	data->nb_tx_queues = pmd->nb_queues;
> +
> +	dev->data = data;
> +	dev->dev_ops = &ops;
> +	dev->driver = NULL;
> +	dev->rx_pkt_burst = pmd_rx_burst;
> +	dev->tx_pkt_burst = pmd_tx_burst;
> +	snprintf(dev->data->name, sizeof(dev->data->name), "%s", name);
> +
> +	/* Create the first Tap device */
> +	fd = tun_alloc(tap_name);
> +	if (fd < 0) {
> +		RTE_LOG(INFO, PMD, "tun_alloc() failed\n");
> +		goto error_exit;
> +	}
> +
> +	/* Preset all fds to -1 to mark them as not in use */
> +	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
> +		pmd->fds[i] = -1;
> +		pmd->rxq[i].fd = -1;
> +		pmd->txq[i].fd = -1;
> +	}
> +
> +	/* Take the TUN/TAP fd and place in the first location */
> +	pmd->rxq[0].fd = fd;
> +	pmd->txq[0].fd = fd;
> +	pmd->fds[0] = fd;
> +
> +	if (pmd_mac_address(fd, dev, &pmd->eth_addr) < 0) {
> +		RTE_LOG(INFO, PMD, "Unable to get MAC address\n");
> +		goto error_exit;
> +	}
> +
> +	return 0;
> +
> +error_exit:
> +	RTE_PMD_DEBUG_TRACE("Unable to initialize %s\n", name);
> +
> +	rte_free(data);
> +	rte_free(pmd);
> +
> +	rte_eth_dev_release_port(dev);
> +
> +	return -EINVAL;
> +}
> +
> +static int
> +set_interface_name(const char *key __rte_unused,
> +		   const char *value,
> +		   void *extra_args)
> +{
> +	char *name = (char *)extra_args;
> +
> +	if (value)
> +		snprintf(name, RTE_ETH_NAME_MAX_LEN - 1, "%s", value);
> +	else
> +		snprintf(name, RTE_ETH_NAME_MAX_LEN - 1, "%s%d",
> +			 DEFAULT_TAP_NAME, (tap_unit - 1));
> +
> +	return 0;
> +}
> +
> +static int
> +set_interface_speed(const char *key __rte_unused,
> +		    const char *value,
> +		    void *extra_args)
> +{
> +	*(int *)extra_args = (value) ? atoi(value) : ETH_SPEED_NUM_10G;
> +
> +	return 0;
> +}
> +
> +/* Open a TAP interface device.
> + */
> +static int
> +rte_pmd_tap_probe(const char *name, const char *params)
> +{
> +	int ret;
> +	struct rte_kvargs *kvlist = NULL;
> +	int speed;
> +	char tap_name[RTE_ETH_NAME_MAX_LEN];
> +
> +	speed = ETH_SPEED_NUM_10G;
> +	snprintf(tap_name, sizeof(tap_name), "%s%d",
> +		 DEFAULT_TAP_NAME, tap_unit++);
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_tap for %s as %s\n",
> +		name, tap_name);
> +
> +	if (params && (params[0] != '\0')) {
> +		RTE_LOG(INFO, PMD, "parameters (%s)\n", params);
> +
> +		kvlist = rte_kvargs_parse(params, valid_arguments);
> +		if (kvlist) {
> +			if (rte_kvargs_count(kvlist, ETH_TAP_SPEED_ARG) == 1) {
> +				ret = rte_kvargs_process(kvlist,
> +							 ETH_TAP_SPEED_ARG,
> +							 &set_interface_speed,
> +							 &speed);
> +				if (ret == -1)
> +					goto leave;
> +			}
> +
> +			if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {
> +				ret = rte_kvargs_process(kvlist,
> +							 ETH_TAP_IFACE_ARG,
> +							 &set_interface_name,
> +							 tap_name);
> +				if (ret == -1)
> +					goto leave;
> +			}
> +		}
> +	}
> +	pmd_link.link_speed = speed;
> +
> +	ret = eth_dev_tap_create(name, tap_name);
> +
> +leave:
> +	if (ret == -1) {
> +		RTE_LOG(INFO, PMD, "Failed to create pmd for %s as %s\n",
> +			name, tap_name);
> +		tap_unit--;		/* Restore the unit number */
> +	}
> +	rte_kvargs_free(kvlist);
> +
> +	return ret;
> +}
> +
> +/* detach a TAP device.
> + */
> +static int
> +rte_pmd_tap_remove(const char *name)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internals *internals;
> +	int i;
> +
> +	RTE_LOG(INFO, PMD, "Closing TUN/TAP Ethernet device on numa %u\n",
> +		rte_socket_id());
> +
> +	/* find the ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (!eth_dev)
> +		return 0;
> +
> +	internals = eth_dev->data->dev_private;
> +	for (i = 0; i < internals->nb_queues; i++)
> +		if (internals->fds[i] != -1)
> +			close(internals->fds[i]);
> +
> +	rte_free(eth_dev->data->dev_private);
> +	rte_free(eth_dev->data);
> +
> +	rte_eth_dev_release_port(eth_dev);
> +
> +	return 0;
> +}
> +
> +static struct rte_vdev_driver pmd_tap_drv = {
> +	.probe = rte_pmd_tap_probe,
> +	.remove = rte_pmd_tap_remove,
> +};
> +RTE_PMD_REGISTER_VDEV(net_tap, pmd_tap_drv);
> +RTE_PMD_REGISTER_ALIAS(net_tap, eth_tap);
> +RTE_PMD_REGISTER_PARAM_STRING(net_tap, "iface=,speed=N");
> diff --git a/drivers/net/tap/rte_pmd_tap_version.map b/drivers/net/tap/rte_pmd_tap_version.map
> new file mode 100644
> index 0000000..31eca32
> --- /dev/null
> +++ b/drivers/net/tap/rte_pmd_tap_version.map
> @@ -0,0 +1,4 @@
> +DPDK_17.02 {
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index f75f0e2..02c32ae 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -124,6 +124,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap -lpcap
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_QEDE_PMD) += -lrte_pmd_qede
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_RING) += -lrte_pmd_ring
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += -lrte_pmd_szedata2 -lsze2
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += -lrte_pmd_tap
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += -lrte_pmd_thunderx_nicvf -lm
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio
>  ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> --
> 2.8.0.GIT
>
>

Regards,
Keith