From: Stephen Hemminger
To: dev@dpdk.org
Cc: Stephen Hemminger
Subject: [PATCH v11 9/9] net/tap: update documentation
Date: Wed, 1 May 2024 19:49:28 -0700
Message-ID: <20240502025201.28322-10-stephen@networkplumber.org>
In-Reply-To: <20240502025201.28322-1-stephen@networkplumber.org>
References: <20240130034925.44869-1-stephen@networkplumber.org>
 <20240502025201.28322-1-stephen@networkplumber.org>

The driver support of flows has changed and the wording in the guide
was awkward. Drop references to DPDK pktgen in this documentation
since it is not required and confusing.

Signed-off-by: Stephen Hemminger
---
 doc/guides/nics/tap.rst                | 274 +++++++------------------
 doc/guides/rel_notes/release_24_07.rst |   7 +
 2 files changed, 84 insertions(+), 197 deletions(-)

diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
index d4f45c02a1..55e38fb25b 100644
--- a/doc/guides/nics/tap.rst
+++ b/doc/guides/nics/tap.rst
@@ -1,47 +1,51 @@
 .. SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016 Intel Corporation.
 
-Tun|Tap Poll Mode Driver
-========================
+TAP Poll Mode Driver
+====================
 
-The ``rte_eth_tap.c`` PMD creates a device using TAP interfaces on the
-local host. The PMD allows for DPDK and the host to communicate using a raw
-device interface on the host and in the DPDK application.
+The TAP Poll Mode Driver (PMD) is a virtual device for injecting packets to be processed
+by the Linux kernel. This PMD is useful when writing a DPDK application
+for offloading network functionality (such as tunneling) from the kernel.
 
-The device created is a TAP device, which sends/receives packet in a raw
-format with a L2 header. The usage for a TAP PMD is for connectivity to the
-local host using a TAP interface. When the TAP PMD is initialized it will
-create a number of tap devices in the host accessed via ``ifconfig -a`` or
-``ip`` command. The commands can be used to assign and query the virtual like
-device.
+From the kernel point of view, the TAP device looks like a regular network interface.
+The network device can be managed by standard tools such as ``ip`` and ``ethtool`` commands.
+It is also possible to use existing packet tools such as ``wireshark`` or ``tcpdump``.
 
-These TAP interfaces can be used with Wireshark or tcpdump or Pktgen-DPDK
-along with being able to be used as a network connection to the DPDK
-application. The method enable one or more interfaces is to use the
-``--vdev=net_tap0`` option on the DPDK application command line. Each
-``--vdev=net_tap1`` option given will create an interface named dtap0, dtap1,
-and so on.
+From the DPDK application, the TAP device looks like a DPDK ethdev.
+Packets are sent and received in L2 (Ethernet) format. The standard DPDK
+APIs to query information and statistics, and to send and receive packets,
+work as expected.
 
-The interface name can be changed by adding the ``iface=foo0``, for example::
+Requirements
+~~~~~~~~~~~~
+
+The TAP PMD requires kernel support for multiple queues in the TAP device as
+well as the multi-queue ``multiq`` and incoming ``ingress`` queue disciplines.
+These are standard kernel features in most Linux distributions.
+
+Arguments
+---------
+
+TAP devices are created with the command line
+``--vdev=net_tap0`` option. This option may be specified more than once by repeating
+it with a different ``net_tapX`` device.
+
+By default, the Linux interfaces are named ``dtap0``, ``dtap1``, etc.
+The interface name can be specified by adding ``iface=foo0``, for example::
 
    --vdev=net_tap0,iface=foo0 --vdev=net_tap1,iface=foo1, ...
 
-Normally the PMD will generate a random MAC address, but when testing or with
-a static configuration the developer may need a fixed MAC address style.
-Using the option ``mac=fixed`` you can create a fixed known MAC address::
+Normally the PMD will generate a random MAC address.
+If a static address is desired instead, the ``mac=fixed`` option can be used::
 
    --vdev=net_tap0,mac=fixed
 
-The MAC address will have a fixed value with the last octet incrementing by one
-for each interface string containing ``mac=fixed``. The MAC address is formatted
-as 02:'d':'t':'a':'p':[00-FF]. Convert the characters to hex and you get the
-actual MAC address: ``02:64:74:61:70:[00-FF]``.
-
-   --vdev=net_tap0,mac="02:64:74:61:70:11"
+With the fixed option, the MAC address has the form
+02:'d':'t':'a':'p':[00-FF], where the last octet is the interface number.
 
-The MAC address will have a user value passed as string. The MAC address is in
-format with delimiter ``:``. The string is byte converted to hex and you get
-the actual MAC address: ``02:64:74:61:70:11``.
+To specify a particular MAC address, use the conventional colon-separated representation.
+For example, ``mac="02:64:74:61:70:11"`` sets the MAC address to ``02:64:74:61:70:11``.
 
 It is possible to specify a remote netdevice to capture packets from by adding
 ``remote=foo1``, for example::
@@ -59,40 +63,20 @@ netdevice that has no support in the DPDK. It is possible to add explicit
 rte_flow rules on the tap PMD to capture specific traffic (see next section for
 examples).
 
-After the DPDK application is started you can send and receive packets on the
-interface using the standard rx_burst/tx_burst APIs in DPDK. From the host
-point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen
-and others to communicate with the DPDK application. The DPDK application may
-not understand network protocols like IPv4/6, UDP or TCP unless the
-application has been written to understand these protocols.
-
-If you need the interface as a real network interface meaning running and has
-a valid IP address then you can do this with the following commands::
-
-   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
-   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
-
-Please change the IP addresses as you see fit.
-
-If routing is enabled on the host you can also communicate with the DPDK App
-over the internet via a standard socket layer application as long as you
-account for the protocol handling in the application.
-
-If you have a Network Stack in your DPDK application or something like it you
-can utilize that stack to handle the network protocols. Plus you would be able
-to address the interface using an IP address assigned to the internal
-interface.
-
 Normally, when the DPDK application exits,
 the TAP device is marked down and is removed.
-But this behaviour can be overridden by the use of the persist flag, example::
+But this behavior can be overridden by the use of the persist flag, example::
 
    --vdev=net_tap0,iface=tap0,persist ...
 
-The TUN PMD allows user to create a TUN device on host. The PMD allows user
-to transmit and receive packets via DPDK API calls with L3 header and payload.
-The devices in host can be accessed via ``ifconfig`` or ``ip`` command. TUN
-interfaces are passed to DPDK ``rte_eal_init`` arguments as ``--vdev=net_tunX``,
+TUN devices
+-----------
+
+The TAP device can be used as an L3 tunnel-only device (TUN).
+This type of device does not include the Ethernet (L2) header; all packets
+are sent and received as IP packets.
+
+TUN devices are created with the command line arguments ``--vdev=net_tunX``,
 where X stands for unique id, example::
 
    --vdev=net_tun0 --vdev=net_tun1,iface=foo1, ...
@@ -103,27 +87,33 @@ options. Default interface name is ``dtunX``, where X stands for unique id.
 Flow API support
 ----------------
 
-The tap PMD supports major flow API pattern items and actions, when running on
-linux kernels above 4.2 ("Flower" classifier required).
-The kernel support can be checked with this command::
+The TAP PMD supports major flow API pattern items and actions.
+
+Requirements
+~~~~~~~~~~~~
 
-   zcat /proc/config.gz | ( grep 'CLS_FLOWER=' || echo 'not supported' ) |
-   tee -a /dev/stderr | grep -q '=m' &&
-   lsmod | ( grep cls_flower || echo 'try modprobe cls_flower' )
+Flow support in the TAP driver requires Linux kernel support for the flow-based
+traffic control filter ``flower``. This was added in the Linux 4.3 kernel.
 
-Supported items:
+The implementation of the RSS action uses an eBPF module that requires additional
+libraries and tools. Building the RSS support requires the ``clang``
+compiler to compile the C code to the BPF target; ``bpftool`` to convert the
+compiled BPF object to a header file; and ``libbpf`` to load the eBPF
+action into the kernel.
 
-- eth: src and dst (with variable masks), and eth_type (0xffff mask).
-- vlan: vid, pcp, but not eid. (requires kernel 4.9)
-- ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
-- udp/tcp: src and dst port (0xffff) mask.
+Supported match items:
+
+ - eth: src and dst (with variable masks), and eth_type (0xffff mask).
+ - vlan: vid, pcp, but not eid. (requires kernel 4.9)
+ - ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
+ - udp/tcp: src and dst port (0xffff) mask.
 
 Supported actions:
 
 - DROP
 - QUEUE
 - PASSTHRU
-- RSS (requires kernel 4.9)
+- RSS
 
 It is generally not possible to provide a "last" item. However, if the "last"
 item, once masked, is identical to the masked spec, then it is supported.
@@ -133,7 +123,7 @@ full mask (exact match).
 
 As rules are translated to TC, it is possible to show them with something like::
 
-   tc -s filter show dev tap1 parent 1:
+   tc -s filter show dev dtap1 parent 1:
 
 Examples of testpmd flow rules
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -174,135 +164,25 @@ The IPC synchronization of Rx/Tx queues is currently limited:
 - Maximum 8 queues shared
 - Synchronized on probing, but not on later port update
 
-Example
-------
-
-The following is a simple example of using the TAP PMD with the Pktgen
-packet generator. It requires that the ``socat`` utility is installed on the
-test system.
-
-Build DPDK, then pull down Pktgen and build pktgen using the DPDK SDK/Target
-used to build the dpdk you pulled down.
-
-Run pktgen from the pktgen directory in a terminal with a commandline like the
-following::
-
-   sudo ./app/app/x86_64-native-linux-gcc/app/pktgen -l 1-5 -n 4 \
-   --proc-type auto --log-level debug --socket-mem 512,512 --file-prefix pg \
-   --vdev=net_tap0 --vdev=net_tap1 -b 05:00.0 -b 05:00.1 \
-   -b 04:00.0 -b 04:00.1 -b 04:00.2 -b 04:00.3 \
-   -b 81:00.0 -b 81:00.1 -b 81:00.2 -b 81:00.3 \
-   -b 82:00.0 -b 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1 \
-   -f themes/black-yellow.theme
-
-.. Note:
-
-   Change the ``-b`` options to exclude all of your physical ports. The
-   following command line is all one line.
-
-   Also, ``-f themes/black-yellow.theme`` is optional if the default colors
-   work on your system configuration. See the Pktgen docs for more
-   information.
-
-Verify with ``ifconfig -a`` command in a different xterm window, should have a
-``dtap0`` and ``dtap1`` interfaces created.
-
-Next set the links for the two interfaces to up via the commands below::
-
-   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
-   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
-
-Then use socat to create a loopback for the two interfaces::
-
-   sudo socat interface:dtap0 interface:dtap1
-
-Then on the Pktgen command line interface you can start sending packets using
-the commands ``start 0`` and ``start 1`` or you can start both at the same
-time with ``start all``. The command ``str`` is an alias for ``start all`` and
-``stp`` is an alias for ``stop all``.
-
-While running you should see the 64 byte counters increasing to verify the
-traffic is being looped back. You can use ``set all size XXX`` to change the
-size of the packets after you stop the traffic. Use pktgen ``help``
-command to see a list of all commands. You can also use the ``-f`` option to
-load commands at startup in command line or Lua script in pktgen.
 
 RSS specifics
 -------------
 
-Packet distribution in TAP is done by the kernel which has a default
-distribution. This feature is adding RSS distribution based on eBPF code.
-The default eBPF code calculates RSS hash based on Toeplitz algorithm for
-a fixed RSS key. It is calculated on fixed packet offsets. For IPv4 and IPv6 it
-is calculated over src/dst addresses (8 or 32 bytes for IPv4 or IPv6
-respectively) and src/dst TCP/UDP ports (4 bytes).
-
-The RSS algorithm is written in file ``tap_bpf_program.c`` which
-does not take part in TAP PMD compilation. Instead this file is compiled
-in advance to eBPF object file. The eBPF object file is then parsed and
-translated into eBPF byte code in the format of C arrays of eBPF
-instructions. The C array of eBPF instructions is part of TAP PMD tree and
-is taking part in TAP PMD compilation. At run time the C arrays are uploaded to
-the kernel via BPF system calls and the RSS hash is calculated by the
-kernel.
-
-It is possible to support different RSS hash algorithms by updating file
-``tap_bpf_program.c`` In order to add a new RSS hash algorithm follow these
-steps:
-
-#. Write the new RSS implementation in file ``tap_bpf_program.c``
-
-   BPF programs which are uploaded to the kernel correspond to
-   C functions under different ELF sections.
-
-#. Install ``LLVM`` library and ``clang`` compiler versions 3.7 and above
-
-#. Use make to compile `tap_bpf_program.c`` via ``LLVM`` into an object file
-   and extract the resulting instructions into ``tap_bpf_insn.h``::
-
-   cd bpf; make
-
-#. Recompile the TAP PMD.
-
-The C arrays are uploaded to the kernel using BPF system calls.
-
-``tc`` (traffic control) is a well known user space utility program used to
-configure the Linux kernel packet scheduler. It is usually packaged as
-part of the ``iproute2`` package.
-Since commit 11c39b5e9 ("tc: add eBPF support to f_bpf") ``tc`` can be used
-to uploads eBPF code to the kernel and can be patched in order to print the
-C arrays of eBPF instructions just before calling the BPF system call.
-Please refer to ``iproute2`` package file ``lib/bpf.c`` function
-``bpf_prog_load()``.
-
-An example utility for eBPF instruction generation in the format of C arrays will
-be added in next releases
-
-TAP reports on supported RSS functions as part of dev_infos_get callback:
-``RTE_ETH_RSS_IP``, ``RTE_ETH_RSS_UDP`` and ``RTE_ETH_RSS_TCP``.
-**Known limitation:** TAP supports all of the above hash functions together
-and not in partial combinations.
-
-Systems supporting flow API
----------------------------
-
-- "tc flower" classifier requires linux kernel above 4.2
-- eBPF/RSS requires linux kernel above 4.9
-
-+--------------------+-----------------------+
-| RH7.3              | No flow rule support  |
-+--------------------+-----------------------+
-| RH7.4              | No RSS action support |
-+--------------------+-----------------------+
-| RH7.5              | No RSS action support |
-+--------------------+-----------------------+
-| SLES 15,           | No limitation         |
-| kernel 4.12        |                       |
-+--------------------+-----------------------+
-| Azure Ubuntu 16.04,| No limitation         |
-| kernel 4.13        |                       |
-+--------------------+-----------------------+
+Without flow rules, packet distribution in TAP is done by the
+kernel, which has a default flow-based distribution.
+When flow rules are used to distribute packets across a set of queues,
+an eBPF program calculates the RSS hash using the Toeplitz algorithm
+with the given key.
+
+The hash is calculated for IPv4 and IPv6 over the src/dst addresses
+(8 or 32 bytes for IPv4 or IPv6 respectively) and,
+optionally, the src/dst TCP/UDP ports (4 bytes).
+
 
 Limitations
 -----------
 
-* Rx/Tx must have the same number of queues.
+- Since the TAP device uses file descriptors to talk to the kernel,
+  the same number of queues must be specified for receive and transmit.
+
+- The RSS algorithm only supports L3 or L4 functions. It does not support
+  finer-grained selections (for example: only IPv6 packets with extension headers).
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..8652271ed2 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **TAP PMD updates.**
+
+  * Fixed support of the RSS flow action to work with current Linux
+    kernels and BPF tooling. It is only enabled if clang and bpftool
+    are available.
+
+  * Support up to 8 queues when used by a secondary process.
 
 Removed Items
 -------------
-- 
2.43.0
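
For anyone who wants to sanity-check the "RSS specifics" wording in the guide, here is a minimal Python sketch of a Toeplitz hash over the input the text describes (src/dst addresses, optionally followed by src/dst ports). This is not the driver's eBPF code. The key, the helper names ``toeplitz_hash`` and ``rss_input_ipv4``, and the exact field order are assumptions made purely for illustration.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of data under key.

    For every set bit of the input (most significant bit first), XOR in
    the 32-bit window of the key that starts at the same bit offset.
    """
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for this input length")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                # Window of 32 key bits starting at input bit offset i*8+bit.
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_input_ipv4(src: str, dst: str, sport=None, dport=None) -> bytes:
    """Build the hash input: 8 bytes of addresses, optionally 4 bytes of ports.

    The src-then-dst ordering is an assumption for this sketch.
    """
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data


# Hypothetical 40-byte key, for illustration only (not the driver's key).
KEY = bytes(range(1, 41))

l3_input = rss_input_ipv4("192.0.2.1", "198.51.100.2")
l4_input = rss_input_ipv4("192.0.2.1", "198.51.100.2", 1024, 80)
print(len(l3_input), len(l4_input))  # 8 and 12 bytes, as in the guide
print(hex(toeplitz_hash(KEY, l4_input)))
```

With a 40-byte key the same function also covers the 36-byte IPv6 case (32 bytes of addresses plus 4 bytes of ports), since 36 * 8 + 32 = 320 key bits are needed.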