From: Dave Myer
Date: Sat, 3 Dec 2016 07:56:48 -0800
To: users@dpdk.org
Subject: [dpdk-users] Intel NIC Flow director?

G'day,

Thanks to the DPDK community for all the interesting questions and answers on this list.

This isn't entirely a DPDK question, but I've been trying to use Flow Bifurcation (Intel "Flow Director"). I'm a little confused about exactly how multicast is supposed to work, and I'm also having difficulty defining the flow-type rules (l4proto not working).

My general objective is to use DPDK to manipulate multicast data-plane traffic. Ideally, I'd like to forward non-link-local multicast to a DPDK VF device, and everything else, including ICMP, IGMP and PIM, to the main Linux kernel PF device. The idea is to let the multicast control plane (e.g. IGMP) be handled by smcroute running in Linux, and let DPDK handle the data plane.

Following the various guides, the VFs are created as shown below. I'm including lots of output to be clear and so others can follow in my footsteps. Most guides didn't show the dmesg output, so I hope that helps others understand what's happening. I found that the latest 4.4.6 ixgbe driver is required, as the in-tree Ubuntu 16.04 driver is too old.
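For anyone reproducing this, these are roughly the steps I used to build and load the out-of-tree 4.4.6 driver (the standard Intel out-of-tree driver build; the tarball name is from my download, so adjust as needed):

#---------------------------------------------------------------
# Sketch: build and load the out-of-tree ixgbe 4.4.6 driver
tar xzf ixgbe-4.4.6.tar.gz
cd ixgbe-4.4.6/src
make && make install
# Swap the in-tree module for the freshly built one
rmmod ixgbe
modprobe ixgbe
#---------------------------------------------------------------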
These are the steps to create the VF NIC:

#---------------------------------------------------------------
# Kernel version
uname -a
Linux dpdkhost 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# Ubuntu release
cat /etc/*release | grep -i description
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

# Kernel boot parameters
# Note the "iommu=pt" allows the VFs
cat /etc/default/grub | grep huge
GRUB_CMDLINE_LINUX_DEFAULT="hugepages=8192 isolcpus=2-15 iommu=pt"

# IXGBE configuration, where 3 allows the 256k buffer size
#root@dpdkhost:/home/das/ixgbe-4.4.6# cat README | grep -A 6 -e '^FdirPballoc'
#FdirPballoc
#-----------
#Valid Range: 1-3
#Specifies the Flow Director allocated packet buffer size.
#1 = 64k
#2 = 128k
#3 = 256k
# See also: https://github.com/torvalds/linux/blob/master/Documentation/networking/ixgbe.txt
cat /etc/modprobe.d/ixgbe.conf
options ixgbe FdirPballoc=3

# DPDK NIC status BEFORE binding to ixgbe
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb unused=igb_uio

# Bind target NIC to Linux IXGBE
/root/dpdk/tools/dpdk-devbind.py -b ixgbe 0000:05:00.1

# Dmesg when binding NIC to ixgbe, noting the "Enabled Features: RxQ: 16 TxQ: 16 FdirHash DCA"
[  668.609718] ixgbe: 0000:05:00.1: ixgbe_check_options: FCoE Offload feature enabled
[  668.766451] ixgbe 0000:05:00.1 enp5s0f1: renamed from eth0
[  668.766527] ixgbe 0000:05:00.1: PCI Express bandwidth of 32GT/s available
[  668.766530] ixgbe 0000:05:00.1: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%)
[  668.766615] ixgbe 0000:05:00.1 enp5s0f1: MAC: 2, PHY: 18, SFP+: 6, PBA No: E66560-002
[  668.766617] ixgbe 0000:05:00.1: 00:1b:21:66:a9:81
[  668.766619] ixgbe 0000:05:00.1 enp5s0f1: Enabled Features: RxQ: 16 TxQ: 16 FdirHash DCA
[  668.777948] ixgbe 0000:05:00.1 enp5s0f1: Intel(R) 10 Gigabit Network Connection

# DPDK NIC status AFTER binding to ixgbe
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb unused=igb_uio
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1 drv=ixgbe unused=igb_uio

# ixgbe driver version
ethtool -i enp5s0f1
driver: ixgbe
version: 4.4.6
firmware-version: 0x18b30001
expansion-rom-version:
bus-info: 0000:05:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

# Create the VF
echo 1 > /sys/bus/pci/devices/0000\:05\:00.1/sriov_numvfs

# DPDK NIC status AFTER creating the new VF
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb unused=igb_uio
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1 drv=ixgbe unused=igb_uio
0000:05:10.1 '82599 Ethernet Controller Virtual Function' if=eth0 drv=ixgbevf unused=igb_uio
#---------------------------------------------------------------
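Two small asides on the VF creation step (my own notes, not from the guides): the sriov_numvfs write does not survive a reboot, and the VF comes up with a random MAC address (see the dmesg below). If either matters for your deployment, something like this should cover it:

#---------------------------------------------------------------
# Re-create the VF at boot, e.g. from /etc/rc.local (before "exit 0")
echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
# Optionally pin the VF MAC from the PF (iproute2); the address here
# is a made-up example
ip link set enp5s0f1 vf 0 mac 02:00:00:00:00:01
#---------------------------------------------------------------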
#---------------------------------------------------------------
# Highlighting the new virtual VF device that just got created
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000 | grep Virt
0000:05:10.1 '82599 Ethernet Controller Virtual Function' if=eth0 drv=ixgbevf unused=igb_uio

# Dmesg when creating the VF
[  736.643865] ixgbe 0000:05:00.1: SR-IOV enabled with 1 VFs
[  736.643870] ixgbe 0000:05:00.1: configure port vlans to keep your VFs secure
[  736.744382] pci 0000:05:10.1: [8086:10ed] type 00 class 0x020000
[  736.744436] pci 0000:05:10.1: can't set Max Payload Size to 256; if necessary, use "pci=pcie_bus_safe" and report a bug
[  736.762714] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver - version 2.12.1-k
[  736.762717] ixgbevf: Copyright (c) 2009 - 2015 Intel Corporation.
[  736.762771] ixgbevf 0000:05:10.1: enabling device (0000 -> 0002)
[  736.763967] ixgbevf 0000:05:10.1: PF still in reset state. Is the PF interface up?
[  736.763968] ixgbevf 0000:05:10.1: Assigning random MAC address
[  736.806083] ixgbevf 0000:05:10.1: 0a:ae:33:3c:2c:79
[  736.806087] ixgbevf 0000:05:10.1: MAC: 1
[  736.806090] ixgbevf 0000:05:10.1: Intel(R) 82599 Virtual Function
#---------------------------------------------------------------

I'm not sure whether that "can't set Max Payload Size to 256" message is bad. Maybe this is why the flow-director l4proto rules shown below don't work?

#---------------------------------------------------------------
# Ethtool shows the new VF interface is ixgbevf
ethtool -i eth0
driver: ixgbevf
version: 2.12.1-k
firmware-version:
expansion-rom-version:
bus-info: 0000:05:10.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

# Bind the new VF to DPDK
/root/dpdk/tools/dpdk-devbind.py --bind=igb_uio 0000:05:10.1
/root/dpdk/tools/dpdk-devbind.py --status | grep 0000
0000:05:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe
0000:05:10.1 '82599 Ethernet Controller Virtual Function' drv=igb_uio unused=ixgbevf
0000:01:00.0 '82576 Gigabit Network Connection' if=enp1s0f0 drv=igb unused=igb_uio
0000:01:00.1 '82576 Gigabit Network Connection' if=enp1s0f1 drv=igb unused=igb_uio
0000:05:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=enp5s0f1 drv=ixgbe unused=igb_uio
#---------------------------------------------------------------

The VF is now set up, but we need to direct traffic to it before a DPDK process can receive anything. Enable flow director:

#---------------------------------------------------------------
# Enable the flow-director feature
ethtool -K enp5s0f1 ntuple on
ethtool --show-ntuple enp5s0f1
4 RX rings available
Total 0 rules
#---------------------------------------------------------------

Now it's time for rules. First I'd like to explicitly direct traffic to the Linux kernel:

#---------------------------------------------------------------
# ICMP to the main queue, so ping will be handled by the Linux kernel
ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 1 action 0 loc 1
rmgr: Cannot insert RX class rule: Operation not supported

# Try again with the latest ethtool 4.8
root@dpdkhost:/home/das/ethtool-4.8# ./ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 1 action 0 loc 1
rmgr: Cannot insert RX class rule: Operation not supported
#---------------------------------------------------------------

Any thoughts on why "l4proto" doesn't seem to work? Am I being silly?
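(For reference, the l4proto numbers used here and in the rules below are the standard IANA IP protocol numbers, which you can confirm locally:)

#---------------
# Protocol numbers (from /etc/protocols): icmp = 1, igmp = 2, pim = 103
grep -wE 'icmp|igmp|pim' /etc/protocols
#---------------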
I'd also like to direct IGMP and PIM to Linux via:

#---------------
# IGMP
ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 2 action 0
# PIM
ethtool --config-ntuple enp5s0f1 flow-type ip4 l4proto 103 action 0
#---------------

Setting the l4proto filters aside, I continued trying to direct multicast to the VF:

#---------------------------------------------------------------
# Link-local multicast to the main queue (224.0.0.0/24)
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 255.255.255.0 action 0 loc 2

# Great, the rule went in, as shown:
ethtool --show-ntuple enp5s0f1
4 RX rings available
Total 1 rules

Filter: 2
        Rule Type: Raw IPv4
        Src IP addr: 0.0.0.0 mask: 255.255.255.255
        Dest IP addr: 0.0.0.0 mask: 255.255.255.0
        TOS: 0x0 mask: 0xff
        Protocol: 0 mask: 0xff
        L4 bytes: 0x0 mask: 0xffffffff
        VLAN EtherType: 0x0 mask: 0xffff
        VLAN: 0x0 mask: 0xffff
        User-defined: 0x0 mask: 0xffffffffffffffff
        Action: Direct to queue 0

# Now direct all other multicast to the DPDK VF queue 1 (224.0.0.0/4)
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 240.0.0.0 action 1 loc 3
rmgr: Cannot insert RX class rule: Invalid argument
#---------------------------------------------------------------

That's weird; what's wrong? Why won't the second rule go in? Try removing the first rule and adding only the non-link-local multicast rule:

#---------------------------------------------------------------
# Remove the link-local multicast rule
ethtool --config-ntuple enp5s0f1 delete 2

# Direct all multicast to queue 1 (224.0.0.0/4)
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 240.0.0.0 action 1 loc 3

# Great, the rule went in that time, as shown:
ethtool --show-ntuple enp5s0f1
4 RX rings available
Total 1 rules

Filter: 3
        Rule Type: Raw IPv4
        Src IP addr: 0.0.0.0 mask: 255.255.255.255
        Dest IP addr: 0.0.0.0 mask: 240.0.0.0
        TOS: 0x0 mask: 0xff
        Protocol: 0 mask: 0xff
        L4 bytes: 0x0 mask: 0xffffffff
        VLAN EtherType: 0x0 mask: 0xffff
        VLAN: 0x0 mask: 0xffff
        User-defined: 0x0 mask: 0xffffffffffffffff
        Action: Direct to queue 1

# Can't seem to add the other rule that worked before now, either. Weird.
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 0.0.0.255 action 0 loc 2
rmgr: Cannot insert RX class rule: Invalid argument

# What about not specifying the "location" or rule number?
ethtool --config-ntuple enp5s0f1 delete 3
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 255.255.255.0 action 0
Added rule with ID 2045

# Fingers crossed...
ethtool --config-ntuple enp5s0f1 flow-type ip4 dst-ip 224.0.0.0 m 240.0.0.0 action 1
rmgr: Cannot insert RX class rule: Invalid argument
# Doh!
#---------------------------------------------------------------

OK, so only one multicast rule seems to go in at a time, but is that one rule actually working? For this test, a Cisco switch/router at 172.16.1.1 is sending a multicast ping to 226.1.1.1, and a unicast ping is running from enp5s0f1 (172.16.1.20) to the router.
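In principle, reception could be checked directly on the DPDK side by pointing testpmd at the VF while it is still bound to igb_uio (a sketch; the binary path and core mask are from my build, and I haven't verified this exact invocation):

#---------------------------------------------------------------
# Sketch: watch RX counters on the DPDK-bound VF with testpmd
/root/dpdk/build/app/testpmd -c 0x6 -n 4 -w 0000:05:10.1 -- -i
testpmd> set fwd rxonly
testpmd> start
testpmd> show port stats 0
#---------------------------------------------------------------

For now, though, I rebound the VF to the kernel driver so I could use tcpdump instead: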
#---------------------------------------------------------------
# Bind the VF back to linux to allow tcpdump
/root/dpdk/tools/dpdk-devbind.py --bind=ixgbevf 0000:05:10.1

# Bring up the PF interface
ifconfig enp5s0f1 up
# Bring up the VF
ifconfig eth0 up

# Check for the traffic on the PF enp5s0f1
tcpdump -c 10 -nei enp5s0f1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp5s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes
15:33:22.472673 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4 (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074, seq 841, length 64
15:33:22.473482 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4 (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074, seq 841, length 64
15:33:23.009617 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 476, length 180
15:33:23.471987 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4 (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074, seq 842, length 64
15:33:23.472414 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4 (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074, seq 842, length 64
15:33:24.471968 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4 (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074, seq 843, length 64
15:33:24.523256 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4 (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074, seq 843, length 64
15:33:25.009659 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 477, length 180
15:33:25.473568 00:1b:21:66:a9:81 > 00:21:55:84:a3:3f, ethertype IPv4 (0x0800), length 98: 172.16.1.20 > 172.16.1.1: ICMP echo request, id 2074, seq 844, length 64
15:33:25.478962 00:21:55:84:a3:3f > 00:1b:21:66:a9:81, ethertype IPv4 (0x0800), length 98: 172.16.1.1 > 172.16.1.20: ICMP echo reply, id 2074, seq 844, length 64
10 packets captured
10 packets received by filter
0 packets dropped by kernel

# That's weird. I didn't expect any multicast traffic here, but we got a mix
# of the unicast (172.16.1.1 > 172.16.1.20) and the multicast
# (172.16.1.1 > 226.1.1.1). Why is the multicast traffic to 226.1.1.1 still
# hitting this interface?
#---------------------------------------------------------------
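(To rule out link-local chatter and count only the data-plane multicast on the PF, the capture can be narrowed with a standard tcpdump filter:)

#---------------------------------------------------------------
# Only multicast outside 224.0.0.0/24 on the PF
tcpdump -c 10 -nei enp5s0f1 'ip multicast and not dst net 224.0.0.0/24'
#---------------------------------------------------------------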
#---------------------------------------------------------------
# Check the VF eth0
tcpdump -c 10 -nei eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:34:48.435652 00:21:55:84:a3:3f > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 60: 172.16.1.1 > 224.0.0.1: igmp query v3
15:34:49.019941 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 519, length 180
15:34:51.056169 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 520, length 180
15:34:53.056236 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 521, length 180
15:34:55.056390 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 522, length 180
15:34:57.056578 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 523, length 180
15:34:58.743276 00:21:55:84:a3:3f > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 60: 172.16.1.1 > 224.0.0.1: igmp query v3
15:34:59.056686 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 524, length 180
15:35:01.056869 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 525, length 180
15:35:03.056988 00:21:55:84:a3:3f > 01:00:5e:01:01:01, ethertype IPv4 (0x0800), length 214: 172.16.1.1 > 226.1.1.1: ICMP echo request, id 5, seq 526, length 180
10 packets captured
10 packets received by filter
0 packets dropped by kernel

# Ok, so that's good. Only multicast traffic is going to the VF.
#---------------------------------------------------------------

This is mostly working as I would expect, in that the VF is getting the multicast traffic, but why is the PF enp5s0f1 still getting multicast as well? I thought flow director would have sent all the multicast ONLY to the VF eth0.

Just for future searching, it seems only the PF exposes the flow director (fdir) counters:

#---------------------------------------------------------------
root@smtwin1:/home/das/ethtool-4.8# ethtool -S enp5s0f1 | grep fdir
     fdir_match: 1040
     fdir_miss: 2653
     fdir_overflow: 0

root@smtwin1:/home/das/ethtool-4.8# ethtool -S eth0
NIC statistics:
     rx_packets: 934
     tx_packets: 8
     rx_bytes: 191520
     tx_bytes: 648
     tx_busy: 0
     tx_restart_queue: 0
     tx_timeout_count: 0
     multicast: 933
     rx_csum_offload_errors: 0
     tx_queue_0_packets: 8
     tx_queue_0_bytes: 648
     tx_queue_0_bp_napi_yield: 0
     tx_queue_0_bp_misses: 0
     tx_queue_0_bp_cleaned: 0
     rx_queue_0_packets: 934
     rx_queue_0_bytes: 191520
     rx_queue_0_bp_poll_yield: 0
     rx_queue_0_bp_misses: 0
     rx_queue_0_bp_cleaned: 0
#---------------------------------------------------------------

Thanks in advance for your help!

regards,
Dave

Helpful reference pages:
http://dpdk.org/doc/guides/howto/flow_bifurcation.html
https://dpdksummit.com/Archive/pdf/2016Userspace/Day02-Session05-JingjingWu-Userspace2016.pdf
http://rhelblog.redhat.com/2015/10/02/getting-the-best-of-both-worlds-with-queue-splitting-bifurcated-driver/
https://github.com/pavel-odintsov/fastnetmon/wiki/Traffic-filtration-using-NIC-capabilities-on-wire-speed-(10GE,-14Mpps)