DPDK patches and discussions
From: "Qiu, Michael" <michael.qiu@intel.com>
To: "Assaad, Sami (Sami)" <sami.assaad@alcatel-lucent.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] How do you setup a VM in Promiscuous Mode using PCI Pass-Through (SR-IOV)?
Date: Wed, 20 May 2015 09:19:19 +0000
Message-ID: <533710CFB86FA344BFBF2D6802E602860467F9C0@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <9478F0FB69DAA249AF0A9BDA1E6ED9521881BDE6@US70TWXCHMBA07.zam.alcatel-lucent.com>



From linux/Documentation/Intel-IOMMU.txt:

What is RMRR?
-------------

There are some devices the BIOS controls, for e.g USB devices to perform
PS2 emulation. The regions of memory used for these devices are marked
reserved in the e820 map. When we turn on DMA translation, DMA to those
regions will fail. Hence BIOS uses RMRR to specify these regions along with
devices that need to access these regions. OS is expected to setup
unity mappings for these regions for these devices to access these regions.
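
One way to see whether the platform declares RMRR regions is to filter the
kernel log for RMRR lines. This is only a minimal sketch: the helper name
rmrr_lines is made up for illustration, and because the wording of the
boot-time DMAR messages varies between kernel versions, the match is kept
deliberately loose.

```shell
#!/bin/sh
# rmrr_lines: hypothetical helper (not part of any tool).
# Reads kernel log text on stdin and prints only the lines that mention
# RMRR, case-insensitively, so it works on live dmesg output or a saved log.
rmrr_lines() {
    grep -i 'rmrr'
}

# Typical use on the host (may need root):
#   dmesg | rmrr_lines
#   rmrr_lines < /var/log/messages
```

If lines such as the "ineligible for IOMMU domain attach due to platform RMRR
requirement" message show up for the NIC's address, the restriction comes from
the BIOS tables rather than from the QEMU or libvirt configuration.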


So what type of NIC do you have? Is it an on-board device or a plug-in card?
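
If it is not obvious from the chassis, the SMBIOS tables may help. The sketch
below assumes dmidecode is installed and that the BIOS fills in the "Bus
Address" field; smbios_entry_for is a made-up helper name, and 0000:04:00.0 is
simply the address from the logs in this thread.

```shell
#!/bin/sh
# smbios_entry_for: hypothetical helper (not part of dmidecode).
# Reads dmidecode text on stdin and prints every SMBIOS record (records are
# separated by blank lines) that mentions the given PCI address.
smbios_entry_for() {
    # RS="" puts awk in paragraph mode: one blank-line-separated block per record.
    awk -v addr="$1" 'BEGIN { RS = "" }
        index(tolower($0), tolower(addr)) { print; print "" }'
}

# Typical use on the host (needs root):
#   dmidecode -t slot | smbios_entry_for 0000:04:00.0   # expansion slots
#   dmidecode -t 41   | smbios_entry_for 0000:04:00.0   # on-board devices
```

A match in the slot records suggests a plug-in card, a match in the type 41
records suggests an on-board device; no match at all only means the BIOS did
not record the address.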

Thanks,
Michael

On 5/20/2015 3:24 AM, Assaad, Sami (Sami) wrote:
> Hello Michael,
>
> I've updated the kernel and QEMU. Here are the packages I'm using:
>
> --> CentOS 7 - 3.10.0-229.4.2.el7.x86_64
>     - qemu-kvm-1.5.3-86.el7_1.2.x86_64
>     - libvirt-1.2.8-16.el7_1.3.x86_64
>     - virt-manager-1.1.0-12.el7.noarch
>     - virt-what-1.13-5.el7.x86_64
>     - libvirt-glib-0.1.7-3.el7.x86_64
>
> I've modified the virtual machine XML file to include the following:
>
> <hostdev mode='subsystem' type='pci' managed='yes'>
>   <driver name='vfio'/>
>     <source>
>       <address domain='0x0000' bus='0x04' slot='0x10' function='0x0'/>
>     </source>
> </hostdev>
> <hostdev mode='subsystem' type='pci' managed='yes'>
>   <driver name='vfio'/>
>     <source>
>       <address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
>     </source>
> </hostdev>
>
>
> The syslog error I'm obtaining relating to the iommu is the following:
> #dmesg | grep -e DMAR -e IOMMU
>
> [ 3362.370564] vfio-pci 0000:04:00.0: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.
>
>
> From the /var/log/messages file, the complete VM log is the following:
>
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): carrier is OFF
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): new Tun device (driver: 'unknown' ifindex: 30)
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): exported as /org/freedesktop/NetworkManager/Devices/29
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (virbr0): bridge port vnet0 was attached
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): enslaved to virbr0
> May 19 15:10:12 ni-nfvhost01 kernel: device vnet0 entered promiscuous mode
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): link connected
> May 19 15:10:12 ni-nfvhost01 kernel: virbr0: port 2(vnet0) entered listening state
> May 19 15:10:12 ni-nfvhost01 kernel: virbr0: port 2(vnet0) entered listening state
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: starting connection 'vnet0'
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 1 of 5 (Device Prepare) scheduled...
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 1 of 5 (Device Prepare) started...
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: disconnected -> prepare (reason 'none') [30 40 0]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 2 of 5 (Device Configure) scheduled...
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 1 of 5 (Device Prepare) complete.
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 2 of 5 (Device Configure) starting...
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: prepare -> config (reason 'none') [40 50 0]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 2 of 5 (Device Configure) successful.
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 3 of 5 (IP Configure Start) scheduled.
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 2 of 5 (Device Configure) complete.
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 3 of 5 (IP Configure Start) started...
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: config -> ip-config (reason 'none') [50 70 0]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: Stage 3 of 5 (IP Configure Start) complete.
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: ip-config -> secondaries (reason 'none') [70 90 0]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: secondaries -> activated (reason 'none') [90 100 0]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): Activation: successful, device activated.
> May 19 15:10:12 ni-nfvhost01 dbus-daemon: dbus[1295]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
> May 19 15:10:12 ni-nfvhost01 dbus[1295]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
> May 19 15:10:12 ni-nfvhost01 systemd: Starting Network Manager Script Dispatcher Service...
> May 19 15:10:12 ni-nfvhost01 systemd: Starting Virtual Machine qemu-vNIDS-VM1.
> May 19 15:10:12 ni-nfvhost01 systemd-machined: New machine qemu-vNIDS-VM1.
> May 19 15:10:12 ni-nfvhost01 systemd: Started Virtual Machine qemu-vNIDS-VM1.
> May 19 15:10:12 ni-nfvhost01 dbus-daemon: dbus[1295]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
> May 19 15:10:12 ni-nfvhost01 dbus[1295]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
> May 19 15:10:12 ni-nfvhost01 systemd: Started Network Manager Script Dispatcher Service.
> May 19 15:10:12 ni-nfvhost01 nm-dispatcher: Dispatching action 'up' for vnet0
> May 19 15:10:12 ni-nfvhost01 kvm: 1 guest now active
> May 19 15:10:12 ni-nfvhost01 systemd: Unit iscsi.service cannot be reloaded because it is inactive.
> May 19 15:10:12 ni-nfvhost01 kernel: vfio-pci 0000:04:00.0: Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.
> May 19 15:10:12 ni-nfvhost01 kernel: virbr0: port 2(vnet0) entered disabled state
> May 19 15:10:12 ni-nfvhost01 kernel: device vnet0 left promiscuous mode
> May 19 15:10:12 ni-nfvhost01 kernel: virbr0: port 2(vnet0) entered disabled state
> May 19 15:10:12 ni-nfvhost01 avahi-daemon[1280]: Withdrawing workstation service for vnet0.
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): device state change: activated -> unmanaged (reason 'removed') [100 10 36]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <info>  (vnet0): deactivating device (reason 'removed') [36]
> May 19 15:10:12 ni-nfvhost01 NetworkManager[1371]: <warn>  (virbr0): failed to detach bridge port vnet0
> May 19 15:10:12 ni-nfvhost01 nm-dispatcher: Dispatching action 'down' for vnet0
> May 19 15:10:12 ni-nfvhost01 journal: Unable to read from monitor: Connection reset by peer
> May 19 15:10:12 ni-nfvhost01 journal: internal error: early end of file from monitor: possible problem:
> 2015-05-19T19:10:12.674077Z qemu-kvm: -device vfio-pci,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x9: vfio: failed to set iommu for container: Operation not permitted
> 2015-05-19T19:10:12.674118Z qemu-kvm: -device vfio-pci,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x9: vfio: failed to setup container for group 19
> 2015-05-19T19:10:12.674128Z qemu-kvm: -device vfio-pci,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x9: vfio: failed to get group 19
> 2015-05-19T19:10:12.674141Z qemu-kvm: -device vfio-pci,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x9: Device initialization failed.
> 2015-05-19T19:10:12.674155Z qemu-kvm: -device vfio-pci,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x9: Device 'vfio-pci' could not be initialized
>
> May 19 15:10:12 ni-nfvhost01 kvm: 0 guests now active
> May 19 15:10:12 ni-nfvhost01 systemd-machined: Machine qemu-vNIDS-VM1 terminated.
> May 19 15:11:01 ni-nfvhost01 systemd: Created slice user-0.slice.
> May 19 15:11:01 ni-nfvhost01 systemd: Starting Session 329 of user root.
>
>
> Overall Hypothesis: The issue seems to be related to the Ethernet Controller's interfaces which I'm trying to bring into the VM. My Ethernet Controller is : Intel 10G x540-AT2 (rev 01).
>                     The problem is associated to RMRR. 
>                     Can this issue be attributed to my BIOS? My Bios is the following: ProLiant System BIOS P89 V1.21 11/03/2014.
>
> Thanks in advance.
>
> Best Regards,
> Sami.
>
> -----Original Message-----
> From: Qiu, Michael [mailto:michael.qiu@intel.com] 
> Sent: Monday, May 18, 2015 6:01 AM
> To: Assaad, Sami (Sami); Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] How do you setup a VM in Promiscuous Mode using PCI Pass-Through (SR-IOV)?
>
> Hi, Sami
>
> Would you mind supplying the syslog, especially the IOMMU-related parts?
>
> Also you could update the qemu or kernel to see if this issue still exists.
>
>
> Thanks,
> Michael
>
> On 5/16/2015 3:31 AM, Assaad, Sami (Sami) wrote:
>> On Fri, May 15, 2015 at 12:54:19PM +0000, Assaad, Sami (Sami) wrote:
>>> Thanks Bruce for your reply.
>>>
>>> Yes, your idea of bringing the PF into the VM looks like an option. However, how do you configure the physical interfaces within the VM supporting SRIOV?
>>> I always believed that the VM needed to be associated with a virtual/emulated interface card. With your suggestion, I would actually configure the physical interface card/non-emulated within the VM.
>>>
>>> If you could provide me some example configuration commands, it would be really appreciated. 
>>>
>> You'd pass in the PF in the same way as the VF, just skip all the steps creating the VF on the host. To the system and hypervisor, both are just PCI devices!
>>
>> As for configuration, the setup and configuration of the PF in the guest is exactly the same as on the host - it's the same hardware with the same PCI bars.
>> It's the IOMMU on your platform that takes care of memory isolation and address translation and that should work with either PF or VF.
>>
>> Regards,
>> /Bruce
>>
>>> Thanks in advance.
>>>
>>> Best Regards,
>>> Sami.
>>>
>>> -----Original Message-----
>>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
>>> Sent: Friday, May 15, 2015 5:27 AM
>>> To: Stephen Hemminger
>>> Cc: Assaad, Sami (Sami); dev@dpdk.org
>>> Subject: Re: [dpdk-dev] How do you setup a VM in Promiscuous Mode using PCI Pass-Through (SR-IOV)?
>>>
>>> On Thu, May 14, 2015 at 04:47:19PM -0700, Stephen Hemminger wrote:
>>>> On Thu, 14 May 2015 21:38:24 +0000
>>>> "Assaad, Sami (Sami)" <sami.assaad@alcatel-lucent.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> My Hardware consists of the following:
>>>>>   - DL380 Gen 9 Server supporting two Haswell Processors (Xeon CPU E5-2680 v3 @ 2.50GHz)
>>>>>   - An x540 Ethernet Controller Card supporting 2x10G ports.
>>>>>
>>>>> Software:
>>>>>   - CentOS 7 (3.10.0-229.1.2.el7.x86_64)
>>>>>   - DPDK 1.8
>>>>>
>>>>> I want all the network traffic received on the two 10G ports to be transmitted to my VM. The issue is that the Virtual Function / Physical Functions have setup the internal virtual switch to only route Ethernet packets with destination MAC address matching the VM virtual interface MAC. How can I configure my virtual environment to provide all network traffic to the VM...i.e. set the virtual functions for both PCI devices in Promiscuous mode?
>>>>>
>>>>> [ If a l2fwd-vf example exists, this would actually solve this 
>>>>> problem ... Is there a DPDK l2fwd-vf example available? ]
>>>>>
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Best Regards,
>>>>> Sami Assaad.
>>>> This is a host side (not DPDK) issue.
>>>>
>>>> Intel PF driver will not allow guest (VF) to go into promiscuous
>>>> mode since it would allow traffic stealing which is a security violation.
>>> Could you maybe try passing the PF directly into the VM, rather than a VF based off it? Since you seem to want all traffic to go to the one VM, there seems little point in creating a VF on the device, and should let the VM control the whole NIC directly.
>>>
>>> Regards,
>>> /Bruce
>> Hi Bruce,
>>
>> I was provided two options:
>> 1. Pass the PF directly into the VM
>> 2. Use ixgbe VF mirroring
>>
>> I decided to first try your proposal of passing the PF directly into the VM. However, I ran into some issues. 
>> But prior to providing the problem details, the following is my  server environment:
>> I'm using CentOS 7 KVM/QEMU
>> [root@ni-nfvhost01 qemu]# uname -a
>> Linux ni-nfvhost01 3.10.0-229.1.2.el7.x86_64 #1 SMP Fri Mar 27 
>> 03:04:26 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>>
>> [root@ni-nfvhost01 qemu]# lspci -n -s 04:00.0
>> 04:00.0 0200: 8086:1528 (rev 01)
>>
>> [root@ni-nfvhost01 qemu]# lspci | grep -i eth
>> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
>> Gigabit Ethernet PCIe (rev 01)
>> 02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
>> Gigabit Ethernet PCIe (rev 01)
>> 02:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
>> Gigabit Ethernet PCIe (rev 01)
>> 02:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
>> Gigabit Ethernet PCIe (rev 01)
>> 04:00.0 Ethernet controller: Intel Corporation Ethernet Controller 
>> 10-Gigabit X540-AT2 (rev 01)
>> 04:00.1 Ethernet controller: Intel Corporation Ethernet Controller 
>> 10-Gigabit X540-AT2 (rev 01)
>>
>> - The following is my grub execution:
>> [root@ni-nfvhost01 qemu]# cat  /proc/cmdline
>> BOOT_IMAGE=/vmlinuz-3.10.0-229.1.2.el7.x86_64 
>> root=/dev/mapper/centos-root ro rd.lvm.lv=centos/swap 
>> vconsole.font=latarcyrheb-sun17 rd.lvm.lv=centos/root crashkernel=auto 
>> vconsole.keymap=us rhgb quiet iommu=pt intel_iommu=on hugepages=8192
>>
>>
>> This is the error I'm obtaining when the VM has one of the PCI devices associated to the Ethernet Controller card:
>> [root@ni-nfvhost01 qemu]# qemu-system-x86_64 -m 2048 -vga std -vnc :0 
>> -net none -enable-kvm -device vfio-pci,host=04:00.0,id=net0
>> qemu-system-x86_64: -device vfio-pci,host=04:00.0,id=net0: vfio: 
>> failed to set iommu for container: Operation not permitted
>> qemu-system-x86_64: -device vfio-pci,host=04:00.0,id=net0: vfio: 
>> failed to setup container for group 19
>> qemu-system-x86_64: -device vfio-pci,host=04:00.0,id=net0: vfio: 
>> failed to get group 19
>> qemu-system-x86_64: -device vfio-pci,host=04:00.0,id=net0: Device initialization failed.
>> qemu-system-x86_64: -device vfio-pci,host=04:00.0,id=net0: Device 
>> 'vfio-pci' could not be initialized
>>
>> Hence, I tried the following, but again with no success :-( I decided to
>> bind the PCI device associated with the Ethernet Controller to vfio (to
>> enable VM access to the PCI device and have the IOMMU operate properly).
>> Here are the commands I used to configure the PCI pass-through for the Ethernet device:
>>
>> # modprobe vfio-pci
>>
>> 1) Device I want to assign as passthrough:
>> 04:00.0
>>
>> 2) Find the vfio group of this device
>>
>> # readlink /sys/bus/pci/devices/0000:04:00.0/iommu_group
>> ../../../../kernel/iommu_groups/19
>>  
>> ( IOMMU Group = 19 )
>>
>> 3) Check the devices in the group:
>> # ls /sys/bus/pci/devices/0000:04:00.0/iommu_group/devices/
>> 0000:04:00.0
>>  
>> (so this group has only 1 device)
>>  
>> 4) Unbind from device driver
>> # echo 0000:04:00.0 >/sys/bus/pci/devices/0000:04:00.0/driver/unbind
>>  
>> 5) Find vendor & device ID
>> $ lspci -n -s 04:00.0
>>> 04:00.0 0200: 8086:1528 (rev 01)
>>  
>> 6) Bind to vfio-pci
>> $ echo 8086 1528 > /sys/bus/pci/drivers/vfio-pci/new_id
>>  
>> (this results in a new device node "/dev/vfio/19",  which is what qemu 
>> will use to setup the device for passthrough)
>>  
>> 7) chown the device node so it is accessible by qemu user:
>> # chown qemu /dev/vfio/19; chgrp qemu /dev/vfio/19
>>
>> Now, on the VM side, using virt-manager, I removed the initial PCI device and re-added it.
>> After re-booting the VM, I obtained the same issue.
>>
>> What am I doing wrong?
>>
>> Thanks a million!
>>
>> Best Regards,
>> Sami.
>>
>>
>

