Hi all,
We are working on migrating our DPDK application from igb_uio to vfio-pci. Our target environment is a VMware ESXi host running on an AMD EPYC server, with NICs configured for PCI passthrough to a guest VM running Debian Bookworm (kernel 6.1.0-39-amd64).
We've encountered a couple of issues.
Problem 1:
Initially, attempting to use vfio-pci failed with error code -22 (EINVAL), and the /sys/class/iommu/ directory in the guest was empty. We discovered the "expose IOMMU to guest OS" option in VMware and enabled it.
This led to a new error:
"The virtual machine cannot be powered on because IOMMU virtualization is not compatible with PCI passthru on AMD platforms"
We found a workaround by adding amd.iommu.supportsPcip = "TRUE" to the VM's configuration. The VM now boots, and the IOMMU is visible in the guest.
However, when we run our DPDK application, it hangs after printing "EAL: VFIO support initialized". Shortly after, the guest kernel panics with a soft lockup error and the system eventually becomes unresponsive:
BUG: soft lockup - CPU#34 stuck for 75s! [kcompactd0:529]
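For reference, the guest-side check behind the "empty /sys/class/iommu/" observation can be sketched roughly as below. This is only an illustration, not part of our application: the function name iommu_units is made up for this post, and the only assumption about the system is that the kernel lists visible IOMMU units as entries under /sys/class/iommu.

```python
import os


def iommu_units(root="/sys/class/iommu"):
    """Return the names of IOMMU units the guest kernel exposes (may be empty)."""
    if not os.path.isdir(root):
        # Directory missing entirely: treat it the same as "no IOMMU visible".
        return []
    return sorted(os.listdir(root))


if __name__ == "__main__":
    units = iommu_units()
    if units:
        print("IOMMU units visible in guest:", ", ".join(units))
    else:
        print("No IOMMU visible in guest; vfio-pci will need one to bind with isolation")
```

An empty result corresponds to the state we were in before enabling the "expose IOMMU to guest OS" option.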
Problem 2:
Separately, we've noticed that our IOMMU groups are not ideal. Many groups contain not only the NICs we need to bind, but also other devices like PCI bridges.
IOMMU Group 7:
0000:00:17.0 - PCI bridge: VMware PCI Express Root Port
0000:00:17.1
0000:00:17.2
0000:00:17.3
0000:00:17.4
0000:00:17.5
0000:00:17.6
0000:00:17.7
0000:13:00.0 - nic
0000:13:00.1 - nic
0000:14:00.0 - nic
0000:14:00.1 - nic
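In case it helps anyone reproduce the grouping above, here is a minimal sketch of how such a listing can be collected from sysfs. It assumes the usual layout where each group appears as /sys/kernel/iommu_groups/<N>/devices/<PCI address>; the helper names list_iommu_groups and shared_groups are invented for this post, and the NIC addresses in the usage part are simply the ones from our Group 7.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # no IOMMU groups exposed to this kernel
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def shared_groups(groups, wanted):
    """Return the groups that hold a wanted device together with other devices."""
    wanted = set(wanted)
    return {
        group: devs
        for group, devs in groups.items()
        if wanted & set(devs) and set(devs) - wanted
    }


if __name__ == "__main__":
    # The four NIC functions we want to bind to vfio-pci (from Group 7 above).
    nics = ["0000:13:00.0", "0000:13:00.1", "0000:14:00.0", "0000:14:00.1"]
    for group, devs in sorted(shared_groups(list_iommu_groups(), nics).items()):
        print(f"IOMMU Group {group}:")
        for dev in devs:
            marker = "nic" if dev in nics else "other"
            print(f"  {dev} - {marker}")
```

On our guest this kind of check flags Group 7, since the NICs share the group with the 0000:00:17.x root port functions.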
Questions:
Is the amd.iommu.supportsPcip workaround the correct approach here?