Hi Tom :) 

This is great, and the SR-IOV option feels quite clear now. I'm trying to better understand the virtio-user option as well for communication within the same host. I've looked at the DPDK resources, the links you've shared (Tom), and the original virtio-user paper. 

1) From your email:

> Internal host networking without going through PCIe can be handled like VMs too: with virtio and the DPDK vhost driver. Memory copies are involved in that case.

Where does the memory copy happen here? I thought the whole point of virtio-user in the container plus a vhost-user backend (since we're on the same host) was that the path is zero-copy.

2) This may be a basic container networking question. I want to connect multiple (say, three) containers on the same host. From the diagrams in the DPDK documentation and the virtio-user paper, it appears that some virtual switching infrastructure is required. (Note: I believe Containernet sets up namespaces à la Mininet, but it doesn't set up a "virtual switch".)

Am I understanding this correctly? Is there additional container networking infrastructure required for connecting containers? Or is the vhost backend + testpmd sufficient? If so, how does the vhost backend "know" where to switch packets? 

Thank you all so much!!
Thea 

On Wed, Nov 20, 2024 at 1:28 AM Tom Barbette <tom.barbette@uclouvain.be> wrote:


On 20 Nov 2024, at 10:27, Tom Barbette <tom.barbette@uclouvain.be> wrote:

Hi Stephen, Thea,

If you use SR-IOV, containers behave essentially like VMs: packets are exchanged over the PCIe bus and switched on the NIC ASIC, which, as Stephen mentions, recognizes the MAC addresses as "itself", so packets never physically leave the NIC. I'd argue these days that's not much of a problem. A PCIe 5.0 x16 ConnectX-7, for instance, has a bus bandwidth of about 500 Gbps but typically only one or two 100G ports, so you've got plenty of spare bandwidth for internal host exchange. NICs are getting smart and taking on a broader role than pure "external" I/O.
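
For concreteness, a rough sketch of the host-side VF setup (the interface name and the resulting PCIe address are placeholders, adapt them to your system):

echo 2 | sudo tee /sys/class/net/ens1f0/device/sriov_numvfs
lspci | grep -i "virtual function"    # note the VF PCIe addresses
export VF_PCIE_ADDR=0000:3b:00.2      # one VF per container

Each container then gets its own VF, and the NIC switches between them internally.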

Internal host networking without going through PCIe can be handled like VMs too: with virtio and the DPDK vhost driver. Memory copies are involved in that case.
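
For reference, a minimal sketch of that setup with testpmd on both ends (socket path, core lists and file prefixes here are placeholders): the first instance creates the vhost backend on the host side, the second plays the container side and attaches to the same socket with virtio-user:

sudo dpdk-testpmd -l 0-1 --no-pci --file-prefix=host --vdev 'net_vhost0,iface=/tmp/vhost.sock' -- -i
sudo dpdk-testpmd -l 2-3 --no-pci --file-prefix=cont --single-file-segments --vdev 'virtio_user0,path=/tmp/vhost.sock' -- -i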

I suspect that for your matter at hand, Thea, the easiest is to use SR-IOV. Research-wise, a simple solution is to use --network=host …

E.g., this works well for FastClick, but it uses a privileged container and lets it access the whole host network:
sudo docker run -v /mnt/huge:/dev/hugepages -it --privileged --network host tbarbette/fastclick-dpdk:generic --dpdk -a $VF_PCIE_ADDR -- -e "FromDPDKDevice(0) -> Discard;" 
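(The -v option maps the host hugepages into the container, $VF_PCIE_ADDR is the VF from the SR-IOV setup above, and the Click expression simply receives packets from the DPDK port and drops them.)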

The related sample Dockerfile can be found at: https://github.com/tbarbette/fastclick/blob/main/etc/Dockerfile.dpdk

Another problem with DPDK-based Docker images is that you generally don't want to keep -march=native, so personally I wrote a script that builds versions of my Docker image for many architectures: https://github.com/tbarbette/fastclick/blob/main/etc/docker-build.sh. Users can then pull the image that targets their own arch.
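
The gist of the script is something like this (the march build argument and the tag/arch list are illustrative, see the script itself for the real mechanism):

for MARCH in x86-64-v2 x86-64-v3 x86-64-v4; do
  docker build --build-arg march=$MARCH -t tbarbette/fastclick-dpdk:$MARCH -f etc/Dockerfile.dpdk .
done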


Hope that helps,

Tom


*sorry I meant Purnima, not Stephen