Hi Stephen, Thea,

If you use SR-IOV, then containers behave essentially like VMs: packets are exchanged through the PCIe bus and switched on the NIC ASIC, which, as Stephen mentions, will identify the destination MAC addresses as "itself", so packets never physically leave the NIC. I'd argue that these days this is not as much of a problem. A typical PCIe 5.0 x16 ConnectX-7 has a bus bandwidth of 500 Gbps but only one or two 100G ports, so you've got plenty of spare bandwidth for internal host exchange. We know NICs are getting smart and are taking on a broader role than pure "external" I/O.
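For reference, a minimal sketch of that setup (the interface name and PCI address below are placeholders, adjust to your own):

echo 2 > /sys/class/net/enp3s0f0/device/sriov_numvfs   # create 2 VFs on the PF
dpdk-devbind.py --status                               # note the new VFs' PCI addresses
dpdk-devbind.py --bind=vfio-pci 0000:03:00.2           # bind one VF for DPDK use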

Internal host networking without going through PCIe can be handled like VMs too: with virtio and the DPDK vhost driver. Memory copies are involved in that case.
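A minimal sketch along the lines of the DPDK virtio_user container-networking howto (core lists, file prefixes and the socket path are arbitrary, and the socket directory must be shared into the container):

# host side: creates the vhost-user socket
dpdk-testpmd -l 0-1 --no-pci --file-prefix=host --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
# container side: attaches to the same socket via virtio-user
dpdk-testpmd -l 2-3 --no-pci --file-prefix=cont --vdev 'virtio_user0,path=/tmp/sock0' -- -i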

I suspect that for your matter at hand, Thea, the easiest is to use SR-IOV. Research-wise, a simple solution is to use --network=host …

E.g., this works well for FastClick, but it uses a privileged container and gives Docker access to the whole host network:
sudo docker run -v /mnt/huge:/dev/hugepages -it --privileged --network host tbarbette/fastclick-dpdk:generic --dpdk -a $VF_PCIE_ADDR -- -e "FromDPDKDevice(0) -> Discard;" 

The related sample Dockerfile can be found at: https://github.com/tbarbette/fastclick/blob/main/etc/Dockerfile.dpdk

Another problem with DPDK-based Docker images is that you generally don't want to keep -march=native, so I personally use this script to build versions of my Docker image for many architectures (https://github.com/tbarbette/fastclick/blob/main/etc/docker-build.sh), so users can pull the image that targets their own arch.
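The idea is roughly the following (the MARCH build-arg and the per-arch tag names here are illustrative, see the actual script above for the real details):

for march in generic haswell skylake; do
    docker build --build-arg MARCH=$march -t tbarbette/fastclick-dpdk:$march .
done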


Hope that helps,

Tom

On 20 Nov 2024, at 08:10, Kompella V, Purnima <Kompella.Purnima@commscope.com> wrote:

Hi Stephen,
 
A parallel question about packet flow between VFs of the same PF, when the VFs are assigned to different containers on the same host server:
- Create 2 SR-IOV VFs of a PF on the host and assign them to 2 containers (one VF per container).
- Send an IP packet from container-1 to container-2 (SRC_MAC address in this Ethernet frame = container-1 VF's MAC address, DST_MAC address = container-2 VF's MAC address).
- Container-1 sends the packet by calling rte_eth_tx_burst().
- Container-2 polls for packets from its VF by calling rte_eth_rx_burst().
 
Will the packet in the above scenario leave the host server, go to the switch, and then come back to the same host machine to enter container-2?
Or is the SR-IOV logic in the PF NIC smart enough to identify that the SRC_MAC and DST_MAC of the Ethernet frame are its own VFs, and hence switch the packet locally within the NIC (so that the packet doesn't reach the switch at all)?
 
Regards,
Purnima
 
 
From: Stephen Hemminger <stephen@networkplumber.org> 
Sent: Wednesday, November 20, 2024 3:34 AM
To: Thea Corinne Rossman <thea.rossman@cs.stanford.edu>
Cc: users@dpdk.org
Subject: Re: Containernet (Docker/Container Networking) with DPDK?
 

 
On Tue, 19 Nov 2024 13:39:38 -0800
Thea Corinne Rossman <thea.rossman@cs.stanford.edu> wrote:
 
> This is SO helpful -- thank you so much.
> 
> One follow-up question regarding NICs: can multiple containers on the same
> host share the same PCI device? If I have a host NIC with (say) VFIO driver
> binding, do I have to split it with some kind of SR-IOV so that each
> container has its own "NIC" binding? Or, when running DPDK's "devbind"
> script, can I set up each one with the same PCI address?
 
 
Totally depends on what container system you are using.
If you have two containers sharing the same exact PCI device, chaos would ensue.
You might be able to make two VFs on the host and pass one to each container;
that would make more sense.
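
A rough sketch of that approach (the image names and VFIO IOMMU group numbers are placeholders; the actual groups appear under /dev/vfio/ after binding):

dpdk-devbind.py --bind=vfio-pci 0000:03:00.2 0000:03:00.3   # bind both VFs to vfio-pci
docker run -v /dev/hugepages:/dev/hugepages --device /dev/vfio/vfio --device /dev/vfio/42 myimage1
docker run -v /dev/hugepages:/dev/hugepages --device /dev/vfio/vfio --device /dev/vfio/43 myimage2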