* Containernet (Docker/Container Networking) with DPDK?
@ 2024-11-18 5:42 Thea Corinne Rossman
2024-11-19 20:53 ` Thea Corinne Rossman
0 siblings, 1 reply; 13+ messages in thread
From: Thea Corinne Rossman @ 2024-11-18 5:42 UTC (permalink / raw)
To: users
[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]
Hello all!
I'm hoping for some general help getting started with DPDK in a Containernet
<https://containernet.github.io> topology. I have some DPDK experience, but
I'm very new to container networking :). I've been working with an Ubuntu
24.10 VM, though I can run any experiments on Cloudlab (so am not
necessarily tied to a particular architecture).
First question: for setting up the host machine: Do I need to install DPDK,
set up hugepages, etc., on the host, or is configuration in just the
containers sufficient?
Second question: I'm having trouble creating containers that will let me
run DPDK applications. High-level, I understand that I'll need to create or
find a container image that's configured with DPDK and all dependencies, as
well as the Containernet requirements
<https://github.com/containernet/containernet/wiki>.
I tried to build on this: https://github.com/shanakaprageeth/docker-dpdk .
However, when I ran the setup script, I get this error:
```
ERROR: failed to solve: process "/bin/sh -c apt-get install build-essential
git python pciutils vim -y" did not complete successfully: exit code: 100
Unable to find image 'ubuntu-dpdk:latest' locally
docker: Error response from daemon: pull access denied for ubuntu-dpdk,
repository does not exist or may require 'docker login': denied: requested
access to the resource is denied.
```
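(My unverified guess: the failing step never runs apt-get update, and recent
Ubuntu releases ship python3 rather than python, so apt-get can't resolve the
packages. A sketch of what I'd try instead:)
```
# Unverified sketch: refresh the package index first and install python3,
# since the "python" package no longer exists on recent Ubuntu.
FROM ubuntu:24.04
RUN apt-get update && \
    apt-get install -y build-essential git python3 pciutils vim
```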
When I downloaded the image directly from docker (docker pull
shanakaprageeth/ubuntu-dpdk), no commands on the container worked
(ifconfig, apt-get, etc.). There was no error message.
I think I fundamentally don't understand how container images work and how
to customize them, especially since I'm trying to write Containernet
scripts (vs. actually ssh-ing into containers). I know I'll need a
container that supports DPDK and has it downloaded, and I know it will also
need to meet the Containernet requirements.
Could anyone please point me in the right direction?
Thank you!
[-- Attachment #2: Type: text/html, Size: 2420 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-18 5:42 Containernet (Docker/Container Networking) with DPDK? Thea Corinne Rossman
@ 2024-11-19 20:53 ` Thea Corinne Rossman
2024-11-19 21:29 ` Stephen Hemminger
0 siblings, 1 reply; 13+ messages in thread
From: Thea Corinne Rossman @ 2024-11-19 20:53 UTC (permalink / raw)
To: users
[-- Attachment #1: Type: text/plain, Size: 3280 bytes --]
I'm following up on this to ask a more specific question, since my first
question was a bit all over the place. This is regarding the interplay
between the host and the containers when setting up containers that can run
DPDK applications.
Based on what I've found so far, it looks like I will have to fully
configure DPDK on the host and then mount devices onto each container, even
if there's no need to connect the containers to the outside world. (Is this
correct?) If so, I don't fully understand this, since a container/container
network should be self-contained.
- Why do we need to set up DPDK on the host? (Why isn't the container
enough?)
- Why do we need to set up a DPDK-compatible driver on the host NICs? If
the containers are on the same machine, exchanging packets, why would the
host NIC be involved at all? Nothing is going in or out.
- Why do we need to configure hugepages on the host and then mount them
on the container? Why can't you just configure this on the containers? Is
this something that can't be emulated?
Thank you so much again!
On Sun, Nov 17, 2024 at 9:42 PM Thea Corinne Rossman <
thea.rossman@cs.stanford.edu> wrote:
> Hello all!
>
> I'm hoping for some general help getting started with DPDK in a
> Containernet <https://containernet.github.io> topology. I have some DPDK
> experience, but I'm very new to container networking :). I've been working
> with an Ubuntu 24.10 VM, though I can run any experiments on Cloudlab (so
> am not necessarily tied to a particular architecture).
>
> First question: for setting up the host machine: Do I need to install
> DPDK, set up hugepages, etc., on the host, or is configuration in just the
> containers sufficient?
>
> Second question: I'm having trouble creating containers that will let me
> run DPDK applications. High-level, I understand that I'll need to create or
> find a container image that's configured with DPDK and all dependencies, as
> well as the Containernet requirements
> <https://github.com/containernet/containernet/wiki>.
>
> I tried to build on this: https://github.com/shanakaprageeth/docker-dpdk .
> However, when I ran the setup script, I get this error:
>
> ```
> ERROR: failed to solve: process "/bin/sh -c apt-get install
> build-essential git python pciutils vim -y" did not complete successfully:
> exit code: 100
> Unable to find image 'ubuntu-dpdk:latest' locally
> docker: Error response from daemon: pull access denied for ubuntu-dpdk,
> repository does not exist or may require 'docker login': denied: requested
> access to the resource is denied.
> ```
>
> When I downloaded the image directly from docker (docker pull
> shanakaprageeth/ubuntu-dpdk), no commands on the container worked
> (ifconfig, apt-get, etc.). There was no error message.
>
> I think I fundamentally don't understand how container images work and how
> to customize them, especially since I'm trying to write Containernet
> scripts (vs. actually ssh-ing into containers). I know I'll need a
> container that supports DPDK and has it downloaded, and I know it will also
> need to meet the Containernet requirements.
>
> Could anyone please point me in the right direction?
>
> Thank you!
>
[-- Attachment #2: Type: text/html, Size: 4033 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-19 20:53 ` Thea Corinne Rossman
@ 2024-11-19 21:29 ` Stephen Hemminger
2024-11-19 21:39 ` Thea Corinne Rossman
0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2024-11-19 21:29 UTC (permalink / raw)
To: Thea Corinne Rossman; +Cc: users
On Tue, 19 Nov 2024 12:53:02 -0800
Thea Corinne Rossman <thea.rossman@cs.stanford.edu> wrote:
> I'm following up on this to ask a more specific question, since my first
> question was a bit all over the place. This is regarding the interplay
> between the host and the containers when setting up containers that can run
> DPDK applications.
>
> Based on what I've found so far, it looks like I will have to fully
> configure DPDK on the host and then mount devices onto each container, even
> if there's no need to connect the containers to the outside world. (Is this
> correct?) If so, I don't fully understand this, since a container/container
> network should be self-contained.
>
> - Why do we need to set up DPDK on the host? (Why isn't the container
> enough?)
> - Why do we need to set up a DPDK-compatible driver on the host NICs? If
> the containers are on the same machine, exchanging packets, why would the
> host NIC be involved at all? Nothing is going in or out.
> - Why do we need to configure hugepages on the host and then mount them
> on the container? Why can't you just configure this on the containers? Is
> this something that can't be emulated?
>
Containers are a made-up construct. They are built by setting permissions
on namespaces and cgroups for resources.
In most cases, DPDK works by passing the raw hardware (PCI device) through
to the userspace application. To make that work with a container system,
you need to either acquire the resource in an unrestricted environment and
then allow access to that resource in the restricted container, or give the
restricted container environment enough privileges to configure and set up
the raw hardware directly. In the latter case, there really is no point in
having containers.
The same applies to hugepages; I don't think hugepages are namespaced either.
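For concreteness, a typical invocation looks something like this (an
unverified sketch; the PCI address, IOMMU group number, and image name are
placeholders for whatever the host actually has):
```
# Pass a VFIO-bound NIC and the host hugepage mount into a container.
# /dev/vfio/42 is the IOMMU group holding the (placeholder) device 0000:03:00.0.
sudo docker run -it --rm \
    --device /dev/vfio/vfio \
    --device /dev/vfio/42 \
    -v /dev/hugepages:/dev/hugepages \
    my-dpdk-image \
    dpdk-testpmd -l 0-1 -a 0000:03:00.0 -- -i
```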
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-19 21:29 ` Stephen Hemminger
@ 2024-11-19 21:39 ` Thea Corinne Rossman
2024-11-19 22:03 ` Stephen Hemminger
2024-11-19 22:14 ` Thomas Monjalon
0 siblings, 2 replies; 13+ messages in thread
From: Thea Corinne Rossman @ 2024-11-19 21:39 UTC (permalink / raw)
To: stephen; +Cc: users
[-- Attachment #1: Type: text/plain, Size: 2492 bytes --]
This is SO helpful -- thank you so much.
One follow-up question regarding NICs: can multiple containers on the same
host share the same PCI device? If I have a host NIC with (say) VFIO driver
binding, do I have to split it with some kind of SR-IOV so that each
container has its own "NIC" binding? Or, when running DPDK's "devbind"
script, can I set up each one with the same PCI address?
On Tue, Nov 19, 2024 at 1:29 PM Stephen Hemminger <
stephen@networkplumber.org> wrote:
> On Tue, 19 Nov 2024 12:53:02 -0800
> Thea Corinne Rossman <thea.rossman@cs.stanford.edu> wrote:
>
> > I'm following up on this to ask a more specific question, since my first
> > question was a bit all over the place. This is regarding the interplay
> > between the host and the containers when setting up containers that can
> run
> > DPDK applications.
> >
> > Based on what I've found so far, it looks like I will have to fully
> > configure DPDK on the host and then mount devices onto each container,
> even
> > if there's no need to connect the containers to the outside world. (Is
> this
> > correct?) If so, I don't fully understand this, since a
> container/container
> > network should be self-contained.
> >
> > - Why do we need to set up DPDK on the host? (Why isn't the container
> > enough?)
> > - Why do we need to set up a DPDK-compatible driver on the host NICs?
> If
> > the containers are on the same machine, exchanging packets, why would
> the
> > host NIC be involved at all? Nothing is going in or out.
> > - Why do we need to configure hugepages on the host and then mount
> them
> > on the container? Why can't you just configure this on the
> containers? Is
> > this something that can't be emulated?
> >
>
> Containers are a made-up construct. They are built by setting permissions
> on namespaces and cgroups for resources.
>
> In most cases, DPDK works by passing the raw hardware (PCI device) through
> to the userspace application. To make that work with a container system,
> you need to either acquire the resource in an unrestricted environment and
> then allow access to that resource in the restricted container, or give
> the restricted container environment enough privileges to configure and
> set up the raw hardware directly. In the latter case, there really is no
> point in having containers.
>
> The same applies to hugepages; I don't think hugepages are namespaced
> either.
>
[-- Attachment #2: Type: text/html, Size: 3096 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-19 21:39 ` Thea Corinne Rossman
@ 2024-11-19 22:03 ` Stephen Hemminger
2024-11-20 7:10 ` Kompella V, Purnima
2024-11-19 22:14 ` Thomas Monjalon
1 sibling, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2024-11-19 22:03 UTC (permalink / raw)
To: Thea Corinne Rossman; +Cc: users
On Tue, 19 Nov 2024 13:39:38 -0800
Thea Corinne Rossman <thea.rossman@cs.stanford.edu> wrote:
> This is SO helpful -- thank you so much.
>
> One follow-up question regarding NICs: can multiple containers on the same
> host share the same PCI device? If I have a host NIC with (say) VFIO driver
> binding, do I have to split it with some kind of SR-IOV so that each
> container has its own "NIC" binding? Or, when running DPDK's "devbind"
> script, can I set up each one with the same PCI address?
Totally depends on what container system you are using.
If you have two containers sharing the same exact PCI device, chaos would
ensue. You might be able to make two VFs on the host and pass one to each
container; that would make more sense.
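Roughly, on the host (an unverified sketch; the interface name and the VF
PCI addresses are placeholders):
```
# Create two VFs on the PF ...
echo 2 | sudo tee /sys/class/net/enp3s0f0/device/sriov_numvfs
# ... then bind each VF to vfio-pci so DPDK in each container can claim one
sudo dpdk-devbind.py --bind=vfio-pci 0000:03:00.2 0000:03:00.3
```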
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-19 21:39 ` Thea Corinne Rossman
2024-11-19 22:03 ` Stephen Hemminger
@ 2024-11-19 22:14 ` Thomas Monjalon
2024-11-19 23:23 ` Thea Corinne Rossman
1 sibling, 1 reply; 13+ messages in thread
From: Thomas Monjalon @ 2024-11-19 22:14 UTC (permalink / raw)
To: Thea Corinne Rossman; +Cc: stephen, users
19/11/2024 22:39, Thea Corinne Rossman:
> This is SO helpful -- thank you so much.
>
> One follow-up question regarding NICs: can multiple containers on the same
> host share the same PCI device? If I have a host NIC with (say) VFIO driver
> binding, do I have to split it with some kind of SR-IOV so that each
> container has its own "NIC" binding? Or, when running DPDK's "devbind"
> script, can I set up each one with the same PCI address?
You need to split the device.
SR-IOV VFs may help, or you can use SFs (subfunctions), which were designed
exactly for this, with the help of the Linux auxiliary bus.
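For the SF path, the flow is roughly as follows (an unverified sketch,
assuming an mlx5-class NIC with subfunction support; the PCI address, sfnum,
and port index are placeholders):
```
# Add a subfunction port on the PF and activate it; the SF then appears on
# the auxiliary bus and can be handed to a container.
sudo devlink port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 1
sudo devlink port function set pci/0000:03:00.0/32768 state active
```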
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-19 22:14 ` Thomas Monjalon
@ 2024-11-19 23:23 ` Thea Corinne Rossman
2024-11-19 23:30 ` Thomas Monjalon
0 siblings, 1 reply; 13+ messages in thread
From: Thea Corinne Rossman @ 2024-11-19 23:23 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: stephen, users
[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]
Conceptually, for two containers on the same host, how would exchanging
traffic work under the hood? Specifically, how is the physical NIC
involved, if at all?
For example, on a physical host: for TX, a userspace application writes a
packet to host memory and pushes its physical address/metadata to the
appropriate NIC's TX queue. The NIC uses the physical address + DMA to
avoid a copy when serializing/sending. (Similar for RX in the other
direction, where the NIC writes to memory.)
I'm not sure how this would translate to a containerized case on a single
host, since traffic shouldn't need to exit and the container network has
its own namespace. Say that two different containers have NICs mapped to
different PCI addresses (split device). If container A appends to its TX
queue, what happens next?
Thanks again for your help.
On Tue, Nov 19, 2024 at 2:14 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> 19/11/2024 22:39, Thea Corinne Rossman:
> > This is SO helpful -- thank you so much.
> >
> > One follow-up question regarding NICs: can multiple containers on the
> same
> > host share the same PCI device? If I have a host NIC with (say) VFIO
> driver
> > binding, do I have to split it with some kind of SR-IOV so that each
> > container has its own "NIC" binding? Or, when running DPDK's "devbind"
> > script, can I set up each one with the same PCI address?
>
> You need to split the device.
> SR-IOV VFs may help, or you can use SFs (subfunctions), which were
> designed exactly for this, with the help of the Linux auxiliary bus.
>
>
>
>
[-- Attachment #2: Type: text/html, Size: 2058 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-19 23:23 ` Thea Corinne Rossman
@ 2024-11-19 23:30 ` Thomas Monjalon
0 siblings, 0 replies; 13+ messages in thread
From: Thomas Monjalon @ 2024-11-19 23:30 UTC (permalink / raw)
To: Thea Corinne Rossman; +Cc: stephen, users
20/11/2024 00:23, Thea Corinne Rossman:
> Conceptually, for two containers on the same host, how would exchanging
> traffic work under the hood? Specifically, how is the physical NIC
> involved, if at all?
Traffic between containers on the same host does not need to use a PCI device.
You can check virtio-user for this usage:
https://doc.dpdk.org/guides/howto/virtio_user_for_container_networking.html
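In broad strokes, following that howto (an unverified sketch; the socket
path and core lists are placeholders, and /tmp/sock0 must be volume-mounted
into the container):
```
# Host side: a vhost-user backend port, no PCI devices needed
sudo dpdk-testpmd -l 1-2 --no-pci --file-prefix=host \
    --vdev 'net_vhost0,iface=/tmp/sock0' -- -i

# Container side: a virtio-user port attached to the same socket
sudo dpdk-testpmd -l 3-4 --no-pci --file-prefix=container \
    --vdev 'virtio_user0,path=/tmp/sock0' -- -i
```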
> For example, on a physical host: for TX, a userspace application writes a
> packet to host memory and pushes its physical address/metadata to the
> appropriate NIC's TX queue. The NIC uses the physical address + DMA to
> avoid a copy when serializing/sending. (Similar for RX in the other
> direction, where the NIC writes to memory.)
>
> I'm not sure how this would translate to a containerized case on a single
> host, since traffic shouldn't need to exit and the container network has
> its own namespace. Say that two different containers have NICs mapped to
> different PCI addresses (split device). If container A appends to its TX
> queue, what happens next?
>
> Thanks again for your help.
>
> On Tue, Nov 19, 2024 at 2:14 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > 19/11/2024 22:39, Thea Corinne Rossman:
> > > This is SO helpful -- thank you so much.
> > >
> > > One follow-up question regarding NICs: can multiple containers on the
> > same
> > > host share the same PCI device? If I have a host NIC with (say) VFIO
> > driver
> > > binding, do I have to split it with some kind of SR-IOV so that each
> > > container has its own "NIC" binding? Or, when running DPDK's "devbind"
> > > script, can I set up each one with the same PCI address?
> >
> > You need to split the device.
> > SR-IOV VFs may help, or you can use SFs (subfunctions), which were
> > designed exactly for this, with the help of the Linux auxiliary bus.
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Containernet (Docker/Container Networking) with DPDK?
2024-11-19 22:03 ` Stephen Hemminger
@ 2024-11-20 7:10 ` Kompella V, Purnima
2024-11-20 9:27 ` Tom Barbette
0 siblings, 1 reply; 13+ messages in thread
From: Kompella V, Purnima @ 2024-11-20 7:10 UTC (permalink / raw)
To: Stephen Hemminger, Thea Corinne Rossman; +Cc: users
[-- Attachment #1: Type: text/plain, Size: 2824 bytes --]
Hi Stephen,
A parallel question about packet flow between VFs of the same PF, when the
VFs are assigned to different containers on the same host server:
- Create 2 SR-IOV VFs of a PF on the host and assign them to 2 containers
(one VF per container).
- Send an IP packet from container-1 to container-2 (SRC_MAC in the
Ethernet frame = container-1 VF's MAC address, DST_MAC = container-2 VF's
MAC address).
- Container-1 sends the packet by calling rte_eth_tx_burst().
- Container-2 polls for packets from its VF by calling rte_eth_rx_burst().
Will the packet in the above scenario leave the host server, go to the
switch, and then come back to the same host machine to enter container-2?
Or is the SR-IOV logic in the PF NIC smart enough to identify that the
SRC_MAC and DST_MAC of the Ethernet frame belong to its own VFs, and hence
route the packet locally within the NIC (so the packet doesn't reach the
switch at all)?
Regards,
Purnima
From: Stephen Hemminger <stephen@networkplumber.org>
Sent: Wednesday, November 20, 2024 3:34 AM
To: Thea Corinne Rossman <thea.rossman@cs.stanford.edu>
Cc: users@dpdk.org
Subject: Re: Containernet (Docker/Container Networking) with DPDK?
On Tue, 19 Nov 2024 13:39:38 -0800
Thea Corinne Rossman <thea.rossman@cs.stanford.edu> wrote:
> This is SO helpful -- thank you so much.
>
> One follow-up question regarding NICs: can multiple containers on the same
> host share the same PCI device? If I have a host NIC with (say) VFIO driver
> binding, do I have to split it with some kind of SR-IOV so that each
> container has its own "NIC" binding? Or, when running DPDK's "devbind"
> script, can I set up each one with the same PCI address?
Totally depends on what container system you are using.
If you have two containers sharing the same exact PCI device, chaos would
ensue. You might be able to make two VFs on the host and pass one to each
container; that would make more sense.
[-- Attachment #2: Type: text/html, Size: 6722 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-20 7:10 ` Kompella V, Purnima
@ 2024-11-20 9:27 ` Tom Barbette
2024-11-20 9:28 ` Tom Barbette
0 siblings, 1 reply; 13+ messages in thread
From: Tom Barbette @ 2024-11-20 9:27 UTC (permalink / raw)
To: Kompella V, Purnima; +Cc: Stephen Hemminger, Thea Corinne Rossman, users
[-- Attachment #1: Type: text/plain, Size: 3949 bytes --]
Hi Stephen, Thea,
If you use SR-IOV, then containers behave essentially like VMs: packets
will be exchanged through the PCIe bus and switched on the NIC ASIC, which,
as Stephen mentioned, will identify the MAC addresses as « itself », so
packets never physically leave the NIC. I'd argue that these days this is
not much of a problem: you can typically have a PCIe5 x16 ConnectX-7 with a
bus bandwidth of 500 Gbps but only one or two 100G ports, so you've got
plenty of spare bandwidth for internal host exchange. NICs are getting
smart and taking on a broader role than pure « external » I/O.
Internal host networking without going through PCIe can be handled like VMs
too: with virtio and the DPDK vhost driver. Memory copies are involved in
that case.
I suspect that for your matter at hand, Thea, the easiest is to use SR-IOV.
Research-wise, a simple solution is to use --network=host.
E.g. this works well but uses a privileged container and lets the container
access the whole host network, for FastClick:
sudo docker run -v /mnt/huge:/dev/hugepages -it --privileged --network host tbarbette/fastclick-dpdk:generic --dpdk -a $VF_PCIE_ADDR -- -e "FromDPDKDevice(0) -> Discard;"
The related sample Dockerfile can be found at:
https://github.com/tbarbette/fastclick/blob/main/etc/Dockerfile.dpdk
Another problem with DPDK-based Docker images is that you generally don't
want to keep -march=native, so I personally use this script to build a
version of my Docker image for many architectures:
https://github.com/tbarbette/fastclick/blob/main/etc/docker-build.sh
so that users can pick the image targeting their own arch.
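With DPDK itself, the analogous step is to pick a portable instruction set
at build time (a sketch; the build directory name is arbitrary):
```
# Build DPDK for a generic baseline ISA instead of the build host's native one
meson setup build -Dcpu_instruction_set=generic
ninja -C build
```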
May that be helpful,
Tom
On 20 Nov 2024, at 08:10, Kompella V, Purnima <Kompella.Purnima@commscope.com> wrote:
Hi Stephen,
A parallel question about packet flow between VFs of the same PF, when the
VFs are assigned to different containers on the same host server:
- Create 2 SR-IOV VFs of a PF on the host and assign them to 2 containers
(one VF per container).
- Send an IP packet from container-1 to container-2 (SRC_MAC in the
Ethernet frame = container-1 VF's MAC address, DST_MAC = container-2 VF's
MAC address).
- Container-1 sends the packet by calling rte_eth_tx_burst().
- Container-2 polls for packets from its VF by calling rte_eth_rx_burst().
Will the packet in the above scenario leave the host server, go to the
switch, and then come back to the same host machine to enter container-2?
Or is the SR-IOV logic in the PF NIC smart enough to identify that the
SRC_MAC and DST_MAC of the Ethernet frame belong to its own VFs, and hence
route the packet locally within the NIC (so the packet doesn't reach the
switch at all)?
Regards,
Purnima
From: Stephen Hemminger <stephen@networkplumber.org>
Sent: Wednesday, November 20, 2024 3:34 AM
To: Thea Corinne Rossman <thea.rossman@cs.stanford.edu>
Cc: users@dpdk.org
Subject: Re: Containernet (Docker/Container Networking) with DPDK?
On Tue, 19 Nov 2024 13:39:38 -0800
Thea Corinne Rossman <thea.rossman@cs.stanford.edu> wrote:
> This is SO helpful -- thank you so much.
>
> One follow-up question regarding NICs: can multiple containers on the same
> host share the same PCI device? If I have a host NIC with (say) VFIO driver
> binding, do I have to split it with some kind of SR-IOV so that each
> container has its own "NIC" binding? Or, when running DPDK's "devbind"
> script, can I set up each one with the same PCI address?
Totally depends on what container system you are using.
If you have two containers sharing the same exact PCI device, chaos would
ensue. You might be able to make two VFs on the host and pass one to each
container; that would make more sense.
[-- Attachment #2: Type: text/html, Size: 21937 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-20 9:27 ` Tom Barbette
@ 2024-11-20 9:28 ` Tom Barbette
2024-11-20 9:28 ` Tom Barbette
2024-11-20 19:49 ` Thea Corinne Rossman
0 siblings, 2 replies; 13+ messages in thread
From: Tom Barbette @ 2024-11-20 9:28 UTC (permalink / raw)
To: Kompella V, Purnima; +Cc: Stephen Hemminger, Thea Corinne Rossman, users
[-- Attachment #1: Type: text/plain, Size: 1819 bytes --]
On 20 Nov 2024, at 10:27, Tom Barbette <tom.barbette@uclouvain.be> wrote:
Hi Stephen, Thea,
If you use SR-IOV, then containers behave essentially like VMs: packets
will be exchanged through the PCIe bus and switched on the NIC ASIC, which,
as Stephen mentioned, will identify the MAC addresses as « itself », so
packets never physically leave the NIC. I'd argue that these days this is
not much of a problem: you can typically have a PCIe5 x16 ConnectX-7 with a
bus bandwidth of 500 Gbps but only one or two 100G ports, so you've got
plenty of spare bandwidth for internal host exchange. NICs are getting
smart and taking on a broader role than pure « external » I/O.
Internal host networking without going through PCIe can be handled like VMs
too: with virtio and the DPDK vhost driver. Memory copies are involved in
that case.
I suspect that for your matter at hand, Thea, the easiest is to use SR-IOV.
Research-wise, a simple solution is to use --network=host.
E.g. this works well but uses a privileged container and lets the container
access the whole host network, for FastClick:
sudo docker run -v /mnt/huge:/dev/hugepages -it --privileged --network host tbarbette/fastclick-dpdk:generic --dpdk -a $VF_PCIE_ADDR -- -e "FromDPDKDevice(0) -> Discard;"
The related sample Dockerfile can be found at:
https://github.com/tbarbette/fastclick/blob/main/etc/Dockerfile.dpdk
Another problem with DPDK-based Docker images is that you generally don't
want to keep -march=native, so I personally use this script to build a
version of my Docker image for many architectures:
https://github.com/tbarbette/fastclick/blob/main/etc/docker-build.sh
so that users can pick the image targeting their own arch.
May that be helpful,
Tom
*sorry I meant Purnima, not Stephen
[-- Attachment #2: Type: text/html, Size: 2862 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-20 9:28 ` Tom Barbette
@ 2024-11-20 9:28 ` Tom Barbette
2024-11-20 19:49 ` Thea Corinne Rossman
1 sibling, 0 replies; 13+ messages in thread
From: Tom Barbette @ 2024-11-20 9:28 UTC (permalink / raw)
To: Kompella V, Purnima; +Cc: Stephen Hemminger, Thea Corinne Rossman, users
[-- Attachment #1: Type: text/plain, Size: 1819 bytes --]
On 20 Nov 2024, at 10:27, Tom Barbette <tom.barbette@uclouvain.be> wrote:
Hi Stephen, Thea,
If you use SR-IOV, then containers behave essentially like VMs: packets
will be exchanged through the PCIe bus and switched on the NIC ASIC, which,
as Stephen mentioned, will identify the MAC addresses as « itself », so
packets never physically leave the NIC. I'd argue that these days this is
not much of a problem: you can typically have a PCIe5 x16 ConnectX-7 with a
bus bandwidth of 500 Gbps but only one or two 100G ports, so you've got
plenty of spare bandwidth for internal host exchange. NICs are getting
smart and taking on a broader role than pure « external » I/O.
Internal host networking without going through PCIe can be handled like VMs
too: with virtio and the DPDK vhost driver. Memory copies are involved in
that case.
I suspect that for your matter at hand, Thea, the easiest is to use SR-IOV.
Research-wise, a simple solution is to use --network=host.
E.g. this works well but uses a privileged container and lets the container
access the whole host network, for FastClick:
sudo docker run -v /mnt/huge:/dev/hugepages -it --privileged --network host tbarbette/fastclick-dpdk:generic --dpdk -a $VF_PCIE_ADDR -- -e "FromDPDKDevice(0) -> Discard;"
The related sample Dockerfile can be found at:
https://github.com/tbarbette/fastclick/blob/main/etc/Dockerfile.dpdk
Another problem with DPDK-based Docker images is that you generally don't
want to keep -march=native, so I personally use this script to build a
version of my Docker image for many architectures:
https://github.com/tbarbette/fastclick/blob/main/etc/docker-build.sh
so that users can pick the image targeting their own arch.
May that be helpful,
Tom
*sorry I meant Purnima, not Stephen
[-- Attachment #2: Type: text/html, Size: 2862 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Containernet (Docker/Container Networking) with DPDK?
2024-11-20 9:28 ` Tom Barbette
2024-11-20 9:28 ` Tom Barbette
@ 2024-11-20 19:49 ` Thea Corinne Rossman
1 sibling, 0 replies; 13+ messages in thread
From: Thea Corinne Rossman @ 2024-11-20 19:49 UTC (permalink / raw)
To: Tom Barbette; +Cc: Kompella V, Purnima, Stephen Hemminger, users
[-- Attachment #1: Type: text/plain, Size: 3442 bytes --]
Hi Tom :)
This is great, and the SR-IOV option feels quite clear now. I'm trying to
better understand the virtio-user option as well for communication within
the same host. I've looked at the DPDK resources, the links you've shared
(Tom), and the original virtio-user paper.
1) From your email:
> Internal host networking without going through PCIe can be handled like
> VMs too: with virtio and the DPDK vhost driver. Memory copies are involved
> in that case.
Where is the memory copy here? I thought the point of virtio-user (in the
container) + a vhost-user backend (on the host) was that it's zero-copy.
Where does the copy happen?
2) This may be a basic container networking question. I want to connect
multiple (say, three) containers on the same host. From the diagram in the
DPDK-provided instructions
<https://doc.dpdk.org/guides/howto/virtio_user_for_container_networking.html>
and virtio-user paper, it appears that a virtual switching infrastructure
will be required. (Noting that I believe that Containernet
<https://containernet.github.io> sets up a namespace a la Mininet, but it
doesn't set up a "virtual switch".)
Am I understanding this correctly? Is there additional container networking
infrastructure required for connecting containers? Or is the vhost
backend + testpmd sufficient? If so, how does the vhost backend "know"
where to switch packets?
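For concreteness, this is the kind of host-side setup I'm picturing (an
unverified sketch based on the howto above; socket paths are placeholders,
and each socket would need to be volume-mounted into its container):
```
# Host side: one vhost-user port per container; testpmd in io mode simply
# cross-connects the two ports (port 0 <-> port 1), it does no MAC lookup
sudo dpdk-testpmd -l 1-2 --no-pci \
    --vdev 'net_vhost0,iface=/tmp/sock0' \
    --vdev 'net_vhost1,iface=/tmp/sock1' \
    -- -i --forward-mode=io
```
My understanding is that testpmd here is just a dumb cross-connect rather
than a MAC-learning switch, which I suspect is why a real vswitch (e.g.
OVS-DPDK) comes up once more than two containers are involved.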
Thank you all so much!!
Thea
On Wed, Nov 20, 2024 at 1:28 AM Tom Barbette <tom.barbette@uclouvain.be>
wrote:
>
>
> On 20 Nov 2024, at 10:27, Tom Barbette <tom.barbette@uclouvain.be> wrote:
>
> Hi Stephen, Thea,
>
> If you use SR-IOV, then containers behave essentially like VMs: packets
> will be exchanged through the PCIe bus and switched on the NIC ASIC,
> which, as Stephen mentioned, will identify the MAC addresses as « itself »,
> so packets never physically leave the NIC. I'd argue that these days this
> is not much of a problem: you can typically have a PCIe5 x16 ConnectX-7
> with a bus bandwidth of 500 Gbps but only one or two 100G ports, so
> you've got plenty of spare bandwidth for internal host exchange. NICs are
> getting smart and taking on a broader role than pure « external » I/O.
>
> Internal host networking without going through PCIe can be handled like
> VMs too: with virtio and the DPDK vhost driver. Memory copies are involved
> in that case.
>
> I suspect that for your matter at hand, Thea, the easiest is to use
> SR-IOV. Research-wise, a simple solution is to use --network=host.
>
> E.g. this works well but uses a privileged container and lets the
> container access the whole host network, for FastClick:
> sudo docker run -v /mnt/huge:/dev/hugepages -it --privileged --network
> host tbarbette/fastclick-dpdk:generic --dpdk -a $VF_PCIE_ADDR -- -e
> "FromDPDKDevice(0) -> Discard;"
>
> The related sample Dockerfile can be found at:
> https://github.com/tbarbette/fastclick/blob/main/etc/Dockerfile.dpdk
>
> Another problem with DPDK-based Docker images is that you generally
> don't want to keep -march=native, so I personally use this script to
> build a version of my Docker image for many architectures:
> https://github.com/tbarbette/fastclick/blob/main/etc/docker-build.sh
> so that users can pick the image targeting their own arch.
>
>
> May that be helpful,
>
> Tom
>
>
> *sorry I meant Purnima, not Stephen
>
>
>
[-- Attachment #2: Type: text/html, Size: 4829 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-11-20 19:50 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-18 5:42 Containernet (Docker/Container Networking) with DPDK? Thea Corinne Rossman
2024-11-19 20:53 ` Thea Corinne Rossman
2024-11-19 21:29 ` Stephen Hemminger
2024-11-19 21:39 ` Thea Corinne Rossman
2024-11-19 22:03 ` Stephen Hemminger
2024-11-20 7:10 ` Kompella V, Purnima
2024-11-20 9:27 ` Tom Barbette
2024-11-20 9:28 ` Tom Barbette
2024-11-20 9:28 ` Tom Barbette
2024-11-20 19:49 ` Thea Corinne Rossman
2024-11-19 22:14 ` Thomas Monjalon
2024-11-19 23:23 ` Thea Corinne Rossman
2024-11-19 23:30 ` Thomas Monjalon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).