From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20241119132903.12fefa8c@hermes.local> <20241119140346.1b63a7d9@hermes.local> <80E33329-12A0-418B-B2F1-CB85E2C2388B@uclouvain.be> <00808E3C-24A4-4D0C-A7EC-D07EA1F790B4@uclouvain.be>
In-Reply-To: <00808E3C-24A4-4D0C-A7EC-D07EA1F790B4@uclouvain.be>
From: Thea Corinne Rossman
Date: Wed, 20 Nov 2024 11:49:30 -0800
Subject: Re: Containernet (Docker/Container Networking) with DPDK?
To: Tom Barbette
Cc: "Kompella V, Purnima", Stephen Hemminger, "users@dpdk.org"
Content-Type: multipart/alternative; boundary="000000000000d6c2d606275d74f0"
List-Id: DPDK usage discussions

--000000000000d6c2d606275d74f0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Hi Tom :)

This is great, and the SR-IOV option feels quite clear now. I'm also trying to
better understand the virtio-user option for communication within the same
host. I've looked at the DPDK resources, the links you've shared (Tom), and
the original virtio-user paper.

1) From your email:

> Internal host networking without going through PCIe can be handled like
> VMs too: with virtio and the DPDK vhost driver. Memory copies are involved
> in that case.

Where is the memory copy here? I thought that the point of virtio-user (in
the containers) + a vhost-user backend (since it is on the host) is that it
is zero-copy. Where does the copy happen?

2) This may be a basic container networking question. I want to connect
multiple (say, three) containers on the same host. From the diagram in the
DPDK-provided instructions and the virtio-user paper, it appears that a
virtual switching infrastructure will be required. (Noting that I believe
Containernet sets up a namespace a la Mininet, but it doesn't set up a
"virtual switch".)

Am I understanding this correctly? Is additional container networking
infrastructure required for connecting containers? Or is the vhost backend +
testpmd sufficient? If so, how does the vhost backend "know" where to switch
packets?

Thank you all so much!!
Thea

On Wed, Nov 20, 2024 at 1:28 AM Tom Barbette wrote:

> On 20 Nov 2024, at 10:27, Tom Barbette wrote:
>
> Hi Stephen, Thea,
>
> If you use SR-IOV, containers behave essentially like VMs: packets are
> exchanged over the PCIe bus and switched on the NIC ASIC, which, as
> Stephen mentions, identifies the MAC addresses as «itself», so packets do
> not physically leave the NIC. I'd argue that these days this is not much
> of a problem. You can typically have a PCIe5 x16 ConnectX-7 with a bus
> bandwidth of 500 Gbps but only one or two 100G ports, so you've got
> plenty of spare bandwidth for internal host exchange. We know NICs are
> getting smart and taking on a broader role than pure «external» I/O.
>
> Internal host networking without going through PCIe can be handled like
> VMs too: with virtio and the DPDK vhost driver. Memory copies are involved
> in that case.
>
> I suspect that for your matter at hand, Thea, the easiest is to use
> SR-IOV. Research-wise, a simple solution is to use --network=host …
>
> E.g., this works well for FastClick, but it uses a privileged container
> and gives the container access to the whole host network:
>
> sudo docker run -v /mnt/huge:/dev/hugepages -it --privileged --network host tbarbette/fastclick-dpdk:generic --dpdk -a $VF_PCIE_ADDR -- -e "FromDPDKDevice(0) -> Discard;"
>
> The related sample Dockerfile can be found at:
> https://github.com/tbarbette/fastclick/blob/main/etc/Dockerfile.dpdk
>
> Another problem with DPDK-based Docker images is that you generally don't
> want to keep -march=native, so I personally use this script to build
> versions of my Docker image for many architectures, so that users can pick
> the image that targets their own arch:
> https://github.com/tbarbette/fastclick/blob/main/etc/docker-build.sh
>
> Hope this helps,
>
> Tom
>
> *sorry I meant Purnima, not Stephen



--000000000000d6c2d606275d74f0--