From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id D563043D57; Tue, 26 Mar 2024 18:44:51 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id BF6EE40EA5; Tue, 26 Mar 2024 18:44:51 +0100 (CET) Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) by mails.dpdk.org (Postfix) with ESMTP id 2209D402DA for ; Tue, 26 Mar 2024 18:44:50 +0100 (CET) Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-6ea838bf357so3196801b3a.0 for ; Tue, 26 Mar 2024 10:44:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=damore-org.20230601.gappssmtp.com; s=20230601; t=1711475089; x=1712079889; darn=dpdk.org; h=mime-version:subject:references:in-reply-to:message-id:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=tsI9dGim7nPr+/M3h0VCLxQwSMd5Erp/ReaVZ/xmJd0=; b=fJ1u4KB7dFEBY1A4PY6zFCIlapgzKHF6t/f+x4YncAJ4Z9H3t5aFxkRBd/QgbUDesZ JbWZaH7jbowJGfPYlJRLNe2TpawQRxP5fcOo6to3SdjG72jO3+0B/dWCizaMZVicWj3a KkIQ86ZdwC6PGI6C5T4nPY2OdujkuMskfBJKu08UabjOo/NEiGJubqtHryAyY8zhr91/ TvB3jKI+TwKCjZ6Xi5g6E2vqRiNipPPz/SXFYsYhdxbrbA7ZF+Z+xqaSZ+G4Hl8Rc3Ih wizYgU34/3C+lXkj2Xu+f1CVB8N6DPbXwOBLtktTzCxYven7DGlzZpPMMS95d9JS/roj bbxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711475089; x=1712079889; h=mime-version:subject:references:in-reply-to:message-id:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tsI9dGim7nPr+/M3h0VCLxQwSMd5Erp/ReaVZ/xmJd0=; b=uieoMnP8RHmpdw/HxjRiED5TFFktRLpK12Jp5XZaUZj7VOyBWS3/Evsj0I+Bdz/iUw OUwmwZ3RN3oOo8u967xDyfkCq27RRtT5e3TAzRRDOp5J/M+ryHgGohkQ/axJmPp4m372 qAaIskEsvyXv1N+eAfbs7LZvchMnAIA1zaNBTc//+v8yP4Md3vaCUQQF9HOyjFbZ5YTE at5TtOavNAvSb/mXgbK2PcvaLhuH5An17GAGoulwXFT8VwDVt0A/UKHmzxOxm1QEx7ht Wd8FgbtW+74ht/vQdb88rKuGJ/NLDnDukbpv7NhOLlKeivBJzD/OFVNV+aLM5H8JoSvq Xiag== 
X-Gm-Message-State: AOJu0YzPibk72BULbKxQIFJQ0gW9/q+0SjbpNjti+FqFoKNoEdTB7FhE
 g2QtgbITUYms5ObtnLSSgocSm4Tk7zNGXXfDOaSxiaHxiIjTScptvaVKyw+ynbs=
X-Google-Smtp-Source: AGHT+IHtW6wB/SWlKyDkQ83e7rUoqWzeTFMe4TjpLR7A99eGcsfjS8PVZc7tasFkdEcSpfz4gw9ECg==
X-Received: by 2002:a05:6a20:d49b:b0:1a0:adbc:7a96 with SMTP id im27-20020a056a20d49b00b001a0adbc7a96mr520617pzb.36.1711475089055;
 Tue, 26 Mar 2024 10:44:49 -0700 (PDT)
Received: from [10.41.69.223] ([149.20.194.220]) by smtp.gmail.com with ESMTPSA id z189-20020a6265c6000000b006e66666de0dsm6468640pfb.199.2024.03.26.10.44.47 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 26 Mar 2024 10:44:48 -0700 (PDT)
Date: Tue, 26 Mar 2024 10:44:41 -0700
From: Garrett D'Amore
To: Bruce Richardson, Stephen Hemminger, Morten Brørup, Konstantin Ananyev
Cc: "dev@dpdk.org", Parthakumar Roy
Message-ID: <59a0cd8c-a483-48d4-a98c-35900659de83@Spark>
In-Reply-To: <67596383c0e842b197dcae1059900d72@huawei.com>
References: <20240325102030.46913a06@hermes.local> <98CBD80474FA8B44BF855DF32C47DC35E9F32D@smartserver.smartshare.dk> <67596383c0e842b197dcae1059900d72@huawei.com>
Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch
X-Readdle-Message-ID: 59a0cd8c-a483-48d4-a98c-35900659de83@Spark
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="6603098e_46c046e3_118d0"
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org

--6603098e_46c046e3_118d0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

This ETH_RX_OFFLOAD_BUFFER_SPLIT sounds promising indeed.
On Mar 26, 2024 at 9:14 AM -0700, Konstantin Ananyev wrote:
> I just wonder what would happen if you receive an IPv6 packet with options or some fancy encapsulation
> IP-IP or so?
> BTW, there is an RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:
> https://doc.dpdk.org/api/structrte__eth__rxseg__split.html
> Which might be close to what you are looking for, but right now it is supported by mlx5 PMD only.
>
> From: Garrett D'Amore
> Sent: Tuesday, March 26, 2024 2:19 PM
> To: Bruce Richardson; Stephen Hemminger; Morten Brørup
> Cc: dev@dpdk.org; Parthakumar Roy
> Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch
>
> This could work. Note that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled-in default, and then we reduce it by 20 on IPv6, which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to be updated incrementally.
> On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup wrote:
> >
> > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.)
> >
> > I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue.
> >
> > If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header.
> >
> >
> > Med venlig hilsen / Kind regards,
> > -Morten Brørup
> >
> > From: Garrett D'Amore [mailto:garrett@damore.org]
> > Sent: Monday, 25 March 2024 23.56
> > So we need (for reasons that I don't want to go into in too much detail) our UDP payload headers to be at a specific offset in the packet.
> >
> > This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.)
> >
> > Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset.
> >
> > This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, by a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.)
> >
> > For the transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard-coded RTE_PKTMBUF_HEADROOM to program receive locations.
> >
> > As for header splitting, that would indeed be a much much nicer solution.
> >
> > I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do.
> >
> > At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.). We also have a fair amount of ixgbe and i40e on client systems in the field.
> >
> > We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB, though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of the stack with UCX.)
> >
> > Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use.
> > On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger wrote:
> > On Mon, 25 Mar 2024 10:01:52 +0000
> > Bruce Richardson wrote:
> >
> > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> > > So we right now (at WEKA) have a somewhat older version of DPDK that we
> > > have customized heavily, and I am going to need to make the
> > > headroom *dynamic* (passed in at run time, and per port.)
> > > We have this requirement because we need payload to be at a specific
> > > offset, but have to deal with different header lengths for IPv4 and now
> > > IPv6.
> > > My reason for pointing this out is that I would dearly like it if we
> > > could collaborate on this -- this change is going to touch pretty much
> > > every PMD (we don't need it on all of them as we only support a subset
> > > of PMDs, but it's still a significant set.)
> > > I'm not sure if anyone else has considered such a need -- this
> > > particular message caught my eye as I'm looking specifically in this
> > > area right now.
> > >
> > Hi
> >
> > thanks for reaching out.
> > Can you clarify a little more as to the need for
> > this requirement? Can you not just set the headroom value to the max needed
> > value for any port and use that? Is there an issue with having blank space
> > at the start of a buffer?
> >
> > Thanks,
> > /Bruce
> >
> > If you have to make such a deep change across all PMDs then maybe
> > it is not the best solution. What about being able to do some form of buffer
> > chaining or pullup?

--6603098e_46c046e3_118d0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

This ETH_RX_OFFLOAD_BUFFER_SPLIT sounds promising indeed.
On Mar 26, 2024 at 9:14 AM -0700, Konstantin Ananyev <konstantin.ananyev@huawei.com>, wrote:

I just wonder what would happen if you receive an IPv6 packet with options or some fancy encapsulation IP-IP or so?
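
To make the concern about IPv6 options concrete: extension headers sit between the fixed 40-byte IPv6 header and UDP, so the UDP offset is only known after walking them. The following is a minimal, self-contained C sketch of that walk. The name `ipv6_l4_offset` and the `NH_*` constants are made up for illustration and are not DPDK APIs. It assumes only hop-by-hop, routing, destination-options and fragment headers need skipping, and treats anything else (including AH and ESP) as the upper-layer protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    IPV6_FIXED_LEN = 40, /* fixed IPv6 header */
    NH_HOPOPTS     = 0,  /* hop-by-hop options */
    NH_ROUTING     = 43, /* routing header */
    NH_FRAGMENT    = 44, /* fragment header */
    NH_DSTOPTS     = 60, /* destination options */
};

/*
 * Hypothetical helper: return the offset of the upper-layer header,
 * measured from the start of the IPv6 header, and store its protocol
 * number in *proto. Returns -1 if the buffer is too short.
 */
static inline int
ipv6_l4_offset(const uint8_t *ip6, size_t len, uint8_t *proto)
{
    size_t off = IPV6_FIXED_LEN;
    uint8_t nh;

    assert(ip6 != NULL && proto != NULL);
    if (len < IPV6_FIXED_LEN)
        return -1;
    nh = ip6[6]; /* Next Header field of the fixed header */

    for (;;) {
        size_t ext_len;

        switch (nh) {
        case NH_HOPOPTS:
        case NH_ROUTING:
        case NH_DSTOPTS:
            if (off + 2 > len)
                return -1;
            /* Hdr Ext Len counts 8-byte units beyond the first 8 bytes */
            ext_len = ((size_t)ip6[off + 1] + 1) * 8;
            break;
        case NH_FRAGMENT:
            ext_len = 8; /* fragment header has a fixed size */
            break;
        default:
            *proto = nh; /* not an extension header handled here */
            return (int)off;
        }
        if (off + ext_len > len)
            return -1;
        nh = ip6[off]; /* first byte of an extension header is its Next Header */
        off += ext_len;
    }
}
```

With no extension headers the function returns 40. A single 8-byte hop-by-hop header moves UDP to offset 48, so a headroom chosen for the 40-byte case would leave the payload 8 bytes away from the intended position.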

BTW, there is an RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:

https://doc.dpdk.org/api/structrte__eth__rxseg__split.html

Which might be close to what you are looking for, but right now it is supported by mlx5 PMD only.

From: Garrett D'Amore <garrett@damore.org>
Sent: Tuesday, March 26, 2024 2:19 PM
To: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Morten Brørup <mb@smartsharesystems.com>
Cc: dev@dpdk.org; Parthakumar Roy <Parthakumar.Roy@ibm.com>
Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch

This could work. Note that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled-in default, and then we reduce it by 20 on IPv6, which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to be updated incrementally.

On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote:

Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.)

I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue.

If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header.
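
As a rough illustration of that proposal, here is a self-contained C sketch. Everything in it is hypothetical: `sketch_mbuf` is a cut-down stand-in for `struct rte_mbuf`, `headroom_reduce` stands in for the suggested new rxconf parameter (which is only a suggestion in this thread, not an existing DPDK field), and `SKETCH_HEADROOM` takes the place of `RTE_PKTMBUF_HEADROOM` with the 40-byte value mentioned in this thread. A real PMD would program an IOVA into the RX descriptor rather than a virtual address.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for RTE_PKTMBUF_HEADROOM; 40 is the value quoted in this thread. */
#define SKETCH_HEADROOM 40u

/* Hypothetical, cut-down stand-in for struct rte_mbuf. */
struct sketch_mbuf {
    uint8_t  *buf_addr; /* start of the data buffer */
    uint16_t  data_off; /* offset of the first packet byte in the buffer */
};

/* Hypothetical per-queue setting: bytes to subtract from the headroom. */
struct sketch_rxq {
    uint16_t headroom_reduce;
};

/* Address the driver would hand to the NIC as the start of the frame. */
static inline uint8_t *
sketch_rx_write_addr(const struct sketch_rxq *q, const struct sketch_mbuf *m)
{
    assert(q->headroom_reduce <= SKETCH_HEADROOM);
    return m->buf_addr + (SKETCH_HEADROOM - q->headroom_reduce);
}

/* On receive, record the same offset so accessors find the Ethernet header. */
static inline void
sketch_rx_fixup(const struct sketch_rxq *q, struct sketch_mbuf *m)
{
    assert(q->headroom_reduce <= SKETCH_HEADROOM);
    m->data_off = (uint16_t)(SKETCH_HEADROOM - q->headroom_reduce);
}

/* Same idea as rte_pktmbuf_mtod(): pointer to the first packet byte. */
static inline uint8_t *
sketch_mtod(const struct sketch_mbuf *m)
{
    return m->buf_addr + m->data_off;
}
```

The point of the sketch is that the write address and `data_off` are derived from the same per-queue value, so code that reaches the packet through the mtod-style accessor does not need to know the headroom was changed.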

Med venlig hilsen / Kind regards,
-Morten Brørup

From: Garrett D'Amore [mailto:garrett@damore.org]
Sent: Monday, 25 March 2024 23.56

So we need (for reasons that I don't want to go into in too much detail) our UDP payload headers to be at a specific offset in the packet.

This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.)

Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset.
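
The arithmetic behind the 20-byte figure can be written out. This is a small illustrative C sketch, assuming untagged Ethernet (14 bytes), an IPv4 header without options (20 bytes), a fixed IPv6 header without extension headers (40 bytes) and UDP (8 bytes). The function and constant names are hypothetical and not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Header sizes in bytes: no VLAN tag, no IPv4 options, no IPv6 extension headers. */
enum {
    ETH_HDR_LEN  = 14,
    IPV4_HDR_LEN = 20,
    IPV6_HDR_LEN = 40,
    UDP_HDR_LEN  = 8,
};

/* Offset of the UDP payload from the start of the mbuf data buffer. */
static inline uint16_t
payload_offset(uint16_t headroom, uint16_t l3_len)
{
    return (uint16_t)(headroom + ETH_HDR_LEN + l3_len + UDP_HDR_LEN);
}

/* Headroom needed so the UDP payload lands `target` bytes into the buffer. */
static inline uint16_t
headroom_for(uint16_t target, uint16_t l3_len)
{
    uint16_t hdrs = (uint16_t)(ETH_HDR_LEN + l3_len + UDP_HDR_LEN);

    assert(target >= hdrs); /* the headers must fit in front of the payload */
    return (uint16_t)(target - hdrs);
}
```

Under these assumptions, 40 bytes of headroom on IPv4 puts the UDP payload at byte 82 of the buffer (40 + 14 + 20 + 8), and `headroom_for(82, IPV6_HDR_LEN)` gives 20 for IPv6, which matches the 20-byte reduction described above.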

This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, by a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.)

For the transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard-coded RTE_PKTMBUF_HEADROOM to program receive locations.

As for header splitting, that would indeed be a much much nicer solution.

I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do.

At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.). We also have a fair amount of ixgbe and i40e on client systems in the field.

We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB, though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of the stack with UCX.)

Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use.

On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote:

On Mon, 25 Mar 2024 10:01:52 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:

> So we right now (at WEKA) have a somewhat older version of DPDK that we
> have customized heavily, and I am going to need to make the
> headroom *dynamic* (passed in at run time, and per port.)
> We have this requirement because we need payload to be at a specific
> offset, but have to deal with different header lengths for IPv4 and now
> IPv6.
> My reason for pointing this out is that I would dearly like it if we
> could collaborate on this -- this change is going to touch pretty much
> every PMD (we don't need it on all of them as we only support a subset
> of PMDs, but it's still a significant set.)
> I'm not sure if anyone else has considered such a need -- this
> particular message caught my eye as I'm looking specifically in this
> area right now.
>

Hi

thanks for reaching out. Can you clarify a little more as to the need for
this requirement? Can you not just set the headroom value to the max needed
value for any port and use that? Is there an issue with having blank space
at the start of a buffer?

Thanks,
/Bruce


If you have to make such a deep change across all PMDs then maybe
it is not the best solution. What about being able to do some form of buffer
chaining or pullup?

--6603098e_46c046e3_118d0--