Date: Tue, 26 Mar 2024 10:43:30 -0700
From: Garrett D'Amore <garrett@damore.org>
To: Morten Brørup <mb@smartsharesystems.com>
Cc: dev@dpdk.org, Parthakumar Roy, Bruce Richardson <bruce.richardson@intel.com>, Stephen Hemminger <stephen@networkplumber.org>
Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch
List-Id: DPDK patches and discussions

This had occurred to me as well. I think most hardware DMA engines can align on 32-bit boundaries. I've yet to see a device that actually requires 64-bit DMA alignment. (But I have only looked at a subset of devices, and most of the ones I have
looked at are not ones that would be considered 'modern'.)

On Mar 26, 2024 at 8:06 AM -0700, Morten Brørup <mb@smartsharesystems.com> wrote:
> Something just struck me…
> The buffer address field in the RX descriptor of some NICs may have alignment requirements, i.e. the lowest bits in the buffer address field of the NIC's RX descriptor may be used for other purposes (and assumed zero for buffer address purposes). 40 is divisible by 8, but offset 20 requires that the NIC hardware supports 4-byte aligned addresses (so only the 2 lowest bits may be used for other purposes).
>
> Here's an example of what I mean:
> https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words
>
> If any of your supported NICs have that restriction, i.e. require an 8-byte aligned buffer address, your concept of having the UDP payload at the same fixed offset for both IPv4 and IPv6 is not going to be possible. (And you were lucky that the offset happens to be sufficiently aligned to work for IPv4 to begin with.)
>
> It seems you need to read a bunch of datasheets before proceeding.
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
> From: Garrett D'Amore [mailto:garrett@damore.org]
> Sent: Tuesday, 26 March 2024 15.19
>
> This could work. Note that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled-in default, and then we reduce it by 20 on IPv6, which doesn't have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to be updated incrementally.
> On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com> wrote:
>
> Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic-terminating) application, which doesn't really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes.
> I don't consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.)
>
> I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue.
>
> If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header.
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
> From: Garrett D'Amore [mailto:garrett@damore.org]
> Sent: Monday, 25 March 2024 23.56
> So we need (for reasons that I don't want to get into in too much detail) our UDP payload headers to be at a specific offset in the packet.
>
> This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.)
>
> Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset.
>
> This has big ramifications for how we fragment our own upper-layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per-port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, by a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.)
>
> For the transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard-coded RTE_PKTMBUF_HEADROOM to program receive locations.
> As far as header splitting, that would indeed be a much, much nicer solution.
>
> I haven't looked at the latest code to see if header splitting is even an option -- the version of DPDK I'm working with is a little older (20.11) -- we have to update, but we have other local changes, so updating is one of the things that we still have to do.
>
> At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field.
>
> We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IP-over-IB... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of the stack with UCX.)
>
> Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use.
> On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org> wrote:
> On Mon, 25 Mar 2024 10:01:52 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
> On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> > So we right now (at WEKA) have a somewhat older version of DPDK that we
> > have customized heavily, and I am going to need to make the
> > headroom *dynamic* (passed in at run time, and per port.)
> > We have this requirement because we need payload to be at a specific
> > offset, but have to deal with different header lengths for IPv4 and now
> > IPv6.
> > My reason for pointing this out is that I would dearly like it if we
> > could collaborate on this -- this change is going to touch pretty much
> > every PMD (we don't need it on all of them as we only support a subset
> > of PMDs, but it's still a significant set.)
> > I'm not sure if anyone else has considered such a need -- this
> > particular message caught my eye as I'm looking specifically in this
> > area right now.
> >
> Hi
>
> thanks for reaching out. Can you clarify a little more as to the need for
> this requirement? Can you not just set the headroom value to the max needed
> value for any port and use that? Is there an issue with having blank space
> at the start of a buffer?
>
> Thanks,
> /Bruce
>
> If you have to make such a deep change across all PMDs then maybe
> it is not the best solution. What about being able to do some form of buffer
> chaining or pullup?