Date: Tue, 26 Mar 2024 07:19:05 -0700
From: Garrett D'Amore <garrett@damore.org>
To: Bruce Richardson <bruce.richardson@intel.com>, Stephen Hemminger <stephen@networkplumber.org>, Morten Brørup <mb@smartsharesystems.com>
Cc: dev@dpdk.org, Parthakumar Roy
Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch
List-Id: DPDK patches and discussions

This could work. Note that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled-in default, and then we reduce it by 20 on IPv6, which doesn't have to support all the same devices that IPv4 does.

This would give the lowest disruption to the existing IPv4 stack and allow PMDs to be updated incrementally.

On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com> wrote:
> Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic-terminating) application, which doesn't really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don't consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.)
>
> I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue.
>
> If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header.
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
> From: Garrett D'Amore [mailto:garrett@damore.org]
> Sent: Monday, 25 March 2024 23.56
>
> So we need (for reasons that I don't want to get into in too much detail) our UDP payload headers to be at a specific offset in the packet.
>
> This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need, by a hefty margin.)
>
> Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset.
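A rough sketch of what that proposed knob might look like. To be clear, no such field exists in DPDK today: `rx_headroom_reduce` is an invented name, and the structs below are minimal stand-ins (only `RTE_PKTMBUF_HEADROOM`, `rte_eth_rxconf`, and `data_off` are real DPDK names) so the sketch compiles on its own.

```c
#include <stdint.h>

/* Hypothetical sketch of the proposal above -- this field does not
 * exist in DPDK; the structs are minimal stand-ins so the sketch is
 * self-contained. */

#define RTE_PKTMBUF_HEADROOM 40  /* the compiled-in value used in this thread */

struct rte_mbuf {                /* stand-in: only the field we need */
	uint16_t data_off;
};

struct rte_eth_rxconf {          /* stand-in, carrying the proposed knob */
	uint16_t rx_headroom_reduce; /* bytes to subtract from the headroom */
};

/* Driver RX path: the NIC was programmed to DMA the packet
 * 'rx_headroom_reduce' bytes earlier in the buffer, so data_off must
 * be reduced to match -- that way rte_pktmbuf_mtod() still returns a
 * pointer to the Ethernet header. */
static inline void
rx_adjust_data_off(struct rte_mbuf *m, const struct rte_eth_rxconf *conf)
{
	m->data_off = RTE_PKTMBUF_HEADROOM - conf->rx_headroom_reduce;
}
```

With `rx_headroom_reduce = 20` on an IPv6 port, `data_off` comes out at 20 instead of 40, i.e. the packet starts 20 bytes earlier in the buffer.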
> This has big ramifications for how we fragment our own upper-layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per-port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, by a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.)
>
> For the transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side we are kind of stuck, as the PMDs rely on the hard-coded RTE_PKTMBUF_HEADROOM to program receive locations.
>
> As far as header splitting goes, that would indeed be a much, much nicer solution.
>
> I haven't looked at the latest code to see if header splitting is even an option -- the version of DPDK I'm working with is a little older (20.11). We have to update, but we have other local changes, so updating is one of the things we still have to do.
>
> At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, and cloud NICs (all of them really -- ENA, virtio-net, etc.). We also have a fair amount of ixgbe and i40e on client systems in the field.
>
> We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IP over IB... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of the stack with UCX.)
>
> Unless header splitting will work on all of this (excepting the IPoIB piece), it's not something we can really use.
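For concreteness, the offset arithmetic behind the 20-byte reduction, using the numbers from this thread (40 bytes of configured headroom, a 14-byte Ethernet header, no IP options):

```c
#include <stdint.h>

/* Why shaving 20 bytes of headroom on IPv6 ports preserves the UDP
 * payload offset: an IPv6 header (40 bytes) is exactly 20 bytes
 * longer than an option-less IPv4 header (20 bytes). */

#define ETH_HDR_LEN  14u
#define IPV4_HDR_LEN 20u
#define IPV6_HDR_LEN 40u
#define UDP_HDR_LEN   8u

static inline uint16_t
udp_payload_offset(uint16_t headroom, uint16_t ip_hdr_len)
{
	/* offset of the UDP payload from the start of the data buffer */
	return headroom + ETH_HDR_LEN + ip_hdr_len + UDP_HDR_LEN;
}
```

`udp_payload_offset(40, IPV4_HDR_LEN)` and `udp_payload_offset(40 - 20, IPV6_HDR_LEN)` both come out to 82, so the payload stays at the same absolute position in the buffer.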
On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org> wrote:
> On Mon, 25 Mar 2024 10:01:52 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
> > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> > > So we right now (at WEKA) have a somewhat older version of DPDK that we
> > > have customized heavily, and I am going to need to make the
> > > headroom *dynamic* (passed in at run time, and per port).
> > > We have this requirement because we need payload to be at a specific
> > > offset, but have to deal with different header lengths for IPv4 and now
> > > IPv6.
> > > My reason for pointing this out is that I would dearly like it if we
> > > could collaborate on this -- this change is going to touch pretty much
> > > every PMD. (We don't need it on all of them, as we only support a subset
> > > of PMDs, but it's still a significant set.)
> > > I'm not sure if anyone else has considered such a need -- this
> > > particular message caught my eye as I'm looking specifically at this
> > > area right now.
> >
> > Hi,
> >
> > thanks for reaching out. Can you clarify a little more as to the need for
> > this requirement? Can you not just set the headroom value to the max needed
> > value for any port and use that? Is there an issue with having blank space
> > at the start of a buffer?
> >
> > Thanks,
> > /Bruce
>
> If you have to make such a deep change across all PMDs, then maybe
> it is not the best solution. What about being able to do some form of buffer
> chaining or pullup?
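For comparison, Stephen's chaining/pullup alternative might look roughly like the following. This is a deliberately simplified sketch with a stand-in segment struct rather than real rte_mbuf chains (rte_pktmbuf_chain() would be the real primitive), shown mainly to make the per-packet copy cost visible:

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an mbuf segment; not the real rte_mbuf. */
struct seg {
	uint8_t  buf[256];
	uint16_t data_off;   /* where the data starts within buf */
	uint16_t data_len;   /* bytes of data in this segment */
	struct seg *next;    /* next segment in the chain */
};

/* Pull the first 'hlen' header bytes of 'pkt' into the small 'hdr'
 * segment and chain it in front, leaving pkt->data_off pointing at
 * the payload.  The memcpy here is the per-packet cost that makes
 * this approach unattractive at high packet rates. */
static void
pullup_headers(struct seg *hdr, struct seg *pkt, uint16_t hlen)
{
	memcpy(hdr->buf, pkt->buf + pkt->data_off, hlen);
	hdr->data_off = 0;
	hdr->data_len = hlen;
	hdr->next = pkt;
	pkt->data_off += hlen;
	pkt->data_len -= hlen;
}
```

After the pullup, the payload segment's start can sit at whatever offset the application needs, but only at the price of touching every packet's headers on RX.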