* meson option to customize RTE_PKTMBUF_HEADROOM patch @ 2024-02-15 19:02 Parthakumar Roy 2024-02-16 7:52 ` David Marchand ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Parthakumar Roy @ 2024-02-15 19:02 UTC (permalink / raw) To: dev [-- Attachment #1: Type: text/plain, Size: 2258 bytes --] Hello, Bruce Richardson suggested that I submit this patch - at IBM we needed to adjust the pkt_mbuf_headroom for our application to work. This is my first ever patch through a mailing list, I have only done it through Pull Requests before, so let me know if I need to correct something. Message: Add meson configuration option to adjust RTE_PKTMBUF_HEADROOM diff --git a/config/meson.build b/config/meson.build index 7cd375e991..43b765ade1 100644 --- a/config/meson.build +++ b/config/meson.build @@ -304,6 +304,7 @@ endforeach dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports')) dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet')) dpdk_conf.set('RTE_ENABLE_TRACE_FP', get_option('enable_trace_fp')) +dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) # values which have defaults which may be overridden dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) diff --git a/config/rte_config.h b/config/rte_config.h index 7b8c85e948..a2bb4ea61b 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -51,7 +51,6 @@ /* mbuf defines */ #define RTE_MBUF_DEFAULT_MEMPOOL_OPS "ring_mp_mc" -#define RTE_PKTMBUF_HEADROOM 128 /* ether defines */ #define RTE_MAX_QUEUES_PER_PORT 1024 diff --git a/meson_options.txt b/meson_options.txt index 08528492f7..169fcc94c7 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -36,6 +36,8 @@ option('machine', type: 'string', value: 'auto', description: 'Alias of cpu_instruction_set.') option('max_ethports', type: 'integer', value: 32, description: 'maximum number of Ethernet devices') +option('pkt_mbuf_headroom', type: 'integer', value: 128, description: 
+ 'number of bytes skipped on Rx at the start of the packet buffer to leave room for additional packet headers') option('max_lcores', type: 'string', value: 'default', description: 'Set maximum number of cores/threads supported by EAL; "default" is different per-arch, "detect" detects the number of cores on the build machine.') option('max_numa_nodes', type: 'string', value: 'default', description: [-- Attachment #2: Type: text/html, Size: 10100 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy @ 2024-02-16 7:52 ` David Marchand 2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy 2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore 2 siblings, 0 replies; 18+ messages in thread From: David Marchand @ 2024-02-16 7:52 UTC (permalink / raw) To: Parthakumar Roy; +Cc: dev, Bruce Richardson Hello, On Thu, Feb 15, 2024 at 8:02 PM Parthakumar Roy <Parthakumar.Roy@ibm.com> wrote: > > Hello, > Bruce Richardson suggested that I submit this patch - at IBM we needed to adjust the pkt_mbuf_headroom for our application to work. This is my first ever patch through a mailing list, I have only done it through Pull Requests before, so let me know if I need to correct something. Thanks for the contribution. I have some comments on the form. First of all, there are issues (the change is included as part of a multipart mail + html) in the format of the change itself, which confused our CI tools. As a result, the patch was not recognised and can't be tested by the CI. We also need a SoB. Please have a look at the contributors guide (esp. chapters 1 and 8): https://doc.dpdk.org/guides/contributing/patches.html > > Message: > Add meson configuration option to adjust RTE_PKTMBUF_HEADROOM You must describe in more detail the goal/use case that you want to address with a change. For example here, I am curious to know how you need to adjust the headroom. Is it for increasing it? I consider the current headroom already quite huge (it is enough to encapsulate 2 whole vxlan tunnel headers, for example). -- David Marchand ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2] build: make buffer headroom configurable 2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy 2024-02-16 7:52 ` David Marchand @ 2024-02-20 14:57 ` Parthakumar Roy 2024-02-27 16:02 ` Bruce Richardson 2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore 2 siblings, 1 reply; 18+ messages in thread From: Parthakumar Roy @ 2024-02-20 14:57 UTC (permalink / raw) To: dev; +Cc: Parthakumar Roy The default value for RTE_PKTMBUF_HEADROOM used to be set in config/rte_config.h. This patch removes it from the file and instead, a meson option, pkt_mbuf_headroom, is introduced to make headroom tunable from the build process. Signed-off-by: Parthakumar Roy <Parthakumar.Roy@ibm.com> --- config/meson.build | 1 + config/rte_config.h | 1 - meson_options.txt | 2 ++ 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/config/meson.build b/config/meson.build index a9ccd56deb..f90ff0a290 100644 --- a/config/meson.build +++ b/config/meson.build @@ -346,6 +346,7 @@ dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports')) dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet')) dpdk_conf.set('RTE_ENABLE_STDATOMIC', get_option('enable_stdatomic')) dpdk_conf.set('RTE_ENABLE_TRACE_FP', get_option('enable_trace_fp')) +dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) # values which have defaults which may be overridden dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) diff --git a/config/rte_config.h b/config/rte_config.h index da265d7dd2..505199f2fe 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -51,7 +51,6 @@ /* mbuf defines */ #define RTE_MBUF_DEFAULT_MEMPOOL_OPS "ring_mp_mc" -#define RTE_PKTMBUF_HEADROOM 128 /* ether defines */ #define RTE_MAX_QUEUES_PER_PORT 1024 diff --git a/meson_options.txt b/meson_options.txt index 50a9d3669d..68f378c401 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -40,6 +40,8 @@ 
option('machine', type: 'string', value: 'auto', description: 'Alias of cpu_instruction_set.') option('max_ethports', type: 'integer', value: 32, description: 'maximum number of Ethernet devices') +option('pkt_mbuf_headroom', type: 'integer', value: 128, description: + 'number of bytes skipped on Rx at the start of the packet buffer to leave room for additional packet headers') option('max_lcores', type: 'string', value: 'default', description: 'Set maximum number of cores/threads supported by EAL; "default" is different per-arch, "detect" detects the number of cores on the build machine.') option('max_numa_nodes', type: 'string', value: 'default', description: -- 2.34.1 ^ permalink raw reply [flat|nested] 18+ messages in thread
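With the v2 patch applied, the headroom becomes an ordinary DPDK meson option. A minimal build-configuration sketch (the option name and default value come from the patch above; the build directory name is illustrative):

```shell
# Fresh build tree with a non-default headroom, e.g. 256 bytes
meson setup build -Dpkt_mbuf_headroom=256

# Or adjust an already-configured tree, then rebuild
meson configure build -Dpkt_mbuf_headroom=256
ninja -C build
```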
* Re: [PATCH v2] build: make buffer headroom configurable 2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy @ 2024-02-27 16:02 ` Bruce Richardson 2024-02-27 16:10 ` Morten Brørup 0 siblings, 1 reply; 18+ messages in thread From: Bruce Richardson @ 2024-02-27 16:02 UTC (permalink / raw) To: Parthakumar Roy; +Cc: dev, Parthakumar Roy On Tue, Feb 20, 2024 at 02:57:35PM +0000, Parthakumar Roy wrote: > The default value for RTE_PKTMBUF_HEADROOM used to be set in > config/rte_config.h. This patch removes it from the file and > instead, a meson option, pkt_mbuf_headroom, is introduced to make > headroom tunable from the build process. > > Signed-off-by: Parthakumar Roy <Parthakumar.Roy@ibm.com> > --- > config/meson.build | 1 + > config/rte_config.h | 1 - > meson_options.txt | 2 ++ > 3 files changed, 3 insertions(+), 1 deletion(-) > While this is not likely a setting that many will need to change, it's not the first time I've heard of users looking to adjust it. Because it's such a simple change, I'd support having this accessible for easy config by end-users. Acked-by: Bruce Richardson <bruce.richardson@intel.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2] build: make buffer headroom configurable 2024-02-27 16:02 ` Bruce Richardson @ 2024-02-27 16:10 ` Morten Brørup 2024-03-06 16:45 ` Thomas Monjalon 0 siblings, 1 reply; 18+ messages in thread From: Morten Brørup @ 2024-02-27 16:10 UTC (permalink / raw) To: Bruce Richardson, Parthakumar Roy; +Cc: dev, Parthakumar Roy > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > Sent: Tuesday, 27 February 2024 17.02 > > While this is not likely a setting that many will need to change, it's > not > the first time I've heard of users looking to adjust it. > > Because it's such a simple change, I'd support having this accessible > for > easy config by end-users. > > Acked-by: Bruce Richardson <bruce.richardson@intel.com> +1 to what Bruce said. Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] build: make buffer headroom configurable 2024-02-27 16:10 ` Morten Brørup @ 2024-03-06 16:45 ` Thomas Monjalon 0 siblings, 0 replies; 18+ messages in thread From: Thomas Monjalon @ 2024-03-06 16:45 UTC (permalink / raw) To: Parthakumar Roy, Parthakumar Roy Cc: Bruce Richardson, dev, Morten Brørup 27/02/2024 17:10, Morten Brørup: > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > Sent: Tuesday, 27 February 2024 17.02 > > > > While this is not likely a setting that many will need to change, it's > > not > > the first time I've heard of users looking to adjust it. > > > > Because it's such a simple change, I'd support having this accessible > > for > > easy config by end-users. > > > > Acked-by: Bruce Richardson <bruce.richardson@intel.com> > > +1 to what Bruce said. > > Acked-by: Morten Brørup <mb@smartsharesystems.com> Changed the definition to something shorter: "Default data offset (in bytes) in a packet buffer to leave room for additional headers." Applied, thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy 2024-02-16 7:52 ` David Marchand 2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy @ 2024-03-23 20:51 ` Garrett D'Amore 2024-03-25 10:01 ` Bruce Richardson 2 siblings, 1 reply; 18+ messages in thread From: Garrett D'Amore @ 2024-03-23 20:51 UTC (permalink / raw) To: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 3143 bytes --] So we right now (at WEKA) have a somewhat older version of DPDK that we have customized heavily, and I am going to need to make the headroom *dynamic* (passed in at run time, and per port.) We have this requirement because we need payload to be at a specific offset, but have to deal with different header lengths for IPv4 and now IPv6. My reason for pointing this out is that I would dearly like it if we could collaborate on this -- this change is going to touch pretty much every PMD (we don't need it on all of them as we only support a subset of PMDs, but it's still a significant set.) I'm not sure if anyone else has considered such a need -- this particular message caught my eye as I'm looking specifically in this area right now. On Feb 15, 2024 at 11:02 AM -0800, Parthakumar Roy <Parthakumar.Roy@ibm.com>, wrote: > Hello, > Bruce Richardson suggested that I submit this patch - at IBM we needed to adjust the pkt_mbuf_headroom for our application to work. This is my first ever patch through a mailing list, I have only done it through Pull Requests before, so let me know if I need to correct something. 
> > Message: > Add meson configuration option to adjust RTE_PKTMBUF_HEADROOM > > diff --git a/config/meson.build b/config/meson.build > index 7cd375e991..43b765ade1 100644 > --- a/config/meson.build > +++ b/config/meson.build > @@ -304,6 +304,7 @@ endforeach > dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports')) > dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet')) > dpdk_conf.set('RTE_ENABLE_TRACE_FP', get_option('enable_trace_fp')) > +dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) > # values which have defaults which may be overridden > dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) > dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) > diff --git a/config/rte_config.h b/config/rte_config.h > index 7b8c85e948..a2bb4ea61b 100644 > --- a/config/rte_config.h > +++ b/config/rte_config.h > @@ -51,7 +51,6 @@ > > /* mbuf defines */ > #define RTE_MBUF_DEFAULT_MEMPOOL_OPS "ring_mp_mc" > -#define RTE_PKTMBUF_HEADROOM 128 > > /* ether defines */ > #define RTE_MAX_QUEUES_PER_PORT 1024 > diff --git a/meson_options.txt b/meson_options.txt > index 08528492f7..169fcc94c7 100644 > --- a/meson_options.txt > +++ b/meson_options.txt > @@ -36,6 +36,8 @@ option('machine', type: 'string', value: 'auto', description: > 'Alias of cpu_instruction_set.') > option('max_ethports', type: 'integer', value: 32, description: > 'maximum number of Ethernet devices') > +option('pkt_mbuf_headroom', type: 'integer', value: 128, description: > + 'number of bytes skipped on Rx at the start of the packet buffer to leave room for additional packet headers') > option('max_lcores', type: 'string', value: 'default', description: > 'Set maximum number of cores/threads supported by EAL; "default" is different per-arch, "detect" detects the number of cores on the build machine.') > option('max_numa_nodes', type: 'string', value: 'default', description: > [-- Attachment #2: Type: text/html, Size: 10985 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore @ 2024-03-25 10:01 ` Bruce Richardson 2024-03-25 17:20 ` Stephen Hemminger 0 siblings, 1 reply; 18+ messages in thread From: Bruce Richardson @ 2024-03-25 10:01 UTC (permalink / raw) To: Garrett D'Amore; +Cc: dev, Parthakumar Roy On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) > I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-25 10:01 ` Bruce Richardson @ 2024-03-25 17:20 ` Stephen Hemminger 2024-03-25 22:56 ` Garrett D'Amore 0 siblings, 1 reply; 18+ messages in thread From: Stephen Hemminger @ 2024-03-25 17:20 UTC (permalink / raw) To: Bruce Richardson; +Cc: Garrett D'Amore, dev, Parthakumar Roy On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com> wrote: > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > have customized heavily, and I am going to to need to to make the > > headroom *dynamic* (passed in at run time, and per port.) > > We have this requirement because we need payload to be at a specific > > offset, but have to deal with different header lengths for IPv4 and now > > IPv6. > > My reason for pointing this out, is that I would dearly like if we > > could collaborate on this -- this change is going to touch pretty much > > every PMD (we don't need it on all of them as we only support a subset > > of PMDs, but its still a significant set.) > > I'm not sure if anyone else has considered such a need -- this > > particular message caught my eye as I'm looking specifically in this > > area right now. > > > Hi > > thanks for reaching out. Can you clarify a little more as to the need for > this requirement? Can you not just set the headroom value to the max needed > value for any port and use that? Is there an issue with having blank space > at the start of a buffer? > > Thanks, > /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-25 17:20 ` Stephen Hemminger @ 2024-03-25 22:56 ` Garrett D'Amore 2024-03-26 8:05 ` Morten Brørup 0 siblings, 1 reply; 18+ messages in thread From: Garrett D'Amore @ 2024-03-25 22:56 UTC (permalink / raw) To: Bruce Richardson, Stephen Hemminger; +Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 3718 bytes --] So we need (for reasons that I don't want to get into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, by a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. I haven't looked at the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. 
We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of the stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), it's not something we can really use. On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > On Mon, 25 Mar 2024 10:01:52 +0000 > Bruce Richardson <bruce.richardson@intel.com> wrote: > > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > > > have customized heavily, and I am going to to need to to make the > > > > headroom *dynamic* (passed in at run time, and per port.) > > > > We have this requirement because we need payload to be at a specific > > > > offset, but have to deal with different header lengths for IPv4 and now > > > > IPv6. > > > > My reason for pointing this out, is that I would dearly like if we > > > > could collaborate on this -- this change is going to touch pretty much > > > > every PMD (we don't need it on all of them as we only support a subset > > > > of PMDs, but its still a significant set.) > > > > I'm not sure if anyone else has considered such a need -- this > > > > particular message caught my eye as I'm looking specifically in this > > > > area right now. > > > > > > Hi > > > > thanks for reaching out. Can you clarify a little more as to the need for > > this requirement? Can you not just set the headroom value to the max needed > > value for any port and use that? Is there an issue with having blank space > > at the start of a buffer? 
> > > > Thanks, > > /Bruce > > If you have to make such a deep change across all PMD's then maybe > it is not the best solution. What about being able to do some form of buffer > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 4553 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-25 22:56 ` Garrett D'Amore @ 2024-03-26 8:05 ` Morten Brørup 2024-03-26 14:19 ` Garrett D'Amore 0 siblings, 1 reply; 18+ messages in thread From: Morten Brørup @ 2024-03-26 8:05 UTC (permalink / raw) To: Garrett D'Amore, Bruce Richardson, Stephen Hemminger Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 4641 bytes --] Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Monday, 25 March 2024 23.56 So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. 
This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. 
On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com> wrote: On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) > I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. [-- Attachment #2: Type: text/html, Size: 8422 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 8:05 ` Morten Brørup @ 2024-03-26 14:19 ` Garrett D'Amore 2024-03-26 15:06 ` Morten Brørup 2024-03-26 16:14 ` Konstantin Ananyev 0 siblings, 2 replies; 18+ messages in thread From: Garrett D'Amore @ 2024-03-26 14:19 UTC (permalink / raw) To: Bruce Richardson, Stephen Hemminger, Morten Brørup Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 5162 bytes --] This could work. Note that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled-in default and then we reduce it by 20 on IPv6, which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to be updated incrementally. On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) > > I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. > > If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. 
> > > Med venlig hilsen / Kind regards, > -Morten Brørup > > From: Garrett D'Amore [mailto:garrett@damore.org] > Sent: Monday, 25 March 2024 23.56 > > So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. > > This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) > > Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. > > This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) > > For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. > > As far as header splitting, that would indeed be a much much nicer solution. > > I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. > > At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. > > We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... 
though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) > > Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. > On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > > On Mon, 25 Mar 2024 10:01:52 +0000 > Bruce Richardson <bruce.richardson@intel.com> wrote: > > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > have customized heavily, and I am going to to need to to make the > > headroom *dynamic* (passed in at run time, and per port.) > > We have this requirement because we need payload to be at a specific > > offset, but have to deal with different header lengths for IPv4 and now > > IPv6. > > My reason for pointing this out, is that I would dearly like if we > > could collaborate on this -- this change is going to touch pretty much > > every PMD (we don't need it on all of them as we only support a subset > > of PMDs, but its still a significant set.) > > I'm not sure if anyone else has considered such a need -- this > > particular message caught my eye as I'm looking specifically in this > > area right now. > > > Hi > > thanks for reaching out. Can you clarify a little more as to the need for > this requirement? Can you not just set the headroom value to the max needed > value for any port and use that? Is there an issue with having blank space > at the start of a buffer? > > Thanks, > /Bruce > > If you have to make such a deep change across all PMD's then maybe > it is not the best solution. What about being able to do some form of buffer > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 7829 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 14:19 ` Garrett D'Amore @ 2024-03-26 15:06 ` Morten Brørup 2024-03-26 17:43 ` Garrett D'Amore 2024-03-26 16:14 ` Konstantin Ananyev 1 sibling, 1 reply; 18+ messages in thread From: Morten Brørup @ 2024-03-26 15:06 UTC (permalink / raw) To: Garrett D'Amore Cc: dev, Parthakumar Roy, Bruce Richardson, Stephen Hemminger [-- Attachment #1: Type: text/plain, Size: 6283 bytes --] Something just struck me… The buffer address field in the RX descriptor of some NICs may have alignment requirements, i.e. the lowest bits in the buffer address field of the NIC’s RX descriptor may be used for other purposes (and assumed zero for buffer address purposes). 40 is divisible by 8, but offset 20 requires that the NIC hardware supports 4-byte aligned addresses (so only the 2 lowest bits may be used for other purposes). Here’s an example of what I mean: https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words <https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words> If any of your supported NICs have that restriction, i.e. requires an 8 byte aligned buffer address, your concept of having the UDP payload at the same fixed offset for both IPv4 and IPv6 is not going to be possible. (And you were lucky that the offset happens to be sufficiently aligned to work for IPv4 to begin with.) It seems you need to read a bunch of datasheets before proceeding. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Tuesday, 26 March 2024 15.19 This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. 
On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Monday, 25 March 2024 23.56 So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. 
But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com> wrote: On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) 
> I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. [-- Attachment #2: Type: text/html, Size: 13408 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 15:06 ` Morten Brørup @ 2024-03-26 17:43 ` Garrett D'Amore 2024-03-26 20:35 ` Stephen Hemminger 0 siblings, 1 reply; 18+ messages in thread From: Garrett D'Amore @ 2024-03-26 17:43 UTC (permalink / raw) To: Morten Brørup Cc: dev, Parthakumar Roy, Bruce Richardson, Stephen Hemminger [-- Attachment #1: Type: text/plain, Size: 6683 bytes --] This had occurred to me as well. I think most hardware DMA engines can align on 32-bit boundaries. I've yet to see a device that actually requires 64-bit DMA alignment. (But I have only looked at a subset of devices, and most of the ones I have looked at are not ones that would be considered 'modern'.) On Mar 26, 2024 at 8:06 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > Something just struck me… > The buffer address field in the RX descriptor of some NICs may have alignment requirements, i.e. the lowest bits in the buffer address field of the NIC’s RX descriptor may be used for other purposes (and assumed zero for buffer address purposes). 40 is divisible by 8, but offset 20 requires that the NIC hardware supports 4-byte aligned addresses (so only the 2 lowest bits may be used for other purposes). > > Here’s an example of what I mean: > https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words > > If any of your supported NICs have that restriction, i.e. requires an 8 byte aligned buffer address, your concept of having the UDP payload at the same fixed offset for both IPv4 and IPv6 is not going to be possible. (And you were lucky that the offset happens to be sufficiently aligned to work for IPv4 to begin with.) > > It seems you need to read a bunch of datasheets before proceeding. > > > Med venlig hilsen / Kind regards, > -Morten Brørup > > From: Garrett D'Amore [mailto:garrett@damore.org] > Sent: Tuesday, 26 March 2024 15.19 > > This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. 
So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. > On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) > > I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. > > If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. > > > Med venlig hilsen / Kind regards, > -Morten Brørup > > From: Garrett D'Amore [mailto:garrett@damore.org] > Sent: Monday, 25 March 2024 23.56 > So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. > > This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) > > Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. 
> > This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) > > For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. > > As far as header splitting, that would indeed be a much much nicer solution. > > I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. > > At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. > > We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) > > Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. 
> On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > On Mon, 25 Mar 2024 10:01:52 +0000 > Bruce Richardson <bruce.richardson@intel.com> wrote: > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > have customized heavily, and I am going to to need to to make the > > headroom *dynamic* (passed in at run time, and per port.) > > We have this requirement because we need payload to be at a specific > > offset, but have to deal with different header lengths for IPv4 and now > > IPv6. > > My reason for pointing this out, is that I would dearly like if we > > could collaborate on this -- this change is going to touch pretty much > > every PMD (we don't need it on all of them as we only support a subset > > of PMDs, but its still a significant set.) > > I'm not sure if anyone else has considered such a need -- this > > particular message caught my eye as I'm looking specifically in this > > area right now. > > > Hi > > thanks for reaching out. Can you clarify a little more as to the need for > this requirement? Can you not just set the headroom value to the max needed > value for any port and use that? Is there an issue with having blank space > at the start of a buffer? > > Thanks, > /Bruce > > If you have to make such a deep change across all PMD's then maybe > it is not the best solution. What about being able to do some form of buffer > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 12594 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch
  2024-03-26 17:43 ` Garrett D'Amore
@ 2024-03-26 20:35 ` Stephen Hemminger
  2024-03-26 21:10 ` Garrett D'Amore
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2024-03-26 20:35 UTC (permalink / raw)
To: Garrett D'Amore
Cc: Morten Brørup, dev, Parthakumar Roy, Bruce Richardson

On Tue, 26 Mar 2024 10:43:30 -0700
Garrett D'Amore <garrett@damore.org> wrote:

> This had occurred to me as well. I think most hardware DMA engines can align on 32-bit boundaries. I've yet to see a device that actually requires 64-bit DMA alignment. (But I have only looked at a subset of devices, and most of the ones I have looked at are not ones that would be considered 'modern'.)

There may be a PCI transaction performance penalty if not aligned.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch
  2024-03-26 20:35 ` Stephen Hemminger
@ 2024-03-26 21:10 ` Garrett D'Amore
  0 siblings, 0 replies; 18+ messages in thread
From: Garrett D'Amore @ 2024-03-26 21:10 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Morten Brørup, dev, Parthakumar Roy, Bruce Richardson

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

Fair enough... I think that is just something we're going to have to live with. The other solutions are either much more painful, or much more work. If we can use header/buffer splitting that would be superior. Right now we can't use that everywhere because it isn't available everywhere.
On Mar 26, 2024 at 1:35 PM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote:
> On Tue, 26 Mar 2024 10:43:30 -0700
> Garrett D'Amore <garrett@damore.org> wrote:
>
> > This had occurred to me as well. I think most hardware DMA engines can align on 32-bit boundaries. I've yet to see a device that actually requires 64-bit DMA alignment. (But I have only looked at a subset of devices, and most of the ones I have looked at are not ones that would be considered 'modern'.)
>
> There may be a PCI transaction performance penalty if not aligned.

[-- Attachment #2: Type: text/html, Size: 1397 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 14:19 ` Garrett D'Amore 2024-03-26 15:06 ` Morten Brørup @ 2024-03-26 16:14 ` Konstantin Ananyev 2024-03-26 17:44 ` Garrett D'Amore 1 sibling, 1 reply; 18+ messages in thread From: Konstantin Ananyev @ 2024-03-26 16:14 UTC (permalink / raw) To: Garrett D'Amore, Bruce Richardson, Stephen Hemminger, Morten Brørup Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 5827 bytes --] Just wonder what would happen if you’ll receive an ipv6 packet with options or some fancy encapsulation IP-IP or so? BTW, there is an RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload: https://doc.dpdk.org/api/structrte__eth__rxseg__split.html Which might be close to what you are looking for, but right now it is supported by mlx5 PMD only. From: Garrett D'Amore <garrett@damore.org> Sent: Tuesday, March 26, 2024 2:19 PM To: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Morten Brørup <mb@smartsharesystems.com> Cc: dev@dpdk.org; Parthakumar Roy <Parthakumar.Roy@ibm.com> Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com<mailto:mb@smartsharesystems.com>>, wrote: Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) 
I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Monday, 25 March 2024 23.56 So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. 
I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org<mailto:stephen@networkplumber.org>>, wrote: On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote: On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) 
> I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. [-- Attachment #2: Type: text/html, Size: 12357 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
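For reference, a sketch of how the buffer-split offload Konstantin mentions is configured on a PMD that supports it (untested; based on the rte_eth_rxseg_split API linked above; the 42-byte header length, pool names, and omitted error handling are assumptions for illustration):

```c
#include <string.h>
#include <rte_ethdev.h>

/* Sketch only: split each received packet so the first 42 bytes
 * (Ethernet 14 + IPv4 20 + UDP 8 -- an assumed header layout) land in
 * hdr_pool and the remaining payload lands in pay_pool. Requires a PMD
 * advertising RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT (only mlx5 at the time
 * of this thread). */
static int
setup_split_queue(uint16_t port, uint16_t queue, uint16_t nb_desc,
                  unsigned int socket,
                  struct rte_mempool *hdr_pool,
                  struct rte_mempool *pay_pool)
{
    union rte_eth_rxseg segs[2];
    struct rte_eth_rxconf rxconf;

    memset(segs, 0, sizeof(segs));
    memset(&rxconf, 0, sizeof(rxconf));

    segs[0].split.mp = hdr_pool;  /* segment 0: protocol headers */
    segs[0].split.length = 42;
    segs[0].split.offset = 0;
    segs[1].split.mp = pay_pool;  /* segment 1: payload */
    segs[1].split.length = 0;     /* 0 = remainder of the packet */
    segs[1].split.offset = 0;

    rxconf.rx_seg = segs;
    rxconf.rx_nseg = 2;
    rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;

    /* The mempool argument is NULL when rx_seg/rx_nseg name the pools. */
    return rte_eth_rx_queue_setup(port, queue, nb_desc, socket,
                                  &rxconf, NULL);
}
```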
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 16:14 ` Konstantin Ananyev @ 2024-03-26 17:44 ` Garrett D'Amore 0 siblings, 0 replies; 18+ messages in thread From: Garrett D'Amore @ 2024-03-26 17:44 UTC (permalink / raw) To: Bruce Richardson, Stephen Hemminger, Morten Brørup, Konstantin Ananyev Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 6171 bytes --] This ETH_RX_OFFLOAD_BUFFER_SPLIT sounds promising indeed. On Mar 26, 2024 at 9:14 AM -0700, Konstantin Ananyev <konstantin.ananyev@huawei.com>, wrote: > Just wonder what would happen if you’ll receive an ipv6 packet with options or some fancy encapsulation IP-IP or so? > BTW, there is an RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload: > https://doc.dpdk.org/api/structrte__eth__rxseg__split.html > Which might be close to what you are looking for, but right now it is supported by mlx5 PMD only. > > From: Garrett D'Amore <garrett@damore.org> > Sent: Tuesday, March 26, 2024 2:19 PM > To: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Morten Brørup <mb@smartsharesystems.com> > Cc: dev@dpdk.org; Parthakumar Roy <Parthakumar.Roy@ibm.com> > Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch > > This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. > On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > > > quote_type > > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. 
(Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) > > > > I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. > > > > If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. > > > > > > Med venlig hilsen / Kind regards, > > -Morten Brørup > > > > From: Garrett D'Amore [mailto:garrett@damore.org] > > Sent: Monday, 25 March 2024 23.56 > > So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. > > > > This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) > > > > Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. > > > > This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) > > > > For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. > > > > As far as header splitting, that would indeed be a much much nicer solution. 
> > > > I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. > > > > At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. > > > > We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) > > > > Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. > > On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > > On Mon, 25 Mar 2024 10:01:52 +0000 > > Bruce Richardson <bruce.richardson@intel.com> wrote: > > > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > > have customized heavily, and I am going to to need to to make the > > > headroom *dynamic* (passed in at run time, and per port.) > > > We have this requirement because we need payload to be at a specific > > > offset, but have to deal with different header lengths for IPv4 and now > > > IPv6. > > > My reason for pointing this out, is that I would dearly like if we > > > could collaborate on this -- this change is going to touch pretty much > > > every PMD (we don't need it on all of them as we only support a subset > > > of PMDs, but its still a significant set.) 
> > > I'm not sure if anyone else has considered such a need -- this > > > particular message caught my eye as I'm looking specifically in this > > > area right now. > > > > > Hi > > > > thanks for reaching out. Can you clarify a little more as to the need for > > this requirement? Can you not just set the headroom value to the max needed > > value for any port and use that? Is there an issue with having blank space > > at the start of a buffer? > > > > Thanks, > > /Bruce > > > > If you have to make such a deep change across all PMD's then maybe > > it is not the best solution. What about being able to do some form of buffer > > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 10658 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-03-26 21:10 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy
2024-02-16 7:52 ` David Marchand
2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy
2024-02-27 16:02 ` Bruce Richardson
2024-02-27 16:10 ` Morten Brørup
2024-03-06 16:45 ` Thomas Monjalon
2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore
2024-03-25 10:01 ` Bruce Richardson
2024-03-25 17:20 ` Stephen Hemminger
2024-03-25 22:56 ` Garrett D'Amore
2024-03-26 8:05 ` Morten Brørup
2024-03-26 14:19 ` Garrett D'Amore
2024-03-26 15:06 ` Morten Brørup
2024-03-26 17:43 ` Garrett D'Amore
2024-03-26 20:35 ` Stephen Hemminger
2024-03-26 21:10 ` Garrett D'Amore
2024-03-26 16:14 ` Konstantin Ananyev
2024-03-26 17:44 ` Garrett D'Amore