* meson option to customize RTE_PKTMBUF_HEADROOM patch @ 2024-02-15 19:02 Parthakumar Roy 2024-02-16 7:52 ` David Marchand ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Parthakumar Roy @ 2024-02-15 19:02 UTC (permalink / raw) To: dev [-- Attachment #1: Type: text/plain, Size: 2258 bytes --] Hello, Bruce Richardson suggested that I submit this patch - at IBM we needed to adjust the pkt_mbuf_headroom for our application to work. This is my first ever patch through a mailing list, I have only done it through Pull Requests before, so let me know if I need to correct something. Message: Add meson configuration option to adjust RTE_PKTMBUF_HEADROOM diff --git a/config/meson.build b/config/meson.build index 7cd375e991..43b765ade1 100644 --- a/config/meson.build +++ b/config/meson.build @@ -304,6 +304,7 @@ endforeach dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports')) dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet')) dpdk_conf.set('RTE_ENABLE_TRACE_FP', get_option('enable_trace_fp')) +dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) # values which have defaults which may be overridden dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) diff --git a/config/rte_config.h b/config/rte_config.h index 7b8c85e948..a2bb4ea61b 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -51,7 +51,6 @@ /* mbuf defines */ #define RTE_MBUF_DEFAULT_MEMPOOL_OPS "ring_mp_mc" -#define RTE_PKTMBUF_HEADROOM 128 /* ether defines */ #define RTE_MAX_QUEUES_PER_PORT 1024 diff --git a/meson_options.txt b/meson_options.txt index 08528492f7..169fcc94c7 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -36,6 +36,8 @@ option('machine', type: 'string', value: 'auto', description: 'Alias of cpu_instruction_set.') option('max_ethports', type: 'integer', value: 32, description: 'maximum number of Ethernet devices') +option('pkt_mbuf_headroom', type: 'integer', value: 128, description: 
+ 'number of bytes skipped on Rx at the start of the packet buffer to leave room for additional packet headers') option('max_lcores', type: 'string', value: 'default', description: 'Set maximum number of cores/threads supported by EAL; "default" is different per-arch, "detect" detects the number of cores on the build machine.') option('max_numa_nodes', type: 'string', value: 'default', description: [-- Attachment #2: Type: text/html, Size: 10100 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy @ 2024-02-16 7:52 ` David Marchand 2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy 2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore 2 siblings, 0 replies; 18+ messages in thread From: David Marchand @ 2024-02-16 7:52 UTC (permalink / raw) To: Parthakumar Roy; +Cc: dev, Bruce Richardson Hello, On Thu, Feb 15, 2024 at 8:02 PM Parthakumar Roy <Parthakumar.Roy@ibm.com> wrote: > > Hello, > Bruce Richardson suggested that I submit this patch - at IBM we needed to adjust the pkt_mbuf_headroom for our application to work. This is my first ever patch through a mailing list, I have only done it through Pull Requests before, so let me know if I need to correct something. Thanks for the contribution. I have some comments on the form. First of all, there are issues (the change is included as part of a multipart mail + html) in the format of the change itself, which confused our CI tools. As a result, the patch was not recognised and can't be tested by the CI. We also need a SoB. Please have a look at the contributors guide (esp. chapters 1 and 8): https://doc.dpdk.org/guides/contributing/patches.html > > Message: > Add meson configuration option to adjust RTE_PKTMBUF_HEADROOM You must describe in more detail the goal/use case that you want to address with a change. For example here, I am curious to know how you need to adjust the headroom. Is it for increasing it? I consider the current headroom already quite huge (it is enough to encapsulate 2 whole vxlan tunnel headers, for example). -- David Marchand ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2] build: make buffer headroom configurable 2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy 2024-02-16 7:52 ` David Marchand @ 2024-02-20 14:57 ` Parthakumar Roy 2024-02-27 16:02 ` Bruce Richardson 2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore 2 siblings, 1 reply; 18+ messages in thread From: Parthakumar Roy @ 2024-02-20 14:57 UTC (permalink / raw) To: dev; +Cc: Parthakumar Roy The default value for RTE_PKTMBUF_HEADROOM used to be set in config/rte_config.h. This patch removes it from the file and instead, a meson option, pkt_mbuf_headroom, is introduced to make headroom tunable from the build process. Signed-off-by: Parthakumar Roy <Parthakumar.Roy@ibm.com> --- config/meson.build | 1 + config/rte_config.h | 1 - meson_options.txt | 2 ++ 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/config/meson.build b/config/meson.build index a9ccd56deb..f90ff0a290 100644 --- a/config/meson.build +++ b/config/meson.build @@ -346,6 +346,7 @@ dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports')) dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet')) dpdk_conf.set('RTE_ENABLE_STDATOMIC', get_option('enable_stdatomic')) dpdk_conf.set('RTE_ENABLE_TRACE_FP', get_option('enable_trace_fp')) +dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) # values which have defaults which may be overridden dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) diff --git a/config/rte_config.h b/config/rte_config.h index da265d7dd2..505199f2fe 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -51,7 +51,6 @@ /* mbuf defines */ #define RTE_MBUF_DEFAULT_MEMPOOL_OPS "ring_mp_mc" -#define RTE_PKTMBUF_HEADROOM 128 /* ether defines */ #define RTE_MAX_QUEUES_PER_PORT 1024 diff --git a/meson_options.txt b/meson_options.txt index 50a9d3669d..68f378c401 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -40,6 +40,8 @@ 
option('machine', type: 'string', value: 'auto', description: 'Alias of cpu_instruction_set.') option('max_ethports', type: 'integer', value: 32, description: 'maximum number of Ethernet devices') +option('pkt_mbuf_headroom', type: 'integer', value: 128, description: + 'number of bytes skipped on Rx at the start of the packet buffer to leave room for additional packet headers') option('max_lcores', type: 'string', value: 'default', description: 'Set maximum number of cores/threads supported by EAL; "default" is different per-arch, "detect" detects the number of cores on the build machine.') option('max_numa_nodes', type: 'string', value: 'default', description: -- 2.34.1 ^ permalink raw reply [flat|nested] 18+ messages in thread
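With the v2 patch applied, the headroom becomes an ordinary DPDK meson option. A minimal build-configuration sketch (the option name and default value come from the patch above; the build directory name is illustrative):

```shell
# Fresh build tree with a non-default headroom, e.g. 256 bytes
meson setup build -Dpkt_mbuf_headroom=256

# Or adjust an already-configured tree, then rebuild
meson configure build -Dpkt_mbuf_headroom=256
ninja -C build
```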
* Re: [PATCH v2] build: make buffer headroom configurable 2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy @ 2024-02-27 16:02 ` Bruce Richardson 2024-02-27 16:10 ` Morten Brørup 0 siblings, 1 reply; 18+ messages in thread From: Bruce Richardson @ 2024-02-27 16:02 UTC (permalink / raw) To: Parthakumar Roy; +Cc: dev, Parthakumar Roy On Tue, Feb 20, 2024 at 02:57:35PM +0000, Parthakumar Roy wrote: > The default value for RTE_PKTMBUF_HEADROOM used to be set in > config/rte_config.h. This patch removes it from the file and > instead, a meson option, pkt_mbuf_headroom, is introduced to make > headroom tunable from the build process. > > Signed-off-by: Parthakumar Roy <Parthakumar.Roy@ibm.com> > --- > config/meson.build | 1 + > config/rte_config.h | 1 - > meson_options.txt | 2 ++ > 3 files changed, 3 insertions(+), 1 deletion(-) > While this is not likely a setting that many will need to change, it's not the first time I've heard of users looking to adjust it. Because it's such a simple change, I'd support having this accessible for easy config by end-users. Acked-by: Bruce Richardson <bruce.richardson@intel.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2] build: make buffer headroom configurable 2024-02-27 16:02 ` Bruce Richardson @ 2024-02-27 16:10 ` Morten Brørup 2024-03-06 16:45 ` Thomas Monjalon 0 siblings, 1 reply; 18+ messages in thread From: Morten Brørup @ 2024-02-27 16:10 UTC (permalink / raw) To: Bruce Richardson, Parthakumar Roy; +Cc: dev, Parthakumar Roy > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > Sent: Tuesday, 27 February 2024 17.02 > > While this is not likely a setting that many will need to change, it's > not > the first time I've heard of users looking to adjust it. > > Because it's such a simple change, I'd support having this accessible > for > easy config by end-users. > > Acked-by: Bruce Richardson <bruce.richardson@intel.com> +1 to what Bruce said. Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] build: make buffer headroom configurable 2024-02-27 16:10 ` Morten Brørup @ 2024-03-06 16:45 ` Thomas Monjalon 0 siblings, 0 replies; 18+ messages in thread From: Thomas Monjalon @ 2024-03-06 16:45 UTC (permalink / raw) To: Parthakumar Roy, Parthakumar Roy Cc: Bruce Richardson, dev, Morten Brørup 27/02/2024 17:10, Morten Brørup: > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > Sent: Tuesday, 27 February 2024 17.02 > > > > While this is not likely a setting that many will need to change, it's > > not > > the first time I've heard of users looking to adjust it. > > > > Because it's such a simple change, I'd support having this accessible > > for > > easy config by end-users. > > > > Acked-by: Bruce Richardson <bruce.richardson@intel.com> > > +1 to what Bruce said. > > Acked-by: Morten Brørup <mb@smartsharesystems.com> Changed the definition to something shorter: "Default data offset (in bytes) in a packet buffer to leave room for additional headers." Applied, thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy 2024-02-16 7:52 ` David Marchand 2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy @ 2024-03-23 20:51 ` Garrett D'Amore 2024-03-25 10:01 ` Bruce Richardson 2 siblings, 1 reply; 18+ messages in thread From: Garrett D'Amore @ 2024-03-23 20:51 UTC (permalink / raw) To: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 3143 bytes --] So we right now (at WEKA) have a somewhat older version of DPDK that we have customized heavily, and I am going to need to make the headroom *dynamic* (passed in at run time, and per port.) We have this requirement because we need payload to be at a specific offset, but have to deal with different header lengths for IPv4 and now IPv6. My reason for pointing this out is that I would dearly like it if we could collaborate on this -- this change is going to touch pretty much every PMD (we don't need it on all of them as we only support a subset of PMDs, but it's still a significant set.) I'm not sure if anyone else has considered such a need -- this particular message caught my eye as I'm looking specifically in this area right now. On Feb 15, 2024 at 11:02 AM -0800, Parthakumar Roy <Parthakumar.Roy@ibm.com>, wrote: > Hello, > Bruce Richardson suggested that I submit this patch - at IBM we needed to adjust the pkt_mbuf_headroom for our application to work. This is my first ever patch through a mailing list, I have only done it through Pull Requests before, so let me know if I need to correct something. 
> > Message: > Add meson configuration option to adjust RTE_PKTMBUF_HEADROOM > > diff --git a/config/meson.build b/config/meson.build > index 7cd375e991..43b765ade1 100644 > --- a/config/meson.build > +++ b/config/meson.build > @@ -304,6 +304,7 @@ endforeach > dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports')) > dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet')) > dpdk_conf.set('RTE_ENABLE_TRACE_FP', get_option('enable_trace_fp')) > +dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) > # values which have defaults which may be overridden > dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) > dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) > diff --git a/config/rte_config.h b/config/rte_config.h > index 7b8c85e948..a2bb4ea61b 100644 > --- a/config/rte_config.h > +++ b/config/rte_config.h > @@ -51,7 +51,6 @@ > > /* mbuf defines */ > #define RTE_MBUF_DEFAULT_MEMPOOL_OPS "ring_mp_mc" > -#define RTE_PKTMBUF_HEADROOM 128 > > /* ether defines */ > #define RTE_MAX_QUEUES_PER_PORT 1024 > diff --git a/meson_options.txt b/meson_options.txt > index 08528492f7..169fcc94c7 100644 > --- a/meson_options.txt > +++ b/meson_options.txt > @@ -36,6 +36,8 @@ option('machine', type: 'string', value: 'auto', description: > 'Alias of cpu_instruction_set.') > option('max_ethports', type: 'integer', value: 32, description: > 'maximum number of Ethernet devices') > +option('pkt_mbuf_headroom', type: 'integer', value: 128, description: > + 'number of bytes skipped on Rx at the start of the packet buffer to leave room for additional packet headers') > option('max_lcores', type: 'string', value: 'default', description: > 'Set maximum number of cores/threads supported by EAL; "default" is different per-arch, "detect" detects the number of cores on the build machine.') > option('max_numa_nodes', type: 'string', value: 'default', description: > [-- Attachment #2: Type: text/html, Size: 10985 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore @ 2024-03-25 10:01 ` Bruce Richardson 2024-03-25 17:20 ` Stephen Hemminger 0 siblings, 1 reply; 18+ messages in thread From: Bruce Richardson @ 2024-03-25 10:01 UTC (permalink / raw) To: Garrett D'Amore; +Cc: dev, Parthakumar Roy On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) > I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-25 10:01 ` Bruce Richardson @ 2024-03-25 17:20 ` Stephen Hemminger 2024-03-25 22:56 ` Garrett D'Amore 0 siblings, 1 reply; 18+ messages in thread From: Stephen Hemminger @ 2024-03-25 17:20 UTC (permalink / raw) To: Bruce Richardson; +Cc: Garrett D'Amore, dev, Parthakumar Roy On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com> wrote: > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > have customized heavily, and I am going to to need to to make the > > headroom *dynamic* (passed in at run time, and per port.) > > We have this requirement because we need payload to be at a specific > > offset, but have to deal with different header lengths for IPv4 and now > > IPv6. > > My reason for pointing this out, is that I would dearly like if we > > could collaborate on this -- this change is going to touch pretty much > > every PMD (we don't need it on all of them as we only support a subset > > of PMDs, but its still a significant set.) > > I'm not sure if anyone else has considered such a need -- this > > particular message caught my eye as I'm looking specifically in this > > area right now. > > > Hi > > thanks for reaching out. Can you clarify a little more as to the need for > this requirement? Can you not just set the headroom value to the max needed > value for any port and use that? Is there an issue with having blank space > at the start of a buffer? > > Thanks, > /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-25 17:20 ` Stephen Hemminger @ 2024-03-25 22:56 ` Garrett D'Amore 2024-03-26 8:05 ` Morten Brørup 0 siblings, 1 reply; 18+ messages in thread From: Garrett D'Amore @ 2024-03-25 22:56 UTC (permalink / raw) To: Bruce Richardson, Stephen Hemminger; +Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 3718 bytes --] So we need (for reasons that I don't want to get into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, by a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. I haven't looked at the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. 
We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of the stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), it's not something we can really use. On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > On Mon, 25 Mar 2024 10:01:52 +0000 > Bruce Richardson <bruce.richardson@intel.com> wrote: > > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > > > have customized heavily, and I am going to to need to to make the > > > > headroom *dynamic* (passed in at run time, and per port.) > > > > We have this requirement because we need payload to be at a specific > > > > offset, but have to deal with different header lengths for IPv4 and now > > > > IPv6. > > > > My reason for pointing this out, is that I would dearly like if we > > > > could collaborate on this -- this change is going to touch pretty much > > > > every PMD (we don't need it on all of them as we only support a subset > > > > of PMDs, but its still a significant set.) > > > > I'm not sure if anyone else has considered such a need -- this > > > > particular message caught my eye as I'm looking specifically in this > > > > area right now. > > > > > > Hi > > > > thanks for reaching out. Can you clarify a little more as to the need for > > this requirement? Can you not just set the headroom value to the max needed > > value for any port and use that? Is there an issue with having blank space > > at the start of a buffer? 
> > > > Thanks, > > /Bruce > > If you have to make such a deep change across all PMD's then maybe > it is not the best solution. What about being able to do some form of buffer > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 4553 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-25 22:56 ` Garrett D'Amore @ 2024-03-26 8:05 ` Morten Brørup 2024-03-26 14:19 ` Garrett D'Amore 0 siblings, 1 reply; 18+ messages in thread From: Morten Brørup @ 2024-03-26 8:05 UTC (permalink / raw) To: Garrett D'Amore, Bruce Richardson, Stephen Hemminger Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 4641 bytes --] Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Monday, 25 March 2024 23.56 So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. 
This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. 
On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com> wrote: On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) > I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. [-- Attachment #2: Type: text/html, Size: 8422 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 8:05 ` Morten Brørup @ 2024-03-26 14:19 ` Garrett D'Amore 2024-03-26 15:06 ` Morten Brørup 2024-03-26 16:14 ` Konstantin Ananyev 0 siblings, 2 replies; 18+ messages in thread From: Garrett D'Amore @ 2024-03-26 14:19 UTC (permalink / raw) To: Bruce Richardson, Stephen Hemminger, Morten Brørup Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 5162 bytes --] This could work. Note that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled-in default and then we reduce it by 20 on IPv6, which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to be updated incrementally. On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) > > I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. > > If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. 
> > > Med venlig hilsen / Kind regards, > -Morten Brørup > > From: Garrett D'Amore [mailto:garrett@damore.org] > Sent: Monday, 25 March 2024 23.56 > > So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. > > This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) > > Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. > > This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) > > For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. > > As far as header splitting, that would indeed be a much much nicer solution. > > I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. > > At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. > > We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... 
though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) > > Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. > On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > > On Mon, 25 Mar 2024 10:01:52 +0000 > Bruce Richardson <bruce.richardson@intel.com> wrote: > > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > have customized heavily, and I am going to to need to to make the > > headroom *dynamic* (passed in at run time, and per port.) > > We have this requirement because we need payload to be at a specific > > offset, but have to deal with different header lengths for IPv4 and now > > IPv6. > > My reason for pointing this out, is that I would dearly like if we > > could collaborate on this -- this change is going to touch pretty much > > every PMD (we don't need it on all of them as we only support a subset > > of PMDs, but its still a significant set.) > > I'm not sure if anyone else has considered such a need -- this > > particular message caught my eye as I'm looking specifically in this > > area right now. > > > Hi > > thanks for reaching out. Can you clarify a little more as to the need for > this requirement? Can you not just set the headroom value to the max needed > value for any port and use that? Is there an issue with having blank space > at the start of a buffer? > > Thanks, > /Bruce > > If you have to make such a deep change across all PMD's then maybe > it is not the best solution. What about being able to do some form of buffer > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 7829 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 14:19 ` Garrett D'Amore @ 2024-03-26 15:06 ` Morten Brørup 2024-03-26 17:43 ` Garrett D'Amore 2024-03-26 16:14 ` Konstantin Ananyev 1 sibling, 1 reply; 18+ messages in thread From: Morten Brørup @ 2024-03-26 15:06 UTC (permalink / raw) To: Garrett D'Amore Cc: dev, Parthakumar Roy, Bruce Richardson, Stephen Hemminger [-- Attachment #1: Type: text/plain, Size: 6283 bytes --] Something just struck me… The buffer address field in the RX descriptor of some NICs may have alignment requirements, i.e. the lowest bits in the buffer address field of the NIC’s RX descriptor may be used for other purposes (and assumed zero for buffer address purposes). 40 is divisible by 8, but offset 20 requires that the NIC hardware supports 4-byte aligned addresses (so only the 2 lowest bits may be used for other purposes). Here’s an example of what I mean: https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words <https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words> If any of your supported NICs have that restriction, i.e. requires an 8 byte aligned buffer address, your concept of having the UDP payload at the same fixed offset for both IPv4 and IPv6 is not going to be possible. (And you were lucky that the offset happens to be sufficiently aligned to work for IPv4 to begin with.) It seems you need to read a bunch of datasheets before proceeding. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Tuesday, 26 March 2024 15.19 This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. 
On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Monday, 25 March 2024 23.56 So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. 
But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com> wrote: On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) 
> I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. [-- Attachment #2: Type: text/html, Size: 13408 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 15:06 ` Morten Brørup @ 2024-03-26 17:43 ` Garrett D'Amore 2024-03-26 20:35 ` Stephen Hemminger 0 siblings, 1 reply; 18+ messages in thread From: Garrett D'Amore @ 2024-03-26 17:43 UTC (permalink / raw) To: Morten Brørup Cc: dev, Parthakumar Roy, Bruce Richardson, Stephen Hemminger [-- Attachment #1: Type: text/plain, Size: 6683 bytes --] This had occurred to me as well. I think most hardware DMA engines can align on 32-bit boundaries. I've yet to see a device that actually requires 64-bit DMA alignment. (But I have only looked at a subset of devices, and most of the ones I have looked at are not ones that would be considered 'modern'.) On Mar 26, 2024 at 8:06 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > Something just struck me… > The buffer address field in the RX descriptor of some NICs may have alignment requirements, i.e. the lowest bits in the buffer address field of the NIC’s RX descriptor may be used for other purposes (and assumed zero for buffer address purposes). 40 is divisible by 8, but offset 20 requires that the NIC hardware supports 4-byte aligned addresses (so only the 2 lowest bits may be used for other purposes). > > Here’s an example of what I mean: > https://docs.amd.com/r/en-US/am011-versal-acap-trm/RX-Descriptor-Words > > If any of your supported NICs have that restriction, i.e. requires an 8 byte aligned buffer address, your concept of having the UDP payload at the same fixed offset for both IPv4 and IPv6 is not going to be possible. (And you were lucky that the offset happens to be sufficiently aligned to work for IPv4 to begin with.) > > It seems you need to read a bunch of datasheets before proceeding. > > > Med venlig hilsen / Kind regards, > -Morten Brørup > > From: Garrett D'Amore [mailto:garrett@damore.org] > Sent: Tuesday, 26 March 2024 15.19 > > This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. 
So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. > On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) > > I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. > > If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. > > > Med venlig hilsen / Kind regards, > -Morten Brørup > > From: Garrett D'Amore [mailto:garrett@damore.org] > Sent: Monday, 25 March 2024 23.56 > So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. > > This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) > > Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. 
> > This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) > > For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. > > As far as header splitting, that would indeed be a much much nicer solution. > > I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. > > At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. > > We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) > > Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. 
> On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > On Mon, 25 Mar 2024 10:01:52 +0000 > Bruce Richardson <bruce.richardson@intel.com> wrote: > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > have customized heavily, and I am going to to need to to make the > > headroom *dynamic* (passed in at run time, and per port.) > > We have this requirement because we need payload to be at a specific > > offset, but have to deal with different header lengths for IPv4 and now > > IPv6. > > My reason for pointing this out, is that I would dearly like if we > > could collaborate on this -- this change is going to touch pretty much > > every PMD (we don't need it on all of them as we only support a subset > > of PMDs, but its still a significant set.) > > I'm not sure if anyone else has considered such a need -- this > > particular message caught my eye as I'm looking specifically in this > > area right now. > > > Hi > > thanks for reaching out. Can you clarify a little more as to the need for > this requirement? Can you not just set the headroom value to the max needed > value for any port and use that? Is there an issue with having blank space > at the start of a buffer? > > Thanks, > /Bruce > > If you have to make such a deep change across all PMD's then maybe > it is not the best solution. What about being able to do some form of buffer > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 12594 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch
  2024-03-26 17:43 ` Garrett D'Amore
@ 2024-03-26 20:35 ` Stephen Hemminger
  2024-03-26 21:10 ` Garrett D'Amore
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2024-03-26 20:35 UTC (permalink / raw)
To: Garrett D'Amore
Cc: Morten Brørup, dev, Parthakumar Roy, Bruce Richardson

On Tue, 26 Mar 2024 10:43:30 -0700
Garrett D'Amore <garrett@damore.org> wrote:

> This had occurred to me as well. I think most hardware DMA engines can align on 32-bit boundaries. I've yet to see a device that actually requires 64-bit DMA alignment. (But I have only looked at a subset of devices, and most of the ones I have looked at are not ones that would be considered 'modern'.)

There may be a PCI transaction performance penalty if not aligned.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: meson option to customize RTE_PKTMBUF_HEADROOM patch
  2024-03-26 20:35 ` Stephen Hemminger
@ 2024-03-26 21:10 ` Garrett D'Amore
  0 siblings, 0 replies; 18+ messages in thread
From: Garrett D'Amore @ 2024-03-26 21:10 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Morten Brørup, dev, Parthakumar Roy, Bruce Richardson

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

Fair enough... I think that is just something we're going to have to live with. The other solutions are either much more painful, or much more work. If we can use header/buffer splitting that would be superior. Right now we can't use that everywhere because it isn't available everywhere.
On Mar 26, 2024 at 1:35 PM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote:
> On Tue, 26 Mar 2024 10:43:30 -0700
> Garrett D'Amore <garrett@damore.org> wrote:
>
> > This had occurred to me as well. I think most hardware DMA engines can align on 32-bit boundaries. I've yet to see a device that actually requires 64-bit DMA alignment. (But I have only looked at a subset of devices, and most of the ones I have looked at are not ones that would be considered 'modern'.)
>
> There may be a PCI transaction performance penalty if not aligned.

[-- Attachment #2: Type: text/html, Size: 1397 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 14:19 ` Garrett D'Amore 2024-03-26 15:06 ` Morten Brørup @ 2024-03-26 16:14 ` Konstantin Ananyev 2024-03-26 17:44 ` Garrett D'Amore 1 sibling, 1 reply; 18+ messages in thread From: Konstantin Ananyev @ 2024-03-26 16:14 UTC (permalink / raw) To: Garrett D'Amore, Bruce Richardson, Stephen Hemminger, Morten Brørup Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 5827 bytes --] Just wonder what would happen if you’ll receive an ipv6 packet with options or some fancy encapsulation IP-IP or so? BTW, there is an RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload: https://doc.dpdk.org/api/structrte__eth__rxseg__split.html Which might be close to what you are looking for, but right now it is supported by mlx5 PMD only. From: Garrett D'Amore <garrett@damore.org> Sent: Tuesday, March 26, 2024 2:19 PM To: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Morten Brørup <mb@smartsharesystems.com> Cc: dev@dpdk.org; Parthakumar Roy <Parthakumar.Roy@ibm.com> Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com<mailto:mb@smartsharesystems.com>>, wrote: Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. (Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) 
I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. Med venlig hilsen / Kind regards, -Morten Brørup From: Garrett D'Amore [mailto:garrett@damore.org] Sent: Monday, 25 March 2024 23.56 So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. As far as header splitting, that would indeed be a much much nicer solution. 
I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org<mailto:stephen@networkplumber.org>>, wrote: On Mon, 25 Mar 2024 10:01:52 +0000 Bruce Richardson <bruce.richardson@intel.com<mailto:bruce.richardson@intel.com>> wrote: On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > So we right now (at WEKA) have a somewhat older version of DPDK that we > have customized heavily, and I am going to to need to to make the > headroom *dynamic* (passed in at run time, and per port.) > We have this requirement because we need payload to be at a specific > offset, but have to deal with different header lengths for IPv4 and now > IPv6. > My reason for pointing this out, is that I would dearly like if we > could collaborate on this -- this change is going to touch pretty much > every PMD (we don't need it on all of them as we only support a subset > of PMDs, but its still a significant set.) 
> I'm not sure if anyone else has considered such a need -- this > particular message caught my eye as I'm looking specifically in this > area right now. > Hi thanks for reaching out. Can you clarify a little more as to the need for this requirement? Can you not just set the headroom value to the max needed value for any port and use that? Is there an issue with having blank space at the start of a buffer? Thanks, /Bruce If you have to make such a deep change across all PMD's then maybe it is not the best solution. What about being able to do some form of buffer chaining or pullup. [-- Attachment #2: Type: text/html, Size: 12357 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
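For reference, a sketch of how the buffer-split offload Konstantin mentions is configured on a PMD that supports it (untested; based on the rte_eth_rxseg_split API linked above; the 42-byte header length, pool names, and omitted error handling are assumptions for illustration):

```c
#include <string.h>
#include <rte_ethdev.h>

/* Sketch only: split each received packet so the first 42 bytes
 * (Ethernet 14 + IPv4 20 + UDP 8 -- an assumed header layout) land in
 * hdr_pool and the remaining payload lands in pay_pool. Requires a PMD
 * advertising RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT (only mlx5 at the time
 * of this thread). */
static int
setup_split_queue(uint16_t port, uint16_t queue, uint16_t nb_desc,
                  unsigned int socket,
                  struct rte_mempool *hdr_pool,
                  struct rte_mempool *pay_pool)
{
    union rte_eth_rxseg segs[2];
    struct rte_eth_rxconf rxconf;

    memset(segs, 0, sizeof(segs));
    memset(&rxconf, 0, sizeof(rxconf));

    segs[0].split.mp = hdr_pool;  /* segment 0: protocol headers */
    segs[0].split.length = 42;
    segs[0].split.offset = 0;
    segs[1].split.mp = pay_pool;  /* segment 1: payload */
    segs[1].split.length = 0;     /* 0 = remainder of the packet */
    segs[1].split.offset = 0;

    rxconf.rx_seg = segs;
    rxconf.rx_nseg = 2;
    rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;

    /* The mempool argument is NULL when rx_seg/rx_nseg name the pools. */
    return rte_eth_rx_queue_setup(port, queue, nb_desc, socket,
                                  &rxconf, NULL);
}
```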
* RE: meson option to customize RTE_PKTMBUF_HEADROOM patch 2024-03-26 16:14 ` Konstantin Ananyev @ 2024-03-26 17:44 ` Garrett D'Amore 0 siblings, 0 replies; 18+ messages in thread From: Garrett D'Amore @ 2024-03-26 17:44 UTC (permalink / raw) To: Bruce Richardson, Stephen Hemminger, Morten Brørup, Konstantin Ananyev Cc: dev, Parthakumar Roy [-- Attachment #1: Type: text/plain, Size: 6171 bytes --] This ETH_RX_OFFLOAD_BUFFER_SPLIT sounds promising indeed. On Mar 26, 2024 at 9:14 AM -0700, Konstantin Ananyev <konstantin.ananyev@huawei.com>, wrote: > Just wonder what would happen if you’ll receive an ipv6 packet with options or some fancy encapsulation IP-IP or so? > BTW, there is an RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload: > https://doc.dpdk.org/api/structrte__eth__rxseg__split.html > Which might be close to what you are looking for, but right now it is supported by mlx5 PMD only. > > From: Garrett D'Amore <garrett@damore.org> > Sent: Tuesday, March 26, 2024 2:19 PM > To: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Morten Brørup <mb@smartsharesystems.com> > Cc: dev@dpdk.org; Parthakumar Roy <Parthakumar.Roy@ibm.com> > Subject: RE: meson option to customize RTE_PKTMBUF_HEADROOM patch > > This could work. Not that we would like to have the exceptional case of IPv6 use less headroom. So we would say 40 is our compiled in default and then we reduce it by 20 on IPv6 which doesn’t have to support all the same devices that IPv4 does. This would give the lowest disruption to the existing IPv4 stack and allow PMDs to updated incrementally. > On Mar 26, 2024 at 1:05 AM -0700, Morten Brørup <mb@smartsharesystems.com>, wrote: > > > quote_type > > Interesting requirement. I can easily imagine how a (non-forwarding, i.e. traffic terminating) application, which doesn’t really care about the preceding headers, can benefit from having its actual data at a specific offset for alignment purposes. I don’t consider this very exotic. 
(Even the Linux kernel uses this trick to achieve improved IP header alignment on RX.) > > > > I think the proper solution would be to add a new offload parameter to rte_eth_rxconf to specify how many bytes the driver should subtract from RTE_PKTMBUF_HEADROOM when writing the RX descriptor to the NIC hardware. Depending on driver support, this would make it configurable per device and per RX queue. > > > > If this parameter is set, the driver should adjust m->data_off accordingly on RX, so rte_pktmbuf_mtod[_offset]() and rte_pktmbuf_iova[_offset]() still point to the Ethernet header. > > > > > > Med venlig hilsen / Kind regards, > > -Morten Brørup > > > > From: Garrett D'Amore [mailto:garrett@damore.org] > > Sent: Monday, 25 March 2024 23.56 > > So we need (for reasons that I don't want to get to into in too much detail) that our UDP payload headers are at a specific offset in the packet. > > > > This was not a problem as long as we only used IPv4. (We have configured 40 bytes of headroom, which is more than any of our PMDs need by a hefty margin.) > > > > Now that we're extending to support IPv6, we need to reduce that headroom by 20 bytes, to preserve our UDP payload offset. > > > > This has big ramifications for how we fragment our own upper layer messages, and it has been determined that updating the PMDs to allow us to change the headroom for this use case (on a per port basis, as we will have some ports on IPv4 and others on IPv6) is the least effort, but a large margin. (Well, copying the frames via memcpy would be less development effort, but would be a performance catastrophe.) > > > > For transmit side we don't need this, as we can simply adjust the packet as needed. But for the receive side, we are kind of stuck, as the PMDs rely on the hard coded RTE_PKTMBUF_HEADROOM to program receive locations. > > > > As far as header splitting, that would indeed be a much much nicer solution. 
> > > > I haven't looked in the latest code to see if header splitting is even an option -- the version of the DPDK I'm working with is a little older (20.11) -- we have to update but we have other local changes and so updating is one of the things that we still have to do. > > > > At any rate, the version I did look at doesn't seem to support header splits on any device other than FM10K. That's not terrifically interesting for us. We use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, virtio-net, etc.) We also have a fair amount of ixgbe and i40e on client systems in the field. > > > > We also, unfortunately, have an older DPDK 18 with Mellanox contributions for IPoverIB.... though I'm not sure we will try to support IPv6 there. (We are working towards replacing that part of stack with UCX.) > > > > Unless header splitting will work on all of this (excepting the IPoIB piece), then it's not something we can really use. > > On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger <stephen@networkplumber.org>, wrote: > > On Mon, 25 Mar 2024 10:01:52 +0000 > > Bruce Richardson <bruce.richardson@intel.com> wrote: > > > > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote: > > > So we right now (at WEKA) have a somewhat older version of DPDK that we > > > have customized heavily, and I am going to to need to to make the > > > headroom *dynamic* (passed in at run time, and per port.) > > > We have this requirement because we need payload to be at a specific > > > offset, but have to deal with different header lengths for IPv4 and now > > > IPv6. > > > My reason for pointing this out, is that I would dearly like if we > > > could collaborate on this -- this change is going to touch pretty much > > > every PMD (we don't need it on all of them as we only support a subset > > > of PMDs, but its still a significant set.) 
> > > I'm not sure if anyone else has considered such a need -- this > > > particular message caught my eye as I'm looking specifically in this > > > area right now. > > > > > Hi > > > > thanks for reaching out. Can you clarify a little more as to the need for > > this requirement? Can you not just set the headroom value to the max needed > > value for any port and use that? Is there an issue with having blank space > > at the start of a buffer? > > > > Thanks, > > /Bruce > > > > If you have to make such a deep change across all PMD's then maybe > > it is not the best solution. What about being able to do some form of buffer > > chaining or pullup. [-- Attachment #2: Type: text/html, Size: 10658 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-03-26 21:10 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-15 19:02 meson option to customize RTE_PKTMBUF_HEADROOM patch Parthakumar Roy
2024-02-16 7:52 ` David Marchand
2024-02-20 14:57 ` [PATCH v2] build: make buffer headroom configurable Parthakumar Roy
2024-02-27 16:02 ` Bruce Richardson
2024-02-27 16:10 ` Morten Brørup
2024-03-06 16:45 ` Thomas Monjalon
2024-03-23 20:51 ` meson option to customize RTE_PKTMBUF_HEADROOM patch Garrett D'Amore
2024-03-25 10:01 ` Bruce Richardson
2024-03-25 17:20 ` Stephen Hemminger
2024-03-25 22:56 ` Garrett D'Amore
2024-03-26 8:05 ` Morten Brørup
2024-03-26 14:19 ` Garrett D'Amore
2024-03-26 15:06 ` Morten Brørup
2024-03-26 17:43 ` Garrett D'Amore
2024-03-26 20:35 ` Stephen Hemminger
2024-03-26 21:10 ` Garrett D'Amore
2024-03-26 16:14 ` Konstantin Ananyev
2024-03-26 17:44 ` Garrett D'Amore