DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Scott Mitchell" <scott.k.mitch1@gmail.com>
Cc: <dev@dpdk.org>, <stephen@networkplumber.org>
Subject: RE: [PATCH v11] net: optimize raw checksum computation
Date: Fri, 9 Jan 2026 23:12:56 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F65642@smartserver.smartshare.dk>
In-Reply-To: <CAFn2buDULHpj-5m1TXEcp0xUfMpTA9dKqEuEC98UhDp4qW21og@mail.gmail.com>

> From: Morten Brørup
> Sent: Friday, 9 January 2026 16.58
> 
> > From: Scott Mitchell [mailto:scott.k.mitch1@gmail.com]
> > Sent: Friday, 9 January 2026 16.27
> >
> > On Fri, Jan 9, 2026 at 4:26 AM Morten Brørup <mb@smartsharesystems.com> wrote:
> > >
> > > > Changes in v8:
> > > > - __rte_raw_cksum: use native pointer arithmetic instead of RTE_PTR_ADD
> > > >   to avoid incorrect results with -O3 for UDP checksums. Also improves
> > > >   performance due to less assembly generated with Clang.
> > >
> > > Personally, I also have observed GCC's optimizer behave as if it
> > > loses some contextual information when using RTE_PTR_ADD, and thus
> > > emitting less optimal code.
> > > I didn't look further into it, and thus have no data or examples to
> > > back up the claim. Which is why I haven't started a discussion about
> > > discouraging the use of RTE_PTR_ADD.
> > > In other words: I support this change.
> >
> > Sounds good! I observed ~600 (dpdk ptr macros) vs ~500 (native c ptr
> > operations) TSC cycles/block in cksum_perf_autotest.
> 
> That is a significant performance degradation caused by the
> RTE_PTR_ADD() macros. We really should look into that - some day. ;-)
> Our application code base has RTE_CONST_PTR_ADD/SUB() for type
> consistency reasons (not for performance reasons). But I haven't gotten
> around to submitting them to the DPDK project yet.
> I wonder if the implicit stripping of "const" when using the
> RTE_PTR_ADD() macros makes the difference, or if the difference stems
> from other optimizer context getting lost.

I just tested building our application with GCC, using modified macros.
With the changes below, GCC emitted slightly more efficient code (fewer instructions).
This suggests that casting through uintptr_t, instead of using normal pointer arithmetic, confuses the compiler.

 /**
  * add a byte-value offset to a const pointer
  */
-#define RTE_CONST_PTR_ADD(ptr, x) ((const void*)((uintptr_t)(ptr) + (x)))
+#define RTE_CONST_PTR_ADD(ptr, x) ((const void*)((const char*)(ptr) + (x)))

 /**
  * subtract a byte-value offset from a const pointer
  */
-#define RTE_CONST_PTR_SUB(ptr, x) ((const void*)((uintptr_t)(ptr) - (x)))
+#define RTE_CONST_PTR_SUB(ptr, x) ((const void*)((const char*)(ptr) - (x)))
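
For anyone who wants to reproduce the comparison outside our code base, here is a minimal standalone sketch of the two macro variants side by side. The names CONST_PTR_ADD_UINTPTR, CONST_PTR_ADD_CHAR, struct vlan_tag and the skip_tags_*() helpers are made up for this example (they are not DPDK or application identifiers); struct vlan_tag is only a 4-byte stand-in for struct rte_vlan_hdr. Both variants compute the same address; any difference is in the code the optimizer emits, which will depend on compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "Before" form: round-trip the pointer through an integer type. */
#define CONST_PTR_ADD_UINTPTR(ptr, x) \
	((const void *)((uintptr_t)(ptr) + (x)))

/* "After" form: plain byte-pointer arithmetic. */
#define CONST_PTR_ADD_CHAR(ptr, x) \
	((const void *)((const char *)(ptr) + (x)))

/* Hypothetical 4-byte tag, standing in for struct rte_vlan_hdr. */
struct vlan_tag {
	uint16_t tci;
	uint16_t proto;
};

/* Advance past n tags using the uintptr_t variant. */
const void *skip_tags_uintptr(const void *p, unsigned int n)
{
	while (n-- > 0)
		p = CONST_PTR_ADD_UINTPTR(p, sizeof(struct vlan_tag));
	return p;
}

/* Advance past n tags using the const char * variant. */
const void *skip_tags_char(const void *p, unsigned int n)
{
	while (n-- > 0)
		p = CONST_PTR_ADD_CHAR(p, sizeof(struct vlan_tag));
	return p;
}

/* Sanity check: both variants must land on the same byte. */
void check_variants_agree(const void *base, unsigned int n)
{
	assert(skip_tags_uintptr(base, n) == skip_tags_char(base, n));
}
```

Compiling this as its own translation unit with optimization enabled and disassembling the two skip_tags_*() functions is one way to look for the kind of extra register moves shown in the listings below. It is only a sketch; a function this small may well compile to identical code for both variants.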


Snippet of code emitted using const char*:

    if (likely((eth->ether_type == RTE_BE16(RTE_ETHER_TYPE_VLAN)) |
  4dc47e:	41 0f b7 45 0c       	movzwl 0xc(%r13),%eax
  4dc483:	89 c7                	mov    %eax,%edi
  4dc485:	83 e7 ef             	and    $0xffffffef,%edi
  4dc488:	66 81 ff 81 00       	cmp    $0x81,%di
  4dc48d:	74 0a                	je     4dc499 <service_ingress_dedicated_management+0x1b9>
  4dc48f:	66 3d 88 a8          	cmp    $0xa888,%ax
  4dc493:	0f 85 3f 01 00 00    	jne    4dc5d8 <service_ingress_dedicated_management+0x2f8>
        if (vhdr->eth_proto == RTE_BE16(RTE_ETHER_TYPE_VLAN)) {
  4dc499:	66 41 81 7d 10 81 00 	cmpw   $0x81,0x10(%r13)
  4dc4a0:	0f 84 1a 01 00 00    	je     4dc5c0 <service_ingress_dedicated_management+0x2e0>
            eth = RTE_CONST_PTR_ADD(eth, sizeof(struct rte_vlan_hdr));
  4dc4a6:	49 8d 7d 04          	lea    0x4(%r13),%rdi
            packet_type = (union packet_type){
  4dc4aa:	b8 01 00 00 00       	mov    $0x1,%eax
  4dc4af:	41 b8 12 00 00 00    	mov    $0x12,%r8d
    if (likely((*(const rte_be32_t *)RTE_CONST_PTR_ADD(eth, offsetof(struct rte_ether_hdr, ether_type)) & RTE_BE32(0xFFFFFF00)) ==
  4dc4b5:	44 8b 4f 0c          	mov    0xc(%rdi),%r9d

Snippet of code emitted using uintptr_t:

    if (likely((eth->ether_type == RTE_BE16(RTE_ETHER_TYPE_VLAN)) |
  4dc47e:	41 0f b7 45 0c       	movzwl 0xc(%r13),%eax
+            eth = RTE_CONST_PTR_ADD(eth, 2 * sizeof(struct rte_vlan_hdr));
+  4dc483:	4c 89 ef             	mov    %r13,%rdi
+    if (likely((eth->ether_type == RTE_BE16(RTE_ETHER_TYPE_VLAN)) |
  4dc486:	41 89 c0             	mov    %eax,%r8d
  4dc489:	41 83 e0 ef          	and    $0xffffffef,%r8d
  4dc48d:	66 41 81 f8 81 00    	cmp    $0x81,%r8w
  4dc493:	74 0a                	je     4dc49f <service_ingress_dedicated_management+0x1bf>
  4dc495:	66 3d 88 a8          	cmp    $0xa888,%ax
  4dc499:	0f 85 51 01 00 00    	jne    4dc5f0 <service_ingress_dedicated_management+0x310>
        if (vhdr->eth_proto == RTE_BE16(RTE_ETHER_TYPE_VLAN)) {
  4dc49f:	66 41 81 7d 10 81 00 	cmpw   $0x81,0x10(%r13)
  4dc4a6:	0f 84 24 01 00 00    	je     4dc5d0 <service_ingress_dedicated_management+0x2f0>
            eth = RTE_CONST_PTR_ADD(eth, sizeof(struct rte_vlan_hdr));
  4dc4ac:	49 8d 7d 04          	lea    0x4(%r13),%rdi
            packet_type = (union packet_type){
  4dc4b0:	b8 01 00 00 00       	mov    $0x1,%eax
  4dc4b5:	41 b8 12 00 00 00    	mov    $0x12,%r8d
+            eth = RTE_CONST_PTR_ADD(eth, sizeof(struct rte_vlan_hdr));
+  4dc4bb:	49 89 f9             	mov    %rdi,%r9
    if (likely((*(const rte_be32_t *)RTE_CONST_PTR_ADD(eth, offsetof(struct rte_ether_hdr, ether_type)) & RTE_BE32(0xFFFFFF00)) ==
  4dc4be:	8b 7f 0c             	mov    0xc(%rdi),%edi

