RE: Question about loop unrolling in rte_ring datastructure.

DPDK patches and discussions

From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Aditya Ambadipudi" <Aditya.Ambadipudi@arm.com>
Cc: <dev@dpdk.org>
Subject: RE: Question about loop unrolling in rte_ring datastructure.
Date: Tue, 14 Nov 2023 09:26:59 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F01E@smartserver.smartshare.dk>
In-Reply-To: <PAVPR08MB9185BCDC0DDF8CB6FB4AAD88EFB3A@PAVPR08MB9185.eurprd08.prod.outlook.com>

> From: Aditya Ambadipudi [mailto:Aditya.Ambadipudi@arm.com] 
> Sent: Monday, 13 November 2023 19.15
> 
> Hello all.
> 
> My name is Aditya Ambadipudi. I am not the sharpest tool in the shed.
> 
> I was reading through the rte_ring data structure, and I have two questions about the optimizations made there.
> 1. Loop unrolling:
> https://github.com/DPDK/dpdk/blob/main/lib/ring/rte_ring_elem_pvt.h#L28-L35
> Why are we unrolling these loops manually? GCC will generate SIMD instructions for these loops automatically, irrespective of whether or not we unroll them.
> Unrolled loop: https://godbolt.org/z/n97noqYn7
> Regular loop: https://godbolt.org/z/h6G9o9773

You should make "int count" a function parameter, to make it unknown at compile time.
In your examples, the compiler knows that count is 100, and can optimize for that.
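For illustration, here is a minimal sketch of the two loop styles with count as a run-time parameter, which is the setup to paste into godbolt. The function names copy_plain() and copy_unrolled() are hypothetical, and the 4-way unroll with a switch for the leftover elements is a simplified stand-in for the loops in rte_ring_elem_pvt.h, not a copy of the DPDK code.

```c
#include <stdint.h>

/* Hypothetical plain copy loop. count is a run-time parameter,
 * so the compiler cannot specialize for a known trip count. */
void copy_plain(uint32_t *dst, const uint32_t *src, unsigned int count)
{
	for (unsigned int i = 0; i < count; i++)
		dst[i] = src[i];
}

/* Hypothetical manually unrolled version (by 4): the main loop copies
 * groups of 4 elements, then the switch copies the 0-3 leftovers. */
void copy_unrolled(uint32_t *dst, const uint32_t *src, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < (count & ~3u); i += 4) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		dst[i + 2] = src[i + 2];
		dst[i + 3] = src[i + 3];
	}
	switch (count & 3u) {
	case 3:
		dst[i + 2] = src[i + 2];
		/* fallthrough */
	case 2:
		dst[i + 1] = src[i + 1];
		/* fallthrough */
	case 1:
		dst[i] = src[i];
	}
}
```

Compiling both at -O2 or -O3 and comparing the output shows whether the compiler still vectorizes the plain loop when the trip count is unknown. The result may differ between compiler versions and target architectures.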

> 
> This is true of both x86 and ARM.

Much of the code in DPDK is quite old, dating back to a time when compilers were not good at optimizing, so the developers optimized by hand.
Some of the optimizations, such as manual loop unrolling, may no longer be relevant, because modern compilers might do a better job.

I don't know if this is the case here.

Experiments and suggestions for improvements are welcome!

After building DPDK, you can test the ring library performance with this command:

./app/test/dpdk-test --no-huge ring_perf_autotest

> 
> 2. Normalizing to few fixed types:
> 
> It looks like we separate the enqueue/dequeue operations into three functions, one for each element size: 32, 64 and 128 bits.
> 
> Again, I am not clear on why we are doing this. Both 128 and 64 are multiples of 32. Why can't we just normalize everything to 32?
> 
> I feel like this is in some shape or form related to loop unrolling. But I am not able to figure it out on my own.

They are optimizations for some common use cases.

Please note that the "esize" parameter is known at compile time, so for e.g. __rte_ring_dequeue_elems(), the compiler chooses __rte_ring_dequeue_elems_64(), __rte_ring_dequeue_elems_128() or the 32-bit loop at build time and omits the alternatives.
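A minimal sketch of this compile-time selection, assuming a hypothetical copy_elems() dispatcher with the same shape as the ring code: 8-byte and 16-byte special cases, otherwise a loop over 32-bit words. All names here are illustrative, not DPDK APIs. Fixed-size memcpy() calls are used so the compiler can typically emit one 64-bit or 128-bit move per element without strict-aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: copy n elements of exactly 8 bytes each. */
static inline void
copy_elems_64(unsigned char *dst, const unsigned char *src, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		memcpy(dst + 8 * i, src + 8 * i, 8);
}

/* Hypothetical: copy n elements of exactly 16 bytes each. */
static inline void
copy_elems_128(unsigned char *dst, const unsigned char *src, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		memcpy(dst + 16 * i, src + 16 * i, 16);
}

/* Hypothetical generic path: copy nr_words 32-bit words. */
static inline void
copy_elems_32(unsigned char *dst, const unsigned char *src, unsigned int nr_words)
{
	for (unsigned int i = 0; i < nr_words; i++)
		memcpy(dst + 4 * i, src + 4 * i, 4);
}

/* Hypothetical dispatcher. As in the ring library, esize (in bytes) is
 * assumed to be a multiple of 4. When inlined into a caller that passes
 * a constant esize, the non-matching branches can be removed at build time. */
static inline void
copy_elems(void *dst, const void *src, unsigned int esize, unsigned int n)
{
	if (esize == 8)
		copy_elems_64(dst, src, n);
	else if (esize == 16)
		copy_elems_128(dst, src, n);
	else
		copy_elems_32(dst, src, n * (esize / 4));
}

/* Hypothetical 16-byte element type and caller with a constant esize. */
struct flow_key {
	uint64_t a, b;
};

void copy_keys(struct flow_key *dst, const struct flow_key *src, unsigned int n)
{
	copy_elems(dst, src, sizeof(*dst), n);
}
```

When copy_elems() is inlined into copy_keys(), esize is the constant 16, so with optimization enabled the compiler can typically drop the 8-byte and 32-bit branches entirely.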

If nothing else, the specialized functions tell the compiler that "count" (the number of 32-bit elements) is divisible by 4 (for 128-bit element size) or 2 (for 64-bit element size).

Some CPU architectures have 64 and/or 128 bit registers, so a copy loop using those registers does not need to be followed by a trailing copy of any remaining 32-bit memory when the element size is known to be 64 or 128 bit.
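To make this concrete, here is a hypothetical comparison (the names are illustrative, not from DPDK). When the size is only known as a count of 32-bit words, a copy loop using 64-bit moves needs a trailing 32-bit copy in case the count is odd. A loop over 64-bit elements has no tail to handle.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: copy nr_words 32-bit words using 64-bit moves.
 * nr_words may be odd, so a trailing 32-bit copy is required. */
void copy_words_wide(void *dst, const void *src, unsigned int nr_words)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	unsigned int i;

	for (i = 0; i + 2 <= nr_words; i += 2)
		memcpy(d + 4 * i, s + 4 * i, 8);	/* one 64-bit move */
	if (i < nr_words)
		memcpy(d + 4 * i, s + 4 * i, 4);	/* trailing 32-bit word */
}

/* Hypothetical: copy n 64-bit elements. The total size is known to be
 * a multiple of 8 bytes, so no trailing 32-bit copy is needed. */
void copy_u64_elems(void *dst, const void *src, unsigned int n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	for (unsigned int i = 0; i < n; i++)
		memcpy(d + 8 * i, s + 8 * i, 8);
}
```

The second function has less branching, which is the kind of information the 64-bit and 128-bit specializations give the compiler.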

> 
> I am working on a patch that is closely related to this. And I would greatly appreciate any assistance anyone can provide on this. 
> 
> Thank you,
> Aditya Ambadipudi

> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. 

It does not remain confidential when you post it on a public mailing list.
Please omit this footer when posting to the DPDK mailing lists.

Med venlig hilsen / Kind regards,
-Morten Brørup

Thread overview: 2+ messages
2023-11-13 18:14 Aditya Ambadipudi
2023-11-14  8:26 ` Morten Brørup [this message]
