From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
"Honnappa.Nagarahalli@arm.com" <Honnappa.Nagarahalli@arm.com>,
"nd@arm.com" <nd@arm.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] ring: empty and count optimizations
Date: Thu, 14 May 2020 16:46:40 +0000 [thread overview]
Message-ID: <BYAPR11MB330101699614B8EA0EF36D8E9ABC0@BYAPR11MB3301.namprd11.prod.outlook.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C60FC5@smartserver.smartshare.dk>
>
> > > -static inline unsigned
> > > +static inline unsigned int
> > > rte_ring_count(const struct rte_ring *r)
> > > {
> > > uint32_t prod_tail = r->prod.tail;
> > > uint32_t cons_tail = r->cons.tail;
> > > uint32_t count = (prod_tail - cons_tail) & r->mask;
> > > - return (count > r->capacity) ? r->capacity : count;
> > > + return likely(count <= r->capacity) ? count : r->capacity;
> >
> > Honestly, I don't see there is any point of that change:
> > I think it wouldn't change anything in terms of functionality
> > or performance.
>
> Chapter 3.4.1 "Branch Prediction Optimization" in the Intel 64 and IA-32 Architectures Optimization Reference Manual recommends this
> kind of optimization as Assembly/Compiler Coding Rule 3, which is why I rearranged the trigraph. Essentially, there is a limit to the number
> of BTB (Branch Target Buffer) entries, so they should be conserved if possible.
>
> In addition to that, I have added the likely() because I consider it nearly impossible that the count will exceed the capacity.
>
> However, it's not the first time I see this kind of response to a suggested branch optimization on the DPDK mailing list. Everyone seem to
> think that branch prediction is infinite and always works. It may seem as if infinite on trivial applications, but BTB entries may be a scarce
> resource on complex applications. I assume Intel's recommendations are not just for the fun of it.
I think it is better to leave such level of micro-optimizations to the compiler.
BTW, in that particular case, compiler most likely will generate a code
without any branches at all (at least for IA).
Let say on my box with gcc 7.3:
$ cat trc1.c
#include <stdint.h>
#include <rte_config.h>
#include <rte_ring.h>
uint32_t
fffx1(const struct rte_ring *r)
{
uint32_t prod_tail = r->prod.tail;
uint32_t cons_tail = r->cons.tail;
uint32_t count = (prod_tail - cons_tail) & r->mask;
return (count > r->capacity) ? r->capacity : count;
}
uint32_t
fffx2(const struct rte_ring *r)
{
uint32_t prod_tail = r->prod.tail;
uint32_t cons_tail = r->cons.tail;
uint32_t count = (prod_tail - cons_tail) & r->mask;
return likely(count <= r->capacity) ? count : r->capacity;
}
$ gcc -m64 -O3 -march=native -I${RTE_SDK}/x86_64-native-linuxapp-gcc/include -c trc1.c
$ objdump -d trc1.o
0000000000000000 <fffx1>:
0: 8b 87 84 00 00 00 mov 0x84(%rdi),%eax
6: 8b 97 04 01 00 00 mov 0x104(%rdi),%edx
c: 29 d0 sub %edx,%eax
e: 8b 57 38 mov 0x38(%rdi),%edx
11: 23 47 34 and 0x34(%rdi),%eax
14: 39 d0 cmp %edx,%eax
16: 0f 47 c2 cmova %edx,%eax
19: c3 retq
1a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000000020 <fffx2>:
20: 8b 87 84 00 00 00 mov 0x84(%rdi),%eax
26: 8b 97 04 01 00 00 mov 0x104(%rdi),%edx
2c: 29 d0 sub %edx,%eax
2e: 8b 57 38 mov 0x38(%rdi),%edx
31: 23 47 34 and 0x34(%rdi),%eax
34: 39 d0 cmp %edx,%eax
36: 0f 47 c2 cmova %edx,%eax
39: c3 retq
As you can see, there is no difference.
>
> Konstantin, please note that I'm letting out my frustration about the general misconception about branch prediction here. You are doing a
> great job, so I feel bad about responding like this to you.
No worries, in fact I am glad to know that DPDK contributors
read IA optimization manual that thoughtfully 😊
Konstantin
next prev parent reply other threads:[~2020-05-14 16:46 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-13 15:31 Morten Brørup
2020-05-13 15:35 ` Morten Brørup
2020-05-13 17:08 ` Morten Brørup
2020-05-14 12:23 ` Ananyev, Konstantin
2020-05-14 13:45 ` Morten Brørup
2020-05-14 16:46 ` Ananyev, Konstantin [this message]
2020-05-14 18:00 ` Morten Brørup
2020-05-19 15:27 ` [dpdk-dev] [PATCH 0/2] ring: empty optimization Morten Brørup
2020-05-19 15:27 ` [dpdk-dev] [PATCH 1/2] ring: coding style cleanup Morten Brørup
2020-05-22 12:34 ` Ananyev, Konstantin
2020-05-19 15:27 ` [dpdk-dev] [PATCH 2/2] ring: empty optimization Morten Brørup
2020-05-19 15:52 ` Stephen Hemminger
2020-05-19 16:02 ` Morten Brørup
2020-07-01 9:19 ` David Marchand
2020-05-22 12:32 ` Ananyev, Konstantin
2020-07-01 9:20 ` [dpdk-dev] [PATCH 0/2] " David Marchand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=BYAPR11MB330101699614B8EA0EF36D8E9ABC0@BYAPR11MB3301.namprd11.prod.outlook.com \
--to=konstantin.ananyev@intel.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=dev@dpdk.org \
--cc=mb@smartsharesystems.com \
--cc=nd@arm.com \
--cc=olivier.matz@6wind.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).