* Re: [dpdk-dev] [PATCH 1/1] net: fix aliasing issue in checksum computation
2021-10-14 17:20 ` Ferruh Yigit
@ 2021-10-14 20:22 ` Morten Brørup
2021-10-16 8:21 ` Morten Brørup
2021-10-16 8:24 ` Morten Brørup
2 siblings, 0 replies; 9+ messages in thread
From: Morten Brørup @ 2021-10-14 20:22 UTC (permalink / raw)
To: Ferruh Yigit, Georg Sauthoff, Olivier Matz, Thomas Monjalon,
David Marchand, David Marchand
Cc: dev
> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Thursday, 14 October 2021 19.20
>
> On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> > That means a superfluous cast is removed and aliasing through a
> uint8_t
> > pointer is eliminated. Note that uint8_t doesn't have the same
> > strict-aliasing properties as unsigned char.
> >
> > Also simplified the loop since a modern C compiler can speed up (i.e.
> > auto-vectorize) it in a similar way. For example, GCC auto-vectorizes
> it
> > for Haswell using AVX registers while halving the number of
> instructions
> > in the generated code.
> >
> > Signed-off-by: Georg Sauthoff <mail@gms.tf>
>
> + Morten. (Because of past reviews on cksum code)
Thanks, Ferruh.
I have not verified the claimed benefits of the patch, but I have reviewed the code thoroughly, and it looks perfectly good to me.
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
BTW: It makes me wonder if other parts of DPDK could benefit from the same treatment. Especially some of the older DPDK code, where we were trying to optimize by hand what a modern compiler can optimize for us today.
>
> > ---
> > lib/net/rte_ip.h | 27 ++++++++-------------------
> > 1 file changed, 8 insertions(+), 19 deletions(-)
> >
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > index 05948b69b7..386db94c85 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> *ipv4_hdr)
> > static inline uint32_t
> > __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> > {
> > - /* workaround gcc strict-aliasing warning */
> > - uintptr_t ptr = (uintptr_t)buf;
> > + /* extend strict-aliasing rules */
> > typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > - const u16_p *u16_buf = (const u16_p *)ptr;
> > -
> > - while (len >= (sizeof(*u16_buf) * 4)) {
> > - sum += u16_buf[0];
> > - sum += u16_buf[1];
> > - sum += u16_buf[2];
> > - sum += u16_buf[3];
> > - len -= sizeof(*u16_buf) * 4;
> > - u16_buf += 4;
> > - }
> > - while (len >= sizeof(*u16_buf)) {
> > + const u16_p *u16_buf = (const u16_p *)buf;
> > + const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > +
> > + for (; u16_buf != end; ++u16_buf)
Personally I would prefer post-incrementing here. It makes no difference, so I don't see any need to revise the patch.
> > sum += *u16_buf;
> > - len -= sizeof(*u16_buf);
> > - u16_buf += 1;
> > - }
> >
> > - /* if length is in odd bytes */
> > - if (len == 1) {
> > + /* if length is odd, keeping it byte order independent */
> > + if (len % 2) {
> > uint16_t left = 0;
> > - *(uint8_t *)&left = *(const uint8_t *)u16_buf;
> > + *(unsigned char*)&left = *(const unsigned char *)end;
> > sum += left;
> > }
> >
> >
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH 1/1] net: fix aliasing issue in checksum computation
2021-10-14 17:20 ` Ferruh Yigit
2021-10-14 20:22 ` Morten Brørup
@ 2021-10-16 8:21 ` Morten Brørup
2021-10-16 17:17 ` Georg Sauthoff
2021-10-16 8:24 ` Morten Brørup
2 siblings, 1 reply; 9+ messages in thread
From: Morten Brørup @ 2021-10-16 8:21 UTC (permalink / raw)
To: Georg Sauthoff
Cc: dev, Ferruh Yigit, Olivier Matz, Thomas Monjalon, David Marchand
Geoff,
I have given this some more thoughts.
Most bytes transferred in real life are transferred in large packets, so faster processing of large packets is a great improvement!
Furthermore, a quick analysis of a recent packet sample from an ISP customer of ours shows that less than 8 % of the packets are odd size. Would you consider adding an unlikely() to the branch handling the odd byte at the end?
-Morten
> -----Original Message-----
> From: Morten Brørup
> Sent: Thursday, 14 October 2021 22.22
>
> > -----Original Message-----
> > From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> > Sent: Thursday, 14 October 2021 19.20
> >
> > On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> > > That means a superfluous cast is removed and aliasing through a
> > uint8_t
> > > pointer is eliminated. Note that uint8_t doesn't have the same
> > > strict-aliasing properties as unsigned char.
> > >
> > > Also simplified the loop since a modern C compiler can speed up
> (i.e.
> > > auto-vectorize) it in a similar way. For example, GCC auto-
> vectorizes
> > it
> > > for Haswell using AVX registers while halving the number of
> > instructions
> > > in the generated code.
> > >
> > > Signed-off-by: Georg Sauthoff <mail@gms.tf>
> >
> > + Morten. (Because of past reviews on cksum code)
>
> Thanks, Ferruh.
>
> I have not verified the claimed benefits of the patch, but I have
> reviewed the code thoroughly, and it looks perfectly good to me.
>
> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
>
> BTW: It makes me wonder if other parts of DPDK could benefit from the
> same treatment. Especially some of the older DPDK code, where we were
> trying to optimize by hand what a modern compiler can optimize for us
> today.
>
> >
> > > ---
> > > lib/net/rte_ip.h | 27 ++++++++-------------------
> > > 1 file changed, 8 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > index 05948b69b7..386db94c85 100644
> > > --- a/lib/net/rte_ip.h
> > > +++ b/lib/net/rte_ip.h
> > > @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> > *ipv4_hdr)
> > > static inline uint32_t
> > > __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> > > {
> > > - /* workaround gcc strict-aliasing warning */
> > > - uintptr_t ptr = (uintptr_t)buf;
> > > + /* extend strict-aliasing rules */
> > > typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > - const u16_p *u16_buf = (const u16_p *)ptr;
> > > -
> > > - while (len >= (sizeof(*u16_buf) * 4)) {
> > > - sum += u16_buf[0];
> > > - sum += u16_buf[1];
> > > - sum += u16_buf[2];
> > > - sum += u16_buf[3];
> > > - len -= sizeof(*u16_buf) * 4;
> > > - u16_buf += 4;
> > > - }
> > > - while (len >= sizeof(*u16_buf)) {
> > > + const u16_p *u16_buf = (const u16_p *)buf;
> > > + const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > +
> > > + for (; u16_buf != end; ++u16_buf)
>
> Personally I would prefer post-incrementing here. It makes no
> difference, so I don't see any need to revise the patch.
>
> > > sum += *u16_buf;
> > > - len -= sizeof(*u16_buf);
> > > - u16_buf += 1;
> > > - }
> > >
> > > - /* if length is in odd bytes */
> > > - if (len == 1) {
> > > + /* if length is odd, keeping it byte order independent */
> > > + if (len % 2) {
I assume that the compiler already optimizes "% 2" to "& 1".
> > > uint16_t left = 0;
> > > - *(uint8_t *)&left = *(const uint8_t *)u16_buf;
> > > + *(unsigned char*)&left = *(const unsigned char *)end;
> > > sum += left;
> > > }
> > >
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH 1/1] net: fix aliasing issue in checksum computation
2021-10-16 8:21 ` Morten Brørup
@ 2021-10-16 17:17 ` Georg Sauthoff
0 siblings, 0 replies; 9+ messages in thread
From: Georg Sauthoff @ 2021-10-16 17:17 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Ferruh Yigit, Olivier Matz, Thomas Monjalon, David Marchand
Hello,
On Sat, Oct 16, 2021 at 10:21:03AM +0200, Morten Brørup wrote:
> I have given this some more thoughts.
>
> Most bytes transferred in real life are transferred in large packets,
> so faster processing of large packets is a great improvement!
>
> Furthermore, a quick analysis of a recent packet sample from an ISP
> customer of ours shows that less than 8 % of the packets are odd size.
> Would you consider adding an unlikely() to the branch handling the odd
> byte at the end?
sure, I don't see a problem with adding unlikely() there.
I'll post a version 2 of that patch then, tomorrow.
Best regards
Georg
--
"No one can write decently who is distrustful of the reader's
intelligence, or whose attitude is patronizing." (William Strunk,
Jr. and E.B. White, The Elements of Style, p. 70, 1959)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH 1/1] net: fix aliasing issue in checksum computation
2021-10-14 17:20 ` Ferruh Yigit
2021-10-14 20:22 ` Morten Brørup
2021-10-16 8:21 ` Morten Brørup
@ 2021-10-16 8:24 ` Morten Brørup
2 siblings, 0 replies; 9+ messages in thread
From: Morten Brørup @ 2021-10-16 8:24 UTC (permalink / raw)
To: Georg Sauthoff
Cc: dev, Ferruh Yigit, Olivier Matz, Thomas Monjalon, David Marchand
Georg, I apologize for calling you Geoff below! Just realized my mistake.
Med venlig hilsen / Kind regards,
-Morten Brørup
> -----Original Message-----
> From: Morten Brørup
> Sent: Saturday, 16 October 2021 10.21
> To: 'Georg Sauthoff'
> Cc: 'dev@dpdk.org'; 'Ferruh Yigit'; 'Olivier Matz'; 'Thomas Monjalon';
> 'David Marchand'
> Subject: RE: [dpdk-dev] [PATCH 1/1] net: fix aliasing issue in checksum
> computation
>
> Geoff,
>
> I have given this some more thoughts.
>
> Most bytes transferred in real life are transferred in large packets,
> so faster processing of large packets is a great improvement!
>
> Furthermore, a quick analysis of a recent packet sample from an ISP
> customer of ours shows that less than 8 % of the packets are odd size.
> Would you consider adding an unlikely() to the branch handling the odd
> byte at the end?
>
> -Morten
>
> > -----Original Message-----
> > From: Morten Brørup
> > Sent: Thursday, 14 October 2021 22.22
> >
> > > -----Original Message-----
> > > From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> > > Sent: Thursday, 14 October 2021 19.20
> > >
> > > On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> > > > That means a superfluous cast is removed and aliasing through a
> > > uint8_t
> > > > pointer is eliminated. Note that uint8_t doesn't have the same
> > > > strict-aliasing properties as unsigned char.
> > > >
> > > > Also simplified the loop since a modern C compiler can speed up
> > (i.e.
> > > > auto-vectorize) it in a similar way. For example, GCC auto-
> > vectorizes
> > > it
> > > > for Haswell using AVX registers while halving the number of
> > > instructions
> > > > in the generated code.
> > > >
> > > > Signed-off-by: Georg Sauthoff <mail@gms.tf>
> > >
> > > + Morten. (Because of past reviews on cksum code)
> >
> > Thanks, Ferruh.
> >
> > I have not verified the claimed benefits of the patch, but I have
> > reviewed the code thoroughly, and it looks perfectly good to me.
> >
> > Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
> >
> > BTW: It makes me wonder if other parts of DPDK could benefit from the
> > same treatment. Especially some of the older DPDK code, where we were
> > trying to optimize by hand what a modern compiler can optimize for us
> > today.
> >
> > >
> > > > ---
> > > > lib/net/rte_ip.h | 27 ++++++++-------------------
> > > > 1 file changed, 8 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > > index 05948b69b7..386db94c85 100644
> > > > --- a/lib/net/rte_ip.h
> > > > +++ b/lib/net/rte_ip.h
> > > > @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> > > *ipv4_hdr)
> > > > static inline uint32_t
> > > > __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> > > > {
> > > > - /* workaround gcc strict-aliasing warning */
> > > > - uintptr_t ptr = (uintptr_t)buf;
> > > > + /* extend strict-aliasing rules */
> > > > typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > > - const u16_p *u16_buf = (const u16_p *)ptr;
> > > > -
> > > > - while (len >= (sizeof(*u16_buf) * 4)) {
> > > > - sum += u16_buf[0];
> > > > - sum += u16_buf[1];
> > > > - sum += u16_buf[2];
> > > > - sum += u16_buf[3];
> > > > - len -= sizeof(*u16_buf) * 4;
> > > > - u16_buf += 4;
> > > > - }
> > > > - while (len >= sizeof(*u16_buf)) {
> > > > + const u16_p *u16_buf = (const u16_p *)buf;
> > > > + const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > +
> > > > + for (; u16_buf != end; ++u16_buf)
> >
> > Personally I would prefer post-incrementing here. It makes no
> > difference, so I don't see any need to revise the patch.
> >
> > > > sum += *u16_buf;
> > > > - len -= sizeof(*u16_buf);
> > > > - u16_buf += 1;
> > > > - }
> > > >
> > > > - /* if length is in odd bytes */
> > > > - if (len == 1) {
> > > > + /* if length is odd, keeping it byte order independent */
> > > > + if (len % 2) {
>
> I assume that the compiler already optimizes "% 2" to "& 1".
>
> > > > uint16_t left = 0;
> > > > - *(uint8_t *)&left = *(const uint8_t *)u16_buf;
> > > > + *(unsigned char*)&left = *(const unsigned char *)end;
> > > > sum += left;
> > > > }
> > > >
^ permalink raw reply [flat|nested] 9+ messages in thread