DPDK patches and discussions
From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
Date: Wed, 12 Jun 2019 02:41:11 +0000
Message-ID: <BYAPR18MB2424D7AE8B39DADEF8398E33C8EC0@BYAPR18MB2424.namprd18.prod.outlook.com>
In-Reply-To: <VE1PR08MB51497EA138297396C7DF5AD098ED0@VE1PR08MB5149.eurprd08.prod.outlook.com>

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Wednesday, June 12, 2019 1:18 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
> 
> Reduced the CC list (changing the topic slightly)
> 
> > >
> > > My understanding is that the generated code for both your patch and
> > > my changes above is the same. Above suggested changes will conform
> > > to ACLE recommendation.
> >
> > Though instructions are different. Effective cycles are same even
> > though First dup updates the four positions.
> Can you elaborate on how the instructions are different?
> I wrote the following code with both the methods:
> 
> uint32x4_t u32x4_gather_gcc (uint32_t *p0, uint32_t *p1, uint32_t *p2,
> uint32_t *p3) {
>      uint32x4_t r = {*p0, *p1, *p2, *p3};
> 
>      return r;
> }
> 
> uint32x4_t u32x4_gather_acle (uint32_t *p0, uint32_t *p1, uint32_t *p2,
> uint32_t *p3) {
>      uint32x4_t r;
> 
>      r = vdupq_n_u32 (* p0);
>      r = vsetq_lane_u32 (*p1, r, 1);
>      r = vsetq_lane_u32 (*p2, r, 2);
>      r = vsetq_lane_u32 (*p3, r, 3);
> 
>      return r;
> }
> 
> The generated code has the same instructions for both (omitted the unwanted
> parts):
> 
> u32x4_gather_gcc:
>         ld1r    {v0.4s}, [x0]
>         ld1     {v0.s}[1], [x1]
>         ld1     {v0.s}[2], [x2]
>         ld1     {v0.s}[3], [x3]
>         ret
> 
> u32x4_gather_acle:
>         ld1r    {v0.4s}, [x0]
>         ld1     {v0.s}[1], [x1]
>         ld1     {v0.s}[2], [x2]
>         ld1     {v0.s}[3], [x3]
>         ret
> 
> The first 'ld1r' updates all the lanes in both the cases.


Please check the actual generated code for the ACL case. We can see a difference in the first load:
 0x00000000005cc1dc <+1884>:  80 6a 65 bc     ldr     s0, [x20, x5]
vs
  0x00000000005cc1dc <+1884>:  9e 6a 65 b8     ldr     w30, [x20, x5]

With patch:

244                     /* Gather 4 bytes of input data for each stream. */
245                     input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
   0x00000000005cc1c8 <+1864>:  b4 4f 46 a9     ldp     x20, x19, [x29, #96]
   0x00000000005cc1d8 <+1880>:  65 02 40 b9     ldr     w5, [x19]
   0x00000000005cc1dc <+1884>:  80 6a 65 bc     ldr     s0, [x20, x5]
   0x00000000005cc26c <+2028>:  73 12 00 91     add     x19, x19, #0x4
   0x00000000005cc2ac <+2092>:  b3 37 00 f9     str     x19, [x29, #104]

246                     input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
   0x00000000005cc1d0 <+1872>:  a6 9f 47 a9     ldp     x6, x7, [x29, #120]
   0x00000000005cc1ec <+1900>:  e5 00 40 b9     ldr     w5, [x7]
   0x00000000005cc1f0 <+1904>:  d6 68 65 b8     ldr     w22, [x6, x5]
   0x00000000005cc21c <+1948>:  e7 10 00 91     add     x7, x7, #0x4
   0x00000000005cc260 <+2016>:  a7 43 00 f9     str     x7, [x29, #128]

247                     input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
   0x00000000005cc1d4 <+1876>:  b5 4b 40 f9     ldr     x21, [x29, #144]
   0x00000000005cc1f4 <+1908>:  a6 4f 40 f9     ldr     x6, [x29, #152]
   0x00000000005cc1f8 <+1912>:  d4 00 40 b9     ldr     w20, [x6]
   0x00000000005cc1fc <+1916>:  b5 6a 74 b8     ldr     w21, [x21, x20]
   0x00000000005cc224 <+1956>:  c6 10 00 91     add     x6, x6, #0x4
   0x00000000005cc264 <+2020>:  a6 4f 00 f9     str     x6, [x29, #152]

248                     input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
   0x00000000005cc200 <+1920>:  a5 5b 40 f9     ldr     x5, [x29, #176]
   0x00000000005cc204 <+1924>:  b4 00 40 b9     ldr     w20, [x5]
   0x00000000005cc208 <+1928>:  a5 10 00 91     add     x5, x5, #0x4
   0x00000000005cc218 <+1944>:  b7 57 40 f9     ldr     x23, [x29, #168]
   0x00000000005cc220 <+1952>:  f4 6a 74 b8     ldr     w20, [x23, x20]
   0x00000000005cc228 <+1960>:  a5 5b 00 f9     str     x5, [x29, #176]
   
Without patch:
   
245                     input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
   0x00000000005cc1c8 <+1864>:  b4 4f 46 a9     ldp     x20, x19, [x29, #96]
   0x00000000005cc1d8 <+1880>:  65 02 40 b9     ldr     w5, [x19]
   0x00000000005cc1dc <+1884>:  9e 6a 65 b8     ldr     w30, [x20, x5]
   0x00000000005cc248 <+1992>:  73 12 00 91     add     x19, x19, #0x4
   0x00000000005cc24c <+1996>:  b3 37 00 f9     str     x19, [x29, #104]

246                     input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
   0x00000000005cc1d0 <+1872>:  a6 9f 47 a9     ldp     x6, x7, [x29, #120]
   0x00000000005cc1ec <+1900>:  e5 00 40 b9     ldr     w5, [x7]
   0x00000000005cc1f0 <+1904>:  d6 68 65 b8     ldr     w22, [x6, x5]
   0x00000000005cc228 <+1960>:  e7 10 00 91     add     x7, x7, #0x4
   0x00000000005cc240 <+1984>:  a7 43 00 f9     str     x7, [x29, #128]

247                     input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
   0x00000000005cc1d4 <+1876>:  b5 4b 40 f9     ldr     x21, [x29, #144]
   0x00000000005cc1f4 <+1908>:  a6 4f 40 f9     ldr     x6, [x29, #152]
   0x00000000005cc1f8 <+1912>:  d4 00 40 b9     ldr     w20, [x6]
   0x00000000005cc1fc <+1916>:  b5 6a 74 b8     ldr     w21, [x21, x20]
   0x00000000005cc22c <+1964>:  c6 10 00 91     add     x6, x6, #0x4
   0x00000000005cc244 <+1988>:  a6 4f 00 f9     str     x6, [x29, #152]

248                     input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
   0x00000000005cc200 <+1920>:  a5 5b 40 f9     ldr     x5, [x29, #176]
   0x00000000005cc204 <+1924>:  b4 00 40 b9     ldr     w20, [x5]
   0x00000000005cc208 <+1928>:  a5 10 00 91     add     x5, x5, #0x4
   0x00000000005cc21c <+1948>:  b7 57 40 f9     ldr     x23, [x29, #168]
   0x00000000005cc224 <+1956>:  f4 6a 74 b8     ldr     w20, [x23, x20]
   0x00000000005cc230 <+1968>:  a5 5b 00 f9     str     x5, [x29, #176]




> 
> > To make forward progress send the v2 based on the updated logic  just
> > to make ACLE  Spec happy, I don’t see any real reason to do it though
> > 😊
> Thanks for the patch, it was important to make forward progress.
> But, I think we should carry forward the discussion as I plan to change other
> parts of DPDK on similar lines. I want to understand why you think there is no
> real reason. The ACLE recommendation mentions the reasoning.

# I see the following in the ACLE spec. What is the actual reasoning?
"
ACLE does not define static construction of vector types. E.g.
 int32x4_t x = { 1, 2, 3, 4 };
Is not portable. Use the vcreate or vdup intrinsics to construct values from scalars.
"

# Why does the compiler (gcc) allow it if it is not intended to be used?

# I think it may be time to introduce the UndefinedBehaviorSanitizer (UBSan)
GCC feature to DPDK to detect such cases of undefined behavior.

>

> >
> > http://patches.dpdk.org/patch/54656/
> >
