From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
Date: Wed, 12 Jun 2019 02:41:11 +0000
Message-ID: <BYAPR18MB2424D7AE8B39DADEF8398E33C8EC0@BYAPR18MB2424.namprd18.prod.outlook.com>
In-Reply-To: <VE1PR08MB51497EA138297396C7DF5AD098ED0@VE1PR08MB5149.eurprd08.prod.outlook.com>
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Wednesday, June 12, 2019 1:18 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
>
> Reduced the CC list (changing the topic slightly)
>
> > >
> > > My understanding is that the generated code for both your patch and
> > > my changes above is the same. Above suggested changes will conform
> > > to ACLE recommendation.
> >
> > Though the instructions are different, the effective cycles are the
> > same, even though the first dup updates all four positions.
> Can you elaborate on how the instructions are different?
> I wrote the following code with both the methods:
>
> uint32x4_t u32x4_gather_gcc (uint32_t *p0, uint32_t *p1, uint32_t *p2,
>                              uint32_t *p3) {
>     uint32x4_t r = {*p0, *p1, *p2, *p3};
>
>     return r;
> }
>
> uint32x4_t u32x4_gather_acle (uint32_t *p0, uint32_t *p1, uint32_t *p2,
>                               uint32_t *p3) {
>     uint32x4_t r;
>
>     r = vdupq_n_u32 (*p0);
>     r = vsetq_lane_u32 (*p1, r, 1);
>     r = vsetq_lane_u32 (*p2, r, 2);
>     r = vsetq_lane_u32 (*p3, r, 3);
>
>     return r;
> }
>
> The generated code has the same instructions for both (omitted the unwanted
> parts):
>
> u32x4_gather_gcc:
> ld1r {v0.4s}, [x0]
> ld1 {v0.s}[1], [x1]
> ld1 {v0.s}[2], [x2]
> ld1 {v0.s}[3], [x3]
> ret
>
> u32x4_gather_acle:
> ld1r {v0.4s}, [x0]
> ld1 {v0.s}[1], [x1]
> ld1 {v0.s}[2], [x2]
> ld1 {v0.s}[3], [x3]
> ret
>
> The first 'ld1r' updates all the lanes in both the cases.
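Please check the actual generated code for the ACL case. There is a
difference in the first load: it targets the SIMD register s0 with the
patch and the general-purpose register w30 without it: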
0x00000000005cc1dc <+1884>: 80 6a 65 bc ldr s0, [x20, x5]
vs
0x00000000005cc1dc <+1884>: 9e 6a 65 b8 ldr w30, [x20, x5]
With patch:
244 /* Gather 4 bytes of input data for each stream. */
245 input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
0x00000000005cc1c8 <+1864>: b4 4f 46 a9 ldp x20, x19, [x29, #96]
0x00000000005cc1d8 <+1880>: 65 02 40 b9 ldr w5, [x19]
0x00000000005cc1dc <+1884>: 80 6a 65 bc ldr s0, [x20, x5]
0x00000000005cc26c <+2028>: 73 12 00 91 add x19, x19, #0x4
0x00000000005cc2ac <+2092>: b3 37 00 f9 str x19, [x29, #104]
246 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
0x00000000005cc1d0 <+1872>: a6 9f 47 a9 ldp x6, x7, [x29, #120]
0x00000000005cc1ec <+1900>: e5 00 40 b9 ldr w5, [x7]
0x00000000005cc1f0 <+1904>: d6 68 65 b8 ldr w22, [x6, x5]
0x00000000005cc21c <+1948>: e7 10 00 91 add x7, x7, #0x4
0x00000000005cc260 <+2016>: a7 43 00 f9 str x7, [x29, #128]
247 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
0x00000000005cc1d4 <+1876>: b5 4b 40 f9 ldr x21, [x29, #144]
0x00000000005cc1f4 <+1908>: a6 4f 40 f9 ldr x6, [x29, #152]
0x00000000005cc1f8 <+1912>: d4 00 40 b9 ldr w20, [x6]
0x00000000005cc1fc <+1916>: b5 6a 74 b8 ldr w21, [x21, x20]
0x00000000005cc224 <+1956>: c6 10 00 91 add x6, x6, #0x4
0x00000000005cc264 <+2020>: a6 4f 00 f9 str x6, [x29, #152]
248 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
0x00000000005cc200 <+1920>: a5 5b 40 f9 ldr x5, [x29, #176]
0x00000000005cc204 <+1924>: b4 00 40 b9 ldr w20, [x5]
0x00000000005cc208 <+1928>: a5 10 00 91 add x5, x5, #0x4
0x00000000005cc218 <+1944>: b7 57 40 f9 ldr x23, [x29, #168]
0x00000000005cc220 <+1952>: f4 6a 74 b8 ldr w20, [x23, x20]
0x00000000005cc228 <+1960>: a5 5b 00 f9 str x5, [x29, #176]
Without patch:
245 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
0x00000000005cc1c8 <+1864>: b4 4f 46 a9 ldp x20, x19, [x29, #96]
0x00000000005cc1d8 <+1880>: 65 02 40 b9 ldr w5, [x19]
0x00000000005cc1dc <+1884>: 9e 6a 65 b8 ldr w30, [x20, x5]
0x00000000005cc248 <+1992>: 73 12 00 91 add x19, x19, #0x4
0x00000000005cc24c <+1996>: b3 37 00 f9 str x19, [x29, #104]
246 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
0x00000000005cc1d0 <+1872>: a6 9f 47 a9 ldp x6, x7, [x29, #120]
0x00000000005cc1ec <+1900>: e5 00 40 b9 ldr w5, [x7]
0x00000000005cc1f0 <+1904>: d6 68 65 b8 ldr w22, [x6, x5]
0x00000000005cc228 <+1960>: e7 10 00 91 add x7, x7, #0x4
0x00000000005cc240 <+1984>: a7 43 00 f9 str x7, [x29, #128]
247 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
0x00000000005cc1d4 <+1876>: b5 4b 40 f9 ldr x21, [x29, #144]
0x00000000005cc1f4 <+1908>: a6 4f 40 f9 ldr x6, [x29, #152]
0x00000000005cc1f8 <+1912>: d4 00 40 b9 ldr w20, [x6]
0x00000000005cc1fc <+1916>: b5 6a 74 b8 ldr w21, [x21, x20]
0x00000000005cc22c <+1964>: c6 10 00 91 add x6, x6, #0x4
0x00000000005cc244 <+1988>: a6 4f 00 f9 str x6, [x29, #152]
248 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
0x00000000005cc200 <+1920>: a5 5b 40 f9 ldr x5, [x29, #176]
0x00000000005cc204 <+1924>: b4 00 40 b9 ldr w20, [x5]
0x00000000005cc208 <+1928>: a5 10 00 91 add x5, x5, #0x4
0x00000000005cc21c <+1948>: b7 57 40 f9 ldr x23, [x29, #168]
0x00000000005cc224 <+1956>: f4 6a 74 b8 ldr w20, [x23, x20]
0x00000000005cc230 <+1968>: a5 5b 00 f9 str x5, [x29, #176]
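For reference, the two load encodings quoted above (at offset <+1884>) can
be decoded by hand. This is a minimal sketch, assuming the standard A64
"load/store register (register offset)" field layout. The helper names are
hypothetical, and only the 32-bit forms that appear in the listing are
handled:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Selected fields of an A64 "load/store register (register offset)" word. */
struct ldr_reg_fields {
    unsigned v;  /* bit 26: 1 = SIMD&FP destination, 0 = general-purpose */
    unsigned rm; /* bits 20:16: index register */
    unsigned rn; /* bits 9:5: base register */
    unsigned rt; /* bits 4:0: destination register */
};

/* Build the 32-bit instruction word from the little-endian bytes
 * printed in the disassembly listing (lowest address first). */
uint32_t word_from_bytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Extract the fields listed above from an instruction word. */
struct ldr_reg_fields decode_ldr_reg(uint32_t insn)
{
    struct ldr_reg_fields f;

    f.v  = (insn >> 26) & 0x1;
    f.rm = (insn >> 16) & 0x1f;
    f.rn = (insn >> 5) & 0x1f;
    f.rt = insn & 0x1f;
    return f;
}

/* Format the word as text. Assumption: a 32-bit access, so the
 * destination is printed as sN (SIMD&FP) or wN (general-purpose). */
void format_ldr_reg(uint32_t insn, char *buf, size_t len)
{
    struct ldr_reg_fields f = decode_ldr_reg(insn);

    snprintf(buf, len, "ldr %c%u, [x%u, x%u]",
             f.v ? 's' : 'w', f.rt, f.rn, f.rm);
}
```

Feeding it the bytes 80 6a 65 bc and 9e 6a 65 b8 gives words that differ
only in bit 26 and in the destination register field, which matches the
s0 versus w30 difference above.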
>
> > To make forward progress, I sent the v2 based on the updated logic just
> > to make the ACLE spec happy. I don’t see any real reason to do it, though
> > 😊
> Thanks for the patch, it was important to make forward progress.
> But, I think we should carry forward the discussion as I plan to change other
> parts of DPDK on similar lines. I want to understand why you think there is no
> real reason. The ACLE recommendation mentions the reasoning.
# I see the following in the ACLE spec. What is the actual reasoning?
"
ACLE does not define static construction of vector types. E.g.
int32x4_t x = { 1, 2, 3, 4 };
Is not portable. Use the vcreate or vdup intrinsics to construct values from scalars.
"
# Why does the compiler (GCC) allow it if it is not intended to be used?
# I think it may be time to introduce GCC's UndefinedBehaviorSanitizer
(UBSan) feature to DPDK to detect cases like this.
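
For concreteness, the construction that the ACLE text quoted above
recommends could look like the following. This is a minimal sketch, not the
DPDK ACL code: gather_s32x4 is a hypothetical helper, and the scalar branch
exists only so that the example also builds on non-Arm hosts.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/*
 * Hypothetical helper: gather four 32-bit values from four separate
 * pointers and store them, in lane order, into out[0..3].
 */
void gather_s32x4(const int32_t *p0, const int32_t *p1,
                  const int32_t *p2, const int32_t *p3, int32_t out[4])
{
#if defined(__ARM_NEON)
    /* Build the vector with intrinsics only: broadcast the first value
     * to all lanes, then overwrite lanes 1..3. No brace initializer. */
    int32x4_t r = vdupq_n_s32(*p0);

    r = vsetq_lane_s32(*p1, r, 1);
    r = vsetq_lane_s32(*p2, r, 2);
    r = vsetq_lane_s32(*p3, r, 3);
    vst1q_s32(out, r);
#else
    /* Scalar stand-in (assumption: used only where NEON is unavailable). */
    out[0] = *p0;
    out[1] = *p1;
    out[2] = *p2;
    out[3] = *p3;
#endif
}
```

The non-portable form would instead write
int32x4_t r = { *p0, *p1, *p2, *p3 }; which GCC accepts through its vector
extensions.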
>
> >
> > http://patches.dpdk.org/patch/54656/
> >