From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
Date: Mon, 17 Jun 2019 06:52:01 +0000 [thread overview]
Message-ID: <BYAPR18MB2424F7A65181FF5433CF041CC8EB0@BYAPR18MB2424.namprd18.prod.outlook.com> (raw)
In-Reply-To: <VE1PR08MB51496238B1C7148AAE42549A98EB0@VE1PR08MB5149.eurprd08.prod.outlook.com>
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Monday, June 17, 2019 6:19 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
>
> External Email
>
> ----------------------------------------------------------------------
> > >
> > > Reduced the CC list (changing the topic slightly)
> > >
> > > > >
> > > > > My understanding is that the generated code for both your patch
> > > > > and my changes above is the same. Above suggested changes will
> > > > > conform to ACLE recommendation.
> > > >
> > > > Though instructions are different. Effective cycles are same even
> > > > though First dup updates the four positions.
> > > Can you elaborate on how the instructions are different?
> > > I wrote the following code with both the methods:
> > >
> > > uint32x4_t u32x4_gather_gcc (uint32_t *p0, uint32_t *p1, uint32_t
> > > *p2, uint32_t *p3) {
> > > uint32x4_t r = {*p0, *p1, *p2, *p3};
> > >
> > > return r;
> > > }
> > >
> > > uint32x4_t u32x4_gather_acle (uint32_t *p0, uint32_t *p1, uint32_t
> > > *p2, uint32_t *p3) {
> > > uint32x4_t r;
> > >
> > > r = vdupq_n_u32 (* p0);
> > > r = vsetq_lane_u32 (*p1, r, 1);
> > > r = vsetq_lane_u32 (*p2, r, 2);
> > > r = vsetq_lane_u32 (*p3, r, 3);
> > >
> > > return r;
> > > }
> > >
> > > The generated code has the same instructions for both (omitted the
> > > unwanted
> > > parts):
> > >
> > > u32x4_gather_gcc:
> > > ld1r {v0.4s}, [x0]
> > > ld1 {v0.s}[1], [x1]
> > > ld1 {v0.s}[2], [x2]
> > > ld1 {v0.s}[3], [x3]
> > > ret
> > >
> > > u32x4_gather_acle:
> > > ld1r {v0.4s}, [x0]
> > > ld1 {v0.s}[1], [x1]
> > > ld1 {v0.s}[2], [x2]
> > > ld1 {v0.s}[3], [x3]
> > > ret
> > >
> > > The first 'ld1r' updates all the lanes in both the cases.
> >
> >
> > Please check actual generated code for ACL case. We can see difference
> I think there is something wrong with the way you are looking at the
> generated code. Please see comments below.
I am generating the dis assembly like below.
gdb -batch -ex 'file build/app/test ' -ex 'disassemble /rm search_neon_4'
You can try it out.
>
> > > > To make forward progress send the v2 based on the updated logic
> > > > just to make ACLE Spec happy, I don’t see any real reason to do
> > > > it though
> > > > 😊
> > > Thanks for the patch, it was important to make forward progress.
> > > But, I think we should carry forward the discussion as I plan to
> > > change other parts of DPDK on similar lines. I want to understand
> > > why you think there is no real reason. The ACLE recommendation
> > > mentions the
> > reasoning.
> >
> > # I see following in the ACLE spec. What is the actual reasoning?
> > "
> > ACLE does not define static construction of vector types. E.g.
> > int32x4_t x = { 1, 2, 3, 4 };
> > Is not portable. Use the vcreate or vdup intrinsics to construct
> > values from scalars.
> > "
> Here is the complete text from ACLE 2.1
>
> 12.2.6 Compatibility with other vector programming models Programmers
> should take particular care when combining the Neon Intrinsics API with
> alternative vector programming models; ACLE does not specify how the
> NEON Intrinsics API interoperates with them.
> For instance, the GCC vector extension permits include “arm_neon.h”
> ...
> uint32x2_t x = {0, 1}; // GCC extension.
> uint32_t y = vget_lane_s32 (x, 0); // ACLE NEON Intrinsic.
> But with this code the value stored in ‘y’ will depend on both the target
> architecture (AArch32 or AArch64) and whether the program is running in
> big- or little-endian mode.
I don’t have a big endian machine to test. I would be interesting to see
The output in bigendian.
> It is recommended that NEON Intrinsics be used consistently:
> include “arm_neon.h”
> ...
> const int temp[2] = {0, 1};
> uint32x2_t x = vld1_s32 (temp);
> uint32_t y = vget_lane_s32 (x, 0);
>
> >
> > # Why does compiler(gcc) allows if it not indented to use?
> I do not have an answer. This is a recommendation and all that I am trying to
> say is, following the recommendation does not cost us anything in
> performance.
If there is no performance regression then no issue in changing to this format.
>
> >
> > # I think, it may be time to introduce UndefinedBehaviorSanitizer
> > (UBSan) Gcc feature to DPDK to detect undefined behavior checks to
> > detect such case
> I am not sure if it helps here.
>
> >
> > >
> >
> > > >
> > > > http://patches.dpdk.org/patch/54656/
> > > >
next prev parent reply other threads:[~2019-06-17 6:52 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-06 14:50 jerinj
2019-06-06 15:55 ` Michael Santana Francisco
2019-06-07 5:42 ` Honnappa Nagarahalli
2019-06-07 5:35 ` Honnappa Nagarahalli
2019-06-07 6:21 ` Jerin Jacob Kollanukkaran
2019-06-10 5:29 ` Honnappa Nagarahalli
2019-06-10 9:39 ` Jerin Jacob Kollanukkaran
2019-06-11 1:27 ` Honnappa Nagarahalli
2019-06-11 14:24 ` Jerin Jacob Kollanukkaran
2019-06-11 19:48 ` Honnappa Nagarahalli
2019-06-12 2:41 ` Jerin Jacob Kollanukkaran
2019-06-17 0:48 ` Honnappa Nagarahalli
2019-06-17 6:52 ` Jerin Jacob Kollanukkaran [this message]
2019-06-10 12:10 ` Aaron Conole
2019-06-11 14:15 ` [dpdk-dev] [PATCH v2] " jerinj
2019-06-11 14:53 ` Aaron Conole
2019-06-11 15:07 ` Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=BYAPR18MB2424F7A65181FF5433CF041CC8EB0@BYAPR18MB2424.namprd18.prod.outlook.com \
--to=jerinj@marvell.com \
--cc=Gavin.Hu@arm.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=dev@dpdk.org \
--cc=nd@arm.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).