DPDK patches and discussions
 help / color / mirror / Atom feed
From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
Date: Mon, 17 Jun 2019 06:52:01 +0000	[thread overview]
Message-ID: <BYAPR18MB2424F7A65181FF5433CF041CC8EB0@BYAPR18MB2424.namprd18.prod.outlook.com> (raw)
In-Reply-To: <VE1PR08MB51496238B1C7148AAE42549A98EB0@VE1PR08MB5149.eurprd08.prod.outlook.com>



> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Monday, June 17, 2019 6:19 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
> 
> External Email
> 
> ----------------------------------------------------------------------
> > >
> > > Reduced the CC list (changing the topic slightly)
> > >
> > > > >
> > > > > My understanding is that the generated code for both your patch
> > > > > and my changes above is the same. Above suggested changes will
> > > > > conform to ACLE recommendation.
> > > >
> > > > Though instructions are different. Effective cycles are same even
> > > > though First dup updates the four positions.
> > > Can you elaborate on how the instructions are different?
> > > I wrote the following code with both the methods:
> > >
> > > uint32x4_t u32x4_gather_gcc (uint32_t *p0, uint32_t *p1, uint32_t
> > > *p2, uint32_t *p3) {
> > >      uint32x4_t r = {*p0, *p1, *p2, *p3};
> > >
> > >      return r;
> > > }
> > >
> > > uint32x4_t u32x4_gather_acle (uint32_t *p0, uint32_t *p1, uint32_t
> > > *p2, uint32_t *p3) {
> > >      uint32x4_t r;
> > >
> > >      r = vdupq_n_u32 (* p0);
> > >      r = vsetq_lane_u32 (*p1, r, 1);
> > >      r = vsetq_lane_u32 (*p2, r, 2);
> > >      r = vsetq_lane_u32 (*p3, r, 3);
> > >
> > >      return r;
> > > }
> > >
> > > The generated code has the same instructions for both (omitted the
> > > unwanted
> > > parts):
> > >
> > > u32x4_gather_gcc:
> > >         ld1r    {v0.4s}, [x0]
> > >         ld1     {v0.s}[1], [x1]
> > >         ld1     {v0.s}[2], [x2]
> > >         ld1     {v0.s}[3], [x3]
> > >         ret
> > >
> > > u32x4_gather_acle:
> > >         ld1r    {v0.4s}, [x0]
> > >         ld1     {v0.s}[1], [x1]
> > >         ld1     {v0.s}[2], [x2]
> > >         ld1     {v0.s}[3], [x3]
> > >         ret
> > >
> > > The first 'ld1r' updates all the lanes in both the cases.
> >
> >
> > Please check actual generated code for ACL case. We can see difference
> I think there is something wrong with the way you are looking at the
> generated code. Please see comments below.

I am generating the dis assembly like below.
gdb -batch -ex 'file build/app/test ' -ex 'disassemble /rm search_neon_4'

You can try it out.

> 
> > > > To make forward progress send the v2 based on the updated logic
> > > > just to make ACLE  Spec happy, I don’t see any real reason to do
> > > > it though
> > > > 😊
> > > Thanks for the patch, it was important to make forward progress.
> > > But, I think we should carry forward the discussion as I plan to
> > > change other parts of DPDK on similar lines. I want to understand
> > > why you think there is no real reason. The ACLE recommendation
> > > mentions the
> > reasoning.
> >
> > # I see following in the ACLE spec. What is the actual reasoning?
> > "
> > ACLE does not define static construction of vector types. E.g.
> >  int32x4_t x = { 1, 2, 3, 4 };
> > Is not portable. Use the vcreate or vdup intrinsics to construct
> > values from scalars.
> > "
> Here is the complete text from ACLE 2.1
> 
> 12.2.6 Compatibility with other vector programming models Programmers
> should take particular care when combining the Neon Intrinsics API with
> alternative vector programming models; ACLE does not specify how the
> NEON Intrinsics API interoperates with them.
> For instance, the GCC vector extension permits include “arm_neon.h”
> ...
> uint32x2_t x = {0, 1}; // GCC extension.
> uint32_t y = vget_lane_s32 (x, 0); // ACLE NEON Intrinsic.
> But with this code the value stored in ‘y’ will depend on both the target
> architecture (AArch32 or AArch64) and whether the program is running in
> big- or little-endian mode.

I don’t have a big endian machine to test. I would be interesting to see 
The output in bigendian. 

> It is recommended that NEON Intrinsics be used consistently:
> include “arm_neon.h”
> ...
> const int temp[2] = {0, 1};
> uint32x2_t x = vld1_s32 (temp);
> uint32_t y = vget_lane_s32 (x, 0);
> 
> >
> > # Why does compiler(gcc) allows if it not indented to use?
> I do not have an answer. This is a recommendation and all that I am trying to
> say is, following the recommendation does not cost us anything in
> performance.

If there is no performance regression then no issue in changing to this format.

> 
> >
> > # I think, it may be time to introduce UndefinedBehaviorSanitizer
> > (UBSan) Gcc feature to DPDK to detect undefined behavior checks to
> > detect such case
> I am not sure if it helps here.
> 
> >
> > >
> >
> > > >
> > > > http://patches.dpdk.org/patch/54656/
> > > >

  reply	other threads:[~2019-06-17  6:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-06 14:50 jerinj
2019-06-06 15:55 ` Michael Santana Francisco
2019-06-07  5:42   ` Honnappa Nagarahalli
2019-06-07  5:35 ` Honnappa Nagarahalli
2019-06-07  6:21   ` Jerin Jacob Kollanukkaran
2019-06-10  5:29     ` Honnappa Nagarahalli
2019-06-10  9:39       ` Jerin Jacob Kollanukkaran
2019-06-11  1:27         ` Honnappa Nagarahalli
2019-06-11 14:24           ` Jerin Jacob Kollanukkaran
2019-06-11 19:48             ` Honnappa Nagarahalli
2019-06-12  2:41               ` Jerin Jacob Kollanukkaran
2019-06-17  0:48                 ` Honnappa Nagarahalli
2019-06-17  6:52                   ` Jerin Jacob Kollanukkaran [this message]
2019-06-10 12:10 ` Aaron Conole
2019-06-11 14:15 ` [dpdk-dev] [PATCH v2] " jerinj
2019-06-11 14:53   ` Aaron Conole
2019-06-11 15:07     ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR18MB2424F7A65181FF5433CF041CC8EB0@BYAPR18MB2424.namprd18.prod.outlook.com \
    --to=jerinj@marvell.com \
    --cc=Gavin.Hu@arm.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=dev@dpdk.org \
    --cc=nd@arm.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).