DPDK patches and discussions
 help / color / mirror / Atom feed
From: Herbert Guan <Herbert.Guan@arm.com>
To: Pavan Nikhilesh Bhagavatula <pbhagavatula@caviumnetworks.com>,
	Jianbo Liu <Jianbo.Liu@arm.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] arch/arm: optimization for memcpy on AArch64
Date: Mon, 4 Dec 2017 07:14:09 +0000	[thread overview]
Message-ID: <HE1PR08MB2809E0B09B7F0986F591095C863C0@HE1PR08MB2809.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <20171203142049.m7moapt7msxrwacs@Pavan-LT>



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula
> [mailto:pbhagavatula@caviumnetworks.com]
> Sent: Sunday, December 3, 2017 22:21
> To: Herbert Guan <Herbert.Guan@arm.com>; Jianbo Liu
> <Jianbo.Liu@arm.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] arch/arm: optimization for memcpy on
> AArch64
>
> On Sun, Dec 03, 2017 at 12:38:35PM +0000, Herbert Guan wrote:
> > Pavan,
> >
> > Thanks for review and comments.  Please find my comments inline below.
> >
> > Best regards,
> > Herbert
> >
> <snip>
> > > There is an existing flag for arm32 to enable neon based memcpy
> > > RTE_ARCH_ARM_NEON_MEMCPY we could reuse that here as restrict
> does
> > > the same.
> > >
> > This implementation is actually not using ARM NEON instructions so the
> existing flag is not describing the option exactly.  It'll be good if the existing
> flag is "RTE_ARCH_ARM_MEMCPY" but unfortunately it might be too late
> now to get the flags aligned.
> >
>
> Correct me if I'm wrong but doesn't restrict tell the compiler to do SIMD
> optimization?
> Anyway can we put RTE_ARCH_ARM64_MEMCPY into config/common_base
> as CONFIG_RTE_ARCH_ARM64_MEMCPY=n so that it would be easier to
> enable/disable.
>

The result of using 'restrict' is to generate codes with ldp/stp instructions.  These instructions actually belong to the "data transfer instructions", though they are loading/storing a pair of registers.  'ld1/st1' are SIMD (NEON) instructions.

I can add CONFIG_RTE_ARCH_ARM64_MEMCPY=n  into common_armv8a_linuxapp in the new version as you've suggested.

> > > > +#include <rte_common.h>
> > > > +#include <rte_branch_prediction.h>
> > > > +
> > > >
> > >
> +/*********************************************************
> > > ***********
> > > > +***********
> > > > + * The memory copy performance differs on different AArch64
> > > > +micro-
> > > architectures.
> > > > + * And the most recent glibc (e.g. 2.23 or later) can provide a
> > > > +better memcpy()
> > > > + * performance compared to old glibc versions. It's always
> > > > +suggested to use a
> > > > + * more recent glibc if possible, from which the entire system
> > > > +can get
> > > benefit.
> > > > + *
> > > > + * This implementation improves memory copy on some aarch64
> > > > +micro-architectures,
> > > > + * when an old glibc (e.g. 2.19, 2.17...) is being used. It is
> > > > +disabled by
> > > > + * default and needs "RTE_ARCH_ARM64_MEMCPY" defined to
> activate.
> > > > +It's not
> > > > + * always providing better performance than memcpy() so users
> > > > +need to run unit
> > > > + * test "memcpy_perf_autotest" and customize parameters in
> > > > +customization section
> > > > + * below for best performance.
> > > > + *
> > > > + * Compiler version will also impact the rte_memcpy() performance.
> > > > +It's observed
> > > > + * on some platforms and with the same code, GCC 7.2.0 compiled
> > > > +binaries can
> > > > + * provide better performance than GCC 4.8.5 compiled binaries.
> > > > +
> > > >
> > >
> +*********************************************************
> > > ************
> > > > +*********/
> > > > +
> > > > +/**************************************
> > > > + * Beginning of customization section
> > > > +**************************************/
> > > > +#define ALIGNMENT_MASK 0x0F
> > > > +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> > > > +// Only src unalignment will be treaed as unaligned copy #define
> > > > +IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> > >
> > > We can use existing `rte_is_aligned` function instead.
> >
> > The exising 'rte_is_aligned()' inline function is defined in a relatively
> complex way, and there will be more instructions generated (using GCC
> 7.2.0):
> >
> > 0000000000000000 <align_check_rte>:   // using rte_is_aligned()
> >    0:91003c01 addx1, x0, #0xf
> >    4:927cec21 andx1, x1, #0xfffffffffffffff0
> >    8:eb01001f cmpx0, x1
> >    c:1a9f07e0 csetw0, ne  // ne = any
> >   10:d65f03c0 ret
> >   14:d503201f nop
> >
> > 0000000000000018 <align_check_simp>:   // using above expression
> >   18:12000c00 andw0, w0, #0xf
> >   1c:d65f03c0 ret
> >
> > So to get better performance, it's better to use the simple logic.
>
> Agreed, I have noticed that too maybe we could change rte_is_aligned to be
> simpler (Not in this patch).
>
> <snip>
> > > Would doing this still benifit if size is compile time constant?
> > > i.e. when
> > > __builtin_constant_p(n) is true.
> > >
> > Yes, performance margin is observed if size is compile time constant on
> some tested platforms.
> >
>
> Sorry I didn't get you but which is better? If size is compile time constant is
> using libc memcpy is better or going with restrict implementation better.
>
> If the former then we could do what 32bit rte_memcpy is using i.e.
>
> #define rte_memcpy(dst, src, n)              \
>         __extension__ ({                     \
>         (__builtin_constant_p(n)) ?          \
>         memcpy((dst), (src), (n)) :          \
>         rte_memcpy_func((dst), (src), (n)); })
>
Per my test, it usually has the same direction.  Means if the variable size can get improved performance, then hopefully the compile time constant will be improved as well, and vice versa.  The percentage might be different.  So in this patch, the property of size parameter (variable or compile time constant is not checked).

> Regards,
> Pavan.

Thanks,
Herbert
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

  reply	other threads:[~2017-12-04  7:14 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-27  7:49 Herbert Guan
2017-11-29 12:31 ` Jerin Jacob
2017-12-03 12:37   ` Herbert Guan
2017-12-15  4:06     ` Jerin Jacob
2017-12-18  2:51       ` Herbert Guan
2017-12-18  4:17         ` Jerin Jacob
2017-12-02  7:33 ` Pavan Nikhilesh Bhagavatula
2017-12-03 12:38   ` Herbert Guan
2017-12-03 14:20     ` Pavan Nikhilesh Bhagavatula
2017-12-04  7:14       ` Herbert Guan [this message]
2017-12-05  6:02 ` [dpdk-dev] [PATCH v2] " Herbert Guan
2017-12-18  2:54 ` [dpdk-dev] [PATCH v3] " Herbert Guan
2017-12-18  7:43   ` Jerin Jacob
2017-12-19  5:33     ` Herbert Guan
2017-12-19  7:24       ` Jerin Jacob
2017-12-21  5:33   ` [dpdk-dev] [PATCH v4] " Herbert Guan
2018-01-03 13:35     ` Jerin Jacob
2018-01-04 10:23       ` Herbert Guan
2018-01-04 10:20 ` [dpdk-dev] [PATCH v5] " Herbert Guan
2018-01-12 17:03   ` Thomas Monjalon
2018-01-15 10:57     ` Herbert Guan
2018-01-15 11:37       ` Thomas Monjalon
2018-01-18 23:54         ` Thomas Monjalon
2018-01-19  6:16           ` [dpdk-dev] 答复: " Herbert Guan
2018-01-19  6:10   ` [dpdk-dev] [PATCH v6] arch/arm: optimization for memcpy on ARM64 Herbert Guan
2018-01-20 16:21     ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=HE1PR08MB2809E0B09B7F0986F591095C863C0@HE1PR08MB2809.eurprd08.prod.outlook.com \
    --to=herbert.guan@arm.com \
    --cc=Jianbo.Liu@arm.com \
    --cc=dev@dpdk.org \
    --cc=pbhagavatula@caviumnetworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).