DPDK patches and discussions
From: "Jianbo.Liu@arm.com" <Jianbo.Liu@arm.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: Guduri Prathyusha <gprathyusha@caviumnetworks.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"guduriprathyusha@gmail.com" <guduriprathyusha@gmail.com>,
	"Kantecki, Tomasz" <tomasz.kantecki@intel.com>
Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
Date: Fri, 3 Nov 2017 11:21:43 +0800
Message-ID: <20171103032141.GA6518@arm.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAB884B@irsmsx105.ger.corp.intel.com>

The 11/02/2017 15:52, Ananyev, Konstantin wrote:
>
>
> > -----Original Message-----
> > From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> > Sent: Thursday, November 2, 2017 3:34 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Kantecki, Tomasz <tomasz.kantecki@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
> >
> > On Thu, Nov 02, 2017 at 02:46:43PM +0000, Ananyev, Konstantin wrote:
> > > Hi,
> > Hi
> > >
> > > > -----Original Message-----
> > > > From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> > > > Sent: Thursday, November 2, 2017 2:31 PM
> > > > To: Kantecki, Tomasz <tomasz.kantecki@intel.com>
> > > > Cc: Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org; Guduri
> > > > Prathyusha <gprathyusha@caviumnetworks.com>
> > > > Subject: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
> > > >
> > > > With -fstrict-aliasing enabled by default from -O2, gcc > 5.x gives

May I ask exactly which gcc version you are using?
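
For reference, a small sketch of one way to report the version from the build itself, assuming gcc or a compatible compiler that predefines the __GNUC__ family of macros. The helper name compiler_version() is made up for illustration:

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the compiler version into buf.
 * __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ are predefined by gcc
 * (and by compilers that emulate it); anything else gets a fallback string.
 * Returns the number of characters snprintf() wanted to write.
 */
static int
compiler_version(char *buf, size_t len)
{
#if defined(__GNUC__)
	return snprintf(buf, len, "%d.%d.%d",
			__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
	return snprintf(buf, len, "unknown");
#endif
}
```

Running `gcc --version` with the toolchain used for the failing build gives the same information.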

> > > > undefined behavior in port_groupx4. 'pn' and 'pnum' are two different
> > > > pointers pointing to the same chunk of memory, and with -fstrict-aliasing
> > > > the compiler assumes they point to different memory and reorders
> > > > instructions that depend on pnum and pn. This breaks the port
> > > > grouping algorithm.
> > > >
> > > > This patch eliminates the union and uses memcpy to copy
> > > > gptbl[v].pnum to pn. When the size is a compile-time constant, memcpy
> > > > does not call its library implementation; the compiler emits the
> > > > appropriate LD and ST instructions directly, so there is no
> > > > performance overhead.
> > > >
> > > > Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")
> > > > Fixes: af1694d94bf1 ("examples/l3fwd: fix crash with gcc 5")
> > > > Signed-off-by: Guduri Prathyusha <gprathyusha@caviumnetworks.com>
> > > > ---
> > > >  examples/l3fwd/l3fwd_neon.h | 11 +++--------
> > > >  examples/l3fwd/l3fwd_sse.h  | 11 +++--------
> > > >  2 files changed, 6 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> > > > index 4bc161394..10a602a04 100644
> > > > --- a/examples/l3fwd/l3fwd_neon.h
> > > > +++ b/examples/l3fwd/l3fwd_neon.h
> > > > @@ -100,11 +100,6 @@ static inline uint16_t *
> > > >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > > >              uint16x8_t dp2)
> > > >  {
> > > > -       union {
> > > > -               uint16_t u16[FWDSTEP + 1];
> > > > -               uint64_t u64;
> > > > -       } *pnum = (void *)pn;
> > > > -
> > > >         int32_t v;
> > > >         uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
> > > >
> > > > @@ -117,9 +112,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > > >
> > > >         /* if dest port value has changed. */
> > > >         if (v != GRPMSK) {
> > > > -               pnum->u64 = gptbl[v].pnum;
> > > > -               pnum->u16[FWDSTEP] = 1;
> > > > -               lp = pnum->u16 + gptbl[v].idx;
> > > > +               rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > > +               pn[FWDSTEP] = 1;
> > > > +               lp = pn + gptbl[v].idx;
> > > >         }
> > > >
> > > >         return lp;
> > > > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > > > index 831760f02..79a71d77e 100644
> > > > --- a/examples/l3fwd/l3fwd_sse.h
> > > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > > @@ -98,11 +98,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
> > > >  static inline uint16_t *
> > > >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > > >  {
> > > > -       union {
> > > > -               uint16_t u16[FWDSTEP + 1];
> > > > -               uint64_t u64;
> > > > -       } *pnum = (void *)pn;
> > > > -
> > > >         int32_t v;
> > > >
> > > >         dp1 = _mm_cmpeq_epi16(dp1, dp2);
> > > > @@ -114,9 +109,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > > >
> > > >         /* if dest port value has changed. */
> > > >         if (v != GRPMSK) {
> > > > -               pnum->u64 = gptbl[v].pnum;
> > > > -               pnum->u16[FWDSTEP] = 1;
> > > > -               lp = pnum->u16 + gptbl[v].idx;
> > > > +               rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > > +               pn[FWDSTEP] = 1;
> > > > +               lp = pn + gptbl[v].idx;
> > >
> > > Could you explain a bit more here - exactly which instructions were reordered
> > > and what kind of problems did that cause?
> > > Especially on IA?
> >
> > This issue is observed on ARM since ARM gcc is more aggressive in
> > reordering than x86 gcc.
>
> Ok, then if x86 is not affected, why modify l3fwd_sse.h at all?
> Unless there is a reproducible problem with x86 -
> my preference would be to keep that file intact.
>
> > On ARM, when v != GRPMSK, the ordering of the following
> > instructions is not guaranteed because of strict aliasing.
> >
> > lp[0] += gptbl[v].lpv;
> > pnum->u64 = gptbl[v].pnum;
> > pnum->u16[FWDSTEP] = 1;
> > lp = pnum->u16 + gptbl[v].idx;
>
> Ok, so what in particular is reordered by the compiler:
>
>  lp[0] += gptbl[v].lpv; (1)
>  pnum->u64 = gptbl[v].pnum; (2)
>  pnum->u16[FWDSTEP] = 1;   (3)
>  lp = pnum->u16 + gptbl[v].idx; (4)
>
> (2) and (3)?
> If so I am not sure how it could be a problem:
> they do stores to the different locations.
> (1) and (4) as I can see shouldn't be reordered.
> Anyway - if you think this is a compiler reordering issue,
> then adding rte_compiler_barrier() should fix the issue, right?

Agree.
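
To make the suggestion concrete, here is a minimal sketch of where the barrier would sit, assuming a simplified stand-in for the tail of port_groupx4(). The function group_update() and its scalar parameters (lpv, pnum_val, idx, standing in for the gptbl[v] fields) are hypothetical, and compiler_barrier() is defined locally to mirror rte_compiler_barrier() so that the snippet builds without DPDK. Note that the barrier only stops the compiler from moving memory accesses across it; the access through the cast union pointer is still not something strict aliasing formally permits.

```c
#include <assert.h>
#include <stdint.h>

#define FWDSTEP 4

/* Local stand-in for rte_compiler_barrier(): an empty asm with a memory clobber. */
#define compiler_barrier() do { __asm__ __volatile__ ("" : : : "memory"); } while (0)

/*
 * Hypothetical, simplified tail of port_groupx4().
 * Assumes pn is suitably aligned for a 64-bit store.
 */
static uint16_t *
group_update(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16_t lpv,
	     uint64_t pnum_val, uint8_t idx)
{
	union {
		uint16_t u16[FWDSTEP + 1];
		uint64_t u64;
	} *pnum = (void *)pn;

	assert(idx <= FWDSTEP);

	/* (1) finish the count of the previous group */
	lp[0] += lpv;

	/* keep the compiler from moving the stores below above (1) */
	compiler_barrier();

	pnum->u64 = pnum_val;		/* (2) */
	pnum->u16[FWDSTEP] = 1;		/* (3) */
	return pnum->u16 + idx;		/* (4) */
}
```

If the wrong lp[0] value disappears with the barrier in place, that would support the reordering theory; comparing the generated code with and without it would show which accesses actually moved.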

>
> >
> > That results in a wrong lp[0] update.
> > Using memcpy in this case avoids the problem.
> >
> > > In any case I don't think rte_memcpy is a good thing to use here:
> > > it is a huge inline function - way too much to copy just a 64-bit variable.
> >
> > I agree that rte_memcpy is overhead in this case, but how about using
> > plain memcpy, which does not call the library implementation when the size
> > is constant? With a constant size the compiler uses __builtin_memcpy, which
> > does not add performance overhead.
>
> On x86 rte_memcpy() doesn't call libc memcpy() at all - it is a separate function:
> ib/librte_eal/common/include/arch/x86/rte_memcpy.h
>
> >
> > Thoughts?
>
> As I said - if x86 is not affected - please keep l3fwd_sse.h intact.
> If it is (still not sure how) - check whether a compiler barrier would help here.
> Konstantin
>
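
On the plain memcpy alternative discussed above, here is a minimal sketch of the same update written without the union. The function group_update_memcpy() and its scalar parameters (standing in for the gptbl[v] fields) are hypothetical. With a constant size of 8 bytes, compilers commonly lower the call to a plain load/store pair, but that is an assumption worth confirming in the generated code for the gcc version in question.

```c
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4

/*
 * Hypothetical, simplified tail of port_groupx4() using memcpy().
 * All accesses to pn go through uint16_t or memcpy(), so there is no
 * type-punned store for the compiler to make aliasing assumptions about.
 */
static uint16_t *
group_update_memcpy(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16_t lpv,
		    uint64_t pnum_val, uint8_t idx)
{
	/* finish the count of the previous group */
	lp[0] += lpv;

	/* copy the four 16-bit counters in one constant-size copy */
	memcpy(pn, &pnum_val, sizeof(pnum_val));
	pn[FWDSTEP] = 1;

	return pn + idx;
}
```

Unlike the union version, this does not depend on pn being 8-byte aligned.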


Thread overview: 7+ messages
2017-11-02 14:31 Guduri Prathyusha
2017-11-02 14:46 ` Ananyev, Konstantin
2017-11-02 15:33   ` Guduri Prathyusha
2017-11-02 15:52     ` Ananyev, Konstantin
2017-11-02 17:38       ` Prathyusha, Guduri
2017-11-03  3:21       ` Jianbo.Liu [this message]
2017-11-03  5:42         ` Guduri Prathyusha
