From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Guduri Prathyusha <gprathyusha@caviumnetworks.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"Jianbo.Liu@arm.com" <Jianbo.Liu@arm.com>,
"guduriprathyusha@gmail.com" <guduriprathyusha@gmail.com>,
"Kantecki, Tomasz" <tomasz.kantecki@intel.com>
Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
Date: Thu, 2 Nov 2017 15:52:10 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772585FAB884B@irsmsx105.ger.corp.intel.com>
In-Reply-To: <20171102153327.GA24586@cavium.com>
> -----Original Message-----
> From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> Sent: Thursday, November 2, 2017 3:34 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Kantecki, Tomasz <tomasz.kantecki@intel.com>
> Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
>
> On Thu, Nov 02, 2017 at 02:46:43PM +0000, Ananyev, Konstantin wrote:
> > Hi,
> Hi
> >
> > > -----Original Message-----
> > > From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> > > Sent: Thursday, November 2, 2017 2:31 PM
> > > To: Kantecki, Tomasz <tomasz.kantecki@intel.com>
> > > Cc: Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org; Guduri
> > > Prathyusha <gprathyusha@caviumnetworks.com>
> > > Subject: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
> > >
> > > With -fstrict-aliasing enabled by default from -O2, gcc > 5.x gives
> > > undefined behavior in port_groupx4. 'pn' and 'pnum' are two different
> > > pointers pointing to the same chunk of memory, and with -fstrict-aliasing
> > > the pointers are assumed to point to different memory, so the compiler
> > > reorders instructions that depend on pnum and pn. This breaks the port
> > > grouping algorithm.
> > >
> > > This patch eliminates the usage of the union and uses memcpy for copying
> > > gptbl[v].pnum to pn. memcpy with a compile-time constant size does not
> > > call its library implementation but uses appropriate LD and ST
> > > instructions directly, hence there is no performance overhead.
> > >
> > > Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")
> > > Fixes: af1694d94bf1 ("examples/l3fwd: fix crash with gcc 5")
> > > Signed-off-by: Guduri Prathyusha <gprathyusha@caviumnetworks.com>
> > > ---
> > > examples/l3fwd/l3fwd_neon.h | 11 +++--------
> > > examples/l3fwd/l3fwd_sse.h | 11 +++--------
> > > 2 files changed, 6 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> > > index 4bc161394..10a602a04 100644
> > > --- a/examples/l3fwd/l3fwd_neon.h
> > > +++ b/examples/l3fwd/l3fwd_neon.h
> > > @@ -100,11 +100,6 @@ static inline uint16_t *
> > > port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > > uint16x8_t dp2)
> > > {
> > > - union {
> > > - uint16_t u16[FWDSTEP + 1];
> > > - uint64_t u64;
> > > - } *pnum = (void *)pn;
> > > -
> > > int32_t v;
> > > uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
> > >
> > > @@ -117,9 +112,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > >
> > > /* if dest port value has changed. */
> > > if (v != GRPMSK) {
> > > - pnum->u64 = gptbl[v].pnum;
> > > - pnum->u16[FWDSTEP] = 1;
> > > - lp = pnum->u16 + gptbl[v].idx;
> > > + rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > + pn[FWDSTEP] = 1;
> > > + lp = pn + gptbl[v].idx;
> > > }
> > >
> > > return lp;
> > > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > > index 831760f02..79a71d77e 100644
> > > --- a/examples/l3fwd/l3fwd_sse.h
> > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > @@ -98,11 +98,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
> > > static inline uint16_t *
> > > port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > > {
> > > - union {
> > > - uint16_t u16[FWDSTEP + 1];
> > > - uint64_t u64;
> > > - } *pnum = (void *)pn;
> > > -
> > > int32_t v;
> > >
> > > dp1 = _mm_cmpeq_epi16(dp1, dp2);
> > > @@ -114,9 +109,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > >
> > > /* if dest port value has changed. */
> > > if (v != GRPMSK) {
> > > - pnum->u64 = gptbl[v].pnum;
> > > - pnum->u16[FWDSTEP] = 1;
> > > - lp = pnum->u16 + gptbl[v].idx;
> > > + rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > + pn[FWDSTEP] = 1;
> > > + lp = pn + gptbl[v].idx;
> >
> > Could you explain a bit more here - exactly which instructions were
> > reordered and what kind of problems did it cause?
> > Especially on IA?
>
> This issue is observed on ARM since ARM gcc is more aggressive in
> reordering than x86 gcc.
Ok, then if x86 is not affected, why modify l3fwd_sse.h at all?
Unless there is a reproducible problem on x86,
my preference would be to keep that file intact.
> On ARM, when v != GRPMSK, the ordering of the following
> instructions is not guaranteed because of strict aliasing.
>
> lp[0] += gptbl[v].lpv;
> pnum->u64 = gptbl[v].pnum;
> pnum->u16[FWDSTEP] = 1;
> lp = pnum->u16 + gptbl[v].idx;
Ok, so what in particular is reordered by the compiler:
lp[0] += gptbl[v].lpv; (1)
pnum->u64 = gptbl[v].pnum; (2)
pnum->u16[FWDSTEP] = 1; (3)
lp = pnum->u16 + gptbl[v].idx; (4)
(2) and (3)?
If so, I am not sure how it could be a problem:
they store to different locations.
(1) and (4), as far as I can see, shouldn't be reordered.
Anyway - if you think this is a compiler reordering issue,
then adding rte_compiler_barrier() should fix the issue, right?
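A minimal sketch of that experiment, assuming FWDSTEP is 4. The function
group_with_barrier() and its flattened arguments (lpv, pnum_val, idx) are
hypothetical stand-ins for the gptbl[v] fields, and compiler_barrier() is a
local macro written to behave like rte_compiler_barrier() so that the snippet
builds without DPDK headers. The barrier only constrains compiler reordering;
the store through the union pointer is still outside what the strict-aliasing
rules permit, so this is a diagnostic rather than a fix.

```c
#include <assert.h>
#include <stdint.h>

#define FWDSTEP 4 /* assumption: same value as in l3fwd */

/* Local stand-in for rte_compiler_barrier(): an empty asm with a memory clobber. */
#define compiler_barrier() do { __asm__ volatile ("" : : : "memory"); } while (0)

/*
 * Hypothetical helper: the four statements under discussion, with a
 * compiler barrier between (1) and (2). The caller must pass an 8-byte
 * aligned buffer at least as large as the union below.
 */
static inline uint16_t *
group_with_barrier(uint16_t pn[FWDSTEP + 1], uint16_t *lp,
		uint16_t lpv, uint64_t pnum_val, int32_t idx)
{
	union {
		uint16_t u16[FWDSTEP + 1];
		uint64_t u64;
	} *pnum = (void *)pn;

	assert(idx >= 0 && idx <= FWDSTEP);

	lp[0] += lpv;            /* (1) */
	compiler_barrier();      /* the compiler may not move (1) below this point */
	pnum->u64 = pnum_val;    /* (2) */
	pnum->u16[FWDSTEP] = 1;  /* (3) */
	return pnum->u16 + idx;  /* (4) */
}
```

If the problem disappears with the barrier in place, that would point to
(1) being moved past (2) when lp refers to one of the elements that (2)
overwrites.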
>
> That results in an incorrect lp[0] update.
> memcpy in this case will avoid this problem.
>
> > In any case, I don't think rte_memcpy is a good choice here:
> > it is a huge inline function - way too much for copying a single 64-bit variable.
>
> I agree that rte_memcpy adds overhead in this case, but how about using
> plain memcpy, which will not call the library implementation if the size is
> constant? memcpy with a constant size uses __builtin_memcpy, which does not
> add performance overhead.
On x86 rte_memcpy() doesn't call libc memcpy() at all - it is a separate function:
ib/librte_eal/common/include/arch/x86/rte_memcpy.h
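For reference, a minimal sketch of the plain memcpy() variant being proposed,
assuming FWDSTEP is 4. struct grp_entry and store_group() are hypothetical
names used only for illustration; the real gptbl[] entry has more fields.
Whether the copy is expanded inline depends on the compiler and flags, so the
generated code is worth checking on each target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4 /* assumption: same value as in l3fwd */

/* Hypothetical stand-in for one gptbl[] entry; only the fields used here. */
struct grp_entry {
	uint64_t pnum; /* four packed uint16_t group counters */
	int32_t idx;   /* index of the last group's counter within pn[] */
};

/*
 * Copy the packed counters into pn[] without type punning.
 * sizeof(e->pnum) is a compile-time constant (8 bytes), which is the case
 * where compilers typically expand memcpy() inline instead of calling libc.
 */
static inline uint16_t *
store_group(uint16_t pn[FWDSTEP + 1], const struct grp_entry *e)
{
	assert(e->idx >= 0 && e->idx <= FWDSTEP);

	memcpy(pn, &e->pnum, sizeof(e->pnum));
	pn[FWDSTEP] = 1;
	return pn + e->idx;
}
```

Because pn[] is only ever accessed as uint16_t and the 64-bit value is moved
as bytes, there is no second pointer type for the compiler to treat as
non-aliasing.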
>
> Thoughts?
As I said - if x86 is not affected, please keep l3fwd_sse.h intact.
If it is (still not sure how), check whether a compiler barrier would help here.
Konstantin