From: "Ananyev, Konstantin"
To: Guduri Prathyusha, "Kantecki, Tomasz"
CC: "Jianbo.Liu@arm.com", "guduriprathyusha@gmail.com", "dev@dpdk.org"
Date: Thu, 2 Nov 2017 14:46:43 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772585FAB87F0@irsmsx105.ger.corp.intel.com>
References: <20171102143114.24380-1-gprathyusha@caviumnetworks.com>
In-Reply-To: <20171102143114.24380-1-gprathyusha@caviumnetworks.com>
Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
List-Id: DPDK patches and discussions

Hi,

> -----Original Message-----
> From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> Sent: Thursday, November 2, 2017 2:31 PM
> To: Kantecki, Tomasz
> Cc: Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org; Guduri
> Prathyusha
> Subject: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
>
> With -f-strict-aliasing enabled by default from -O2, gcc > 5.x gives
> undefined behavior in port_groupx4. 'pn' and 'pnum' are two different
> pointers pointing to same chunk of memory and with -f-strict-aliasing the
> pointers are assumed to be pointing to different memory and compiler
> reorders instructions that depend on pnum and pn. This breaks port
> grouping algorithm.
>
> This patch eliminates the usage of union and uses memcpy for copying
> gptbl[v].pnum to pn. memcpy when applied on built_in constant size does
> not call its library implementation but uses appropriate LD and ST
> instructions directly and hence no performance overhead.
>
> Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")
> Fixes: af1694d94bf1 ("examples/l3fwd: fix crash with gcc 5")
> Signed-off-by: Guduri Prathyusha
> ---
>  examples/l3fwd/l3fwd_neon.h | 11 +++--------
>  examples/l3fwd/l3fwd_sse.h  | 11 +++--------
>  2 files changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> index 4bc161394..10a602a04 100644
> --- a/examples/l3fwd/l3fwd_neon.h
> +++ b/examples/l3fwd/l3fwd_neon.h
> @@ -100,11 +100,6 @@ static inline uint16_t *
>  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
>  		uint16x8_t dp2)
>  {
> -	union {
> -		uint16_t u16[FWDSTEP + 1];
> -		uint64_t u64;
> -	} *pnum = (void *)pn;
> -
>  	int32_t v;
>  	uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
>
> @@ -117,9 +112,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
>
>  	/* if dest port value has changed. */
>  	if (v != GRPMSK) {
> -		pnum->u64 = gptbl[v].pnum;
> -		pnum->u16[FWDSTEP] = 1;
> -		lp = pnum->u16 + gptbl[v].idx;
> +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> +		pn[FWDSTEP] = 1;
> +		lp = pn + gptbl[v].idx;
>  	}
>
>  	return lp;
> diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> index 831760f02..79a71d77e 100644
> --- a/examples/l3fwd/l3fwd_sse.h
> +++ b/examples/l3fwd/l3fwd_sse.h
> @@ -98,11 +98,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
>  static inline uint16_t *
>  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
>  {
> -	union {
> -		uint16_t u16[FWDSTEP + 1];
> -		uint64_t u64;
> -	} *pnum = (void *)pn;
> -
>  	int32_t v;
>
>  	dp1 = _mm_cmpeq_epi16(dp1, dp2);
> @@ -114,9 +109,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
>
>  	/* if dest port value has changed. */
>  	if (v != GRPMSK) {
> -		pnum->u64 = gptbl[v].pnum;
> -		pnum->u16[FWDSTEP] = 1;
> -		lp = pnum->u16 + gptbl[v].idx;
> +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> +		pn[FWDSTEP] = 1;
> +		lp = pn + gptbl[v].idx;

Could you explain a bit more here - exactly which instructions were reordered
and what kind of problems did it cause?
Especially on IA?
In any case I don't think rte_memcpy is a good thing to use here:
it is a huge inline function - way too much to copy just a 64-bit variable.
Konstantin

>  	}
>
>  	return lp;
> --
> 2.14.1