From: "Ananyev, Konstantin"
To: "Kulasek, TomaszX"
CC: "dev@dpdk.org"
Date: Mon, 4 Apr 2016 19:05:18 +0000
Subject: Re: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
Message-ID: <2601191342CEEE43887BDE71AB97725836B2E22E@irsmsx105.ger.corp.intel.com>
In-Reply-To: <3042915272161B4EB253DA4D77EB373A14E7D5BA@IRSMSX102.ger.corp.intel.com>
References: <1459781123-7556-1-git-send-email-tomaszx.kulasek@intel.com>
 <2601191342CEEE43887BDE71AB97725836B2DF1A@irsmsx105.ger.corp.intel.com>
 <3042915272161B4EB253DA4D77EB373A14E7D5BA@IRSMSX102.ger.corp.intel.com>
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Monday, April 04, 2016 5:20 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
> Hi Konstantin,
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Monday, April 4, 2016 17:35
> > To: Kulasek, TomaszX
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> >
> > Hi Tomasz,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> > > Sent: Monday, April 04, 2016 3:45 PM
> > > To: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> > >
> > > It seems that with gcc >5.x, -O2/-O3 optimization breaks the packet
> > > grouping algorithm.
> > >
> > > When the last packet pointer "lp" and the "pnum->u64" buffer point to
> > > the same memory buffer, high optimization can cause unpredictable
> > > results. It seems that the assignment of precalculated group sizes may
> > > interfere with the initialization of the new group size when lp points
> > > to a value inside the current group that should not be changed.
> > >
> > > With gcc >5.x and optimization we cannot be sure which assignment will
> > > be done first, so the group size can be counted incorrectly.
> > >
> > > This patch eliminates the overlap between the assignment of the initial
> > > group size (lp[0] = 1) and the precalculated group sizes when
> > > gptbl[v].idx < 4.
> > >
> > > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> > >
> > > Signed-off-by: Tomasz Kulasek
> > > ---
> > >  examples/l3fwd/l3fwd_sse.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > > index f9cf50a..1afa1f0 100644
> > > --- a/examples/l3fwd/l3fwd_sse.h
> > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > >
> > >  	/* if dest port value has changed. */
> > >  	if (v != GRPMSK) {
> > > -		lp = pnum->u16 + gptbl[v].idx;
> > > -		lp[0] = 1;
> > >  		pnum->u64 = gptbl[v].pnum;
> > > +		pnum->u16[FWDSTEP] = 1;
> >
> > Hmm, but FWDSTEP and gptbl[v].idx are not always equal.
> > Actually, could you explain a bit more: what exactly is reordered by
> > gcc 5.x, and how to reproduce it?
> > I.e. what sequence of input packets will trigger an error?
> > Konstantin
> >
> > > +		lp = pnum->u16 + gptbl[v].idx;
> > >  	}
> > >
> > >  	return lp;
> > > --
> > > 1.7.9.5
>
>
> E.g. for this case, when the group changes:
>
> {
> 	/* 0xb: a == b, b == c, c != d, d == e */
> 	.pnum = UINT64_C(0x0002000100020003),
> 	.idx = 3,
> 	.lpv = 2,
> },
>
> We expect:
>
> pnum->u16 = { 3, 2, 1, 2, x }
> lp = pnum->u16 + 3;
> // should be lp[0] == 2
>
> but for gcc 5.2
>
> lp = pnum->u16 + gptbl[v].idx;
> lp[0] = 1;
> pnum->u64 = gptbl[v].pnum;
>
> gives, for some reason, lp[0] == 1, even if pnum->u16[3] == 2.
>
> This makes the group shorter, and it fails when trying to send the next
> group with a garbage length.
>
> We should set lp[0] = 1 only when needed (gptbl[v].idx == 4), which is why
> I set pnum->u16[4] = 1. I set it unconditionally to avoid a branch.
> For idx < 4 we don't need to set lp[0].
>
> The problem is that both pointers operate on the same memory buffer and
> it seems that gcc optimization will produce (which is wrong):
>
> lp = pnum->u16 + gptbl[v].idx;
> pnum->u64 = gptbl[v].pnum;
> lp[0] = 1;
>
> instead of:
>
> lp = pnum->u16 + gptbl[v].idx;
> lp[0] = 1;
> pnum->u64 = gptbl[v].pnum;
>
> This issue occurs with gcc 5.x, and the application seems to fail for the
> patterns where gptbl[v].idx < 4.

Thanks for the explanation, Tomasz.
So it reordered:
lp[0] = 1;
pnum->u64 = gptbl[v].pnum;
correct?
My first thought was to insert a rte_compiler_barrier() between these two
lines, but actually your approach looks cleaner.
Konstantin
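
For reference, a minimal standalone sketch of the store ordering proposed in
the patch. It is not the l3fwd code: union grp_buf, struct grp_entry and
group_update_patched() are hypothetical names made up for illustration, and
the layout comments assume a little-endian target such as x86. It only shows
that once the precalculated sizes are written first and slot FWDSTEP is set
unconditionally, no store through lp is needed for idx < 4.

```c
#include <assert.h>
#include <stdint.h>

#define FWDSTEP 4

/*
 * Hypothetical stand-in for the buffer updated in port_groupx4():
 * four group counters seen either as uint16_t slots or as one uint64_t,
 * plus one extra slot for a group that starts after this step.
 */
union grp_buf {
	uint16_t u16[FWDSTEP + 1];
	uint64_t u64;
};

/* Hypothetical entry with the .pnum and .idx fields quoted in the thread. */
struct grp_entry {
	uint64_t pnum; /* precalculated group sizes */
	int32_t idx;   /* slot holding the last group's counter */
};

/*
 * Ordering from the patch: store the precalculated sizes, start a new
 * group in slot FWDSTEP, then derive lp. Both stores go through the
 * union members and touch disjoint bytes, so no store is made through
 * a separate uint16_t pointer into memory the 64-bit store also covers.
 */
static uint16_t *
group_update_patched(union grp_buf *pnum, const struct grp_entry *e)
{
	assert(e->idx >= 0 && e->idx <= FWDSTEP);

	pnum->u64 = e->pnum;        /* slots 0..3 */
	pnum->u16[FWDSTEP] = 1;     /* slot 4, only used when idx == 4 */
	return pnum->u16 + e->idx;  /* last group's counter */
}
```

With the 0xb entry quoted above (.pnum = 0x0002000100020003, .idx = 3), the
slots end up as { 3, 2, 1, 2, 1 } on a little-endian machine and the returned
pointer addresses slot 3, so the last group keeps its size of 2.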