From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Prathyusha, Guduri"
To: "Ananyev, Konstantin"
CC: "dev@dpdk.org", "Jianbo.Liu@arm.com", "guduriprathyusha@gmail.com", "Kantecki, Tomasz"
Date: Thu, 2 Nov 2017 17:38:04 +0000
References: <20171102143114.24380-1-gprathyusha@caviumnetworks.com> <2601191342CEEE43887BDE71AB9772585FAB87F0@irsmsx105.ger.corp.intel.com> <20171102153327.GA24586@cavium.com>, <2601191342CEEE43887BDE71AB9772585FAB884B@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAB884B@irsmsx105.ger.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Thu, 02 Nov 2017 17:38:08 -0000

Hi,

Some issue with my mutt command line, and hence apologies for the unpleasant
formatting in the mail.
Please see inline.

________________________________
From: dev on behalf of Ananyev, Konstantin
Sent: Thursday, November 2, 2017 9:22 PM
To: Prathyusha, Guduri
Cc: dev@dpdk.org; Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Kantecki, Tomasz
Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping

> -----Original Message-----
> From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> Sent: Thursday, November 2, 2017 3:34 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org; Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Kantecki, Tomasz
> Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
>
> On Thu, Nov 02, 2017 at 02:46:43PM +0000, Ananyev, Konstantin wrote:
> > Hi,
> Hi
> >
> > > -----Original Message-----
> > > From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> > > Sent: Thursday, November 2, 2017 2:31 PM
> > > To: Kantecki, Tomasz
> > > Cc: Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Ananyev, Konstantin; dev@dpdk.org; Guduri
> > > Prathyusha
> > > Subject: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
> > >
> > > With -fstrict-aliasing enabled by default from -O2, gcc > 5.x gives
> > > undefined behavior in port_groupx4. 'pn' and 'pnum' are two different
> > > pointers pointing to the same chunk of memory, and with -fstrict-aliasing the
> > > pointers are assumed to be pointing to different memory, so the compiler
> > > reorders instructions that depend on pnum and pn. This breaks the port
> > > grouping algorithm.
> > >
> > > This patch eliminates the usage of the union and uses memcpy for copying
> > > gptbl[v].pnum to pn. memcpy, when applied with a compile-time constant size,
> > > does not call its library implementation but uses appropriate LD and ST
> > > instructions directly, and hence adds no performance overhead.
> > >
> > > Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")
> > > Fixes: af1694d94bf1 ("examples/l3fwd: fix crash with gcc 5")
> > > Signed-off-by: Guduri Prathyusha
> > > ---
> > >  examples/l3fwd/l3fwd_neon.h | 11 +++--------
> > >  examples/l3fwd/l3fwd_sse.h  | 11 +++--------
> > >  2 files changed, 6 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> > > index 4bc161394..10a602a04 100644
> > > --- a/examples/l3fwd/l3fwd_neon.h
> > > +++ b/examples/l3fwd/l3fwd_neon.h
> > > @@ -100,11 +100,6 @@ static inline uint16_t *
> > >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > >               uint16x8_t dp2)
> > >  {
> > > -        union {
> > > -                uint16_t u16[FWDSTEP + 1];
> > > -                uint64_t u64;
> > > -        } *pnum = (void *)pn;
> > > -
> > >          int32_t v;
> > >          uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
> > >
> > > @@ -117,9 +112,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> > >
> > >          /* if dest port value has changed. */
> > >          if (v != GRPMSK) {
> > > -                pnum->u64 = gptbl[v].pnum;
> > > -                pnum->u16[FWDSTEP] = 1;
> > > -                lp = pnum->u16 + gptbl[v].idx;
> > > +                rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > +                pn[FWDSTEP] = 1;
> > > +                lp = pn + gptbl[v].idx;
> > >          }
> > >
> > >          return lp;
> > > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > > index 831760f02..79a71d77e 100644
> > > --- a/examples/l3fwd/l3fwd_sse.h
> > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > @@ -98,11 +98,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
> > >  static inline uint16_t *
> > >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > >  {
> > > -        union {
> > > -                uint16_t u16[FWDSTEP + 1];
> > > -                uint64_t u64;
> > > -        } *pnum = (void *)pn;
> > > -
> > >          int32_t v;
> > >
> > >          dp1 = _mm_cmpeq_epi16(dp1, dp2);
> > > @@ -114,9 +109,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > >
> > >          /* if dest port value has changed. */
> > >          if (v != GRPMSK) {
> > > -                pnum->u64 = gptbl[v].pnum;
> > > -                pnum->u16[FWDSTEP] = 1;
> > > -                lp = pnum->u16 + gptbl[v].idx;
> > > +                rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > > +                pn[FWDSTEP] = 1;
> > > +                lp = pn + gptbl[v].idx;
> >
> > Could you explain a bit more here - which exactly instructions were reordered
> > and what kind of problems did it cause?
> > Specially on IA?
>
> This issue is observed on ARM since ARM gcc is more aggressive in
> reordering than x86 gcc.

Ok, then if x86 is not affected why modify l3fwd_sse.h at all?
Unless there is a reproducible problem with x86 - my preference would be
to keep that file intact.

> In ARM when v != GRPMSK, the following
> instructions ordering is not guaranteed because of strict aliasing.
>
> lp[0] += gptbl[v].lpv;
> pnum->u64 = gptbl[v].pnum;
> pnum->u16[FWDSTEP] = 1;
> lp = pnum->u16 + gptbl[v].idx;

Ok, so what in particular is reordered by the compiler:
lp[0] += gptbl[v].lpv; (1)
pnum->u64 = gptbl[v].pnum; (2)
pnum->u16[FWDSTEP] = 1; (3)
lp = pnum->u16 + gptbl[v].idx; (4)

(2) and (3)?
If so I am not sure how it could be a problem: they do stores to different locations.
(1) and (4) as I can see shouldn't be reordered.
Anyway - if you think this is a compiler reordering issue, then adding
rte_compiler_barrier() should fix the issue, right?

[prathyusha] : Yes, adding a rte_compiler_barrier() fixes the issue in case of ARM.
We think it is needed in the x86 case also, but if you still prefer to keep
l3fwd_sse.h intact then I will not modify l3fwd_sse.h in V2.
But I will certainly change l3fwd_neon.h to use rte_compiler_barrier().
Let me know what you prefer and I will spin a V2 accordingly.

Thanks,
Prathyusha

>
> That results in a wrong lp[0] update.
> memcpy in this case will avoid this problem.
>
> > In any case I don't think using rte_memcpy is a good thing to use here:
> > it is a huge inline function - way too much to copy just a 64 bit variable.
>
> I agree that rte_memcpy is overhead in this case but how about using
> memcpy that will not use library implementation if the size is constant.
> memcpy with constant size uses __builtin_memcpy that does not add
> performance overhead.

On x86 rte_memcpy() doesn't call libc memcpy() at all - it is a separate function:
ib/librte_eal/common/include/arch/x86/rte_memcpy.h

>
> Thoughts?

As I said - if x86 is not affected - please keep l3fwd_sse.h intact.
If it is (still not sure how) - check whether a compiler barrier would help here.
Konstantin
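
For reference, a minimal sketch of the plain memcpy variant discussed in this
thread. It is not the actual l3fwd code: struct grp_entry, group_update() and
the value FWDSTEP = 4 are assumptions made up for illustration, and only the
four statements quoted above are modelled. With a compile-time constant size,
gcc will typically expand memcpy inline, so the copy stays well defined under
-fstrict-aliasing without the union.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4 /* assumed value, for illustration only */

/* Hypothetical stand-in for one gptbl[] entry (fields named in the thread). */
struct grp_entry {
	uint64_t pnum; /* four packed uint16_t counters */
	int32_t idx;   /* index of the new last updated element */
	uint16_t lpv;  /* value to add to the last updated element */
};

/*
 * Statements (1)-(4) from the discussion, with the union-based store
 * replaced by memcpy. pn is only ever accessed as uint16_t here, so no
 * access goes through an incompatible lvalue type.
 */
static inline uint16_t *
group_update(uint16_t pn[FWDSTEP + 1], uint16_t *lp, const struct grp_entry *e)
{
	assert(e->idx >= 0 && e->idx <= FWDSTEP);

	lp[0] += e->lpv;                        /* (1) */
	memcpy(pn, &e->pnum, sizeof(e->pnum));  /* (2) copies 8 bytes */
	pn[FWDSTEP] = 1;                        /* (3) */
	return pn + e->idx;                     /* (4) */
}
```

Whether this or rte_compiler_barrier() is the better fix for l3fwd_neon.h is
the open question above; the sketch only shows what the memcpy form looks like.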