From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 2 Nov 2017 21:03:36 +0530
From: Guduri Prathyusha
To: "Ananyev, Konstantin"
Cc: dev@dpdk.org, Jianbo.Liu@arm.com, guduriprathyusha@gmail.com, tomasz.kantecki@intel.com
Message-ID: <20171102153327.GA24586@cavium.com>
References: <20171102143114.24380-1-gprathyusha@caviumnetworks.com> <2601191342CEEE43887BDE71AB9772585FAB87F0@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAB87F0@irsmsx105.ger.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
X-BeenThere: dev@dpdk.org
List-Id: DPDK patches and discussions

On Thu, Nov 02, 2017 at 02:46:43PM +0000, Ananyev, Konstantin wrote:
> Hi,

Hi

>
> > -----Original Message-----
> > From: Guduri Prathyusha [mailto:gprathyusha@caviumnetworks.com]
> > Sent: Thursday, November 2, 2017 2:31 PM
> > To: Kantecki, Tomasz
> > Cc: Jianbo.Liu@arm.com; guduriprathyusha@gmail.com; Ananyev, Konstantin ;
> > dev@dpdk.org; Guduri Prathyusha
> > Subject: [dpdk-dev] [PATCH ] examples/l3fwd: fix aliasing in port grouping
> >
> > With -fstrict-aliasing enabled by default from -O2, gcc > 5.x gives
> > undefined behavior in port_groupx4. 'pn' and 'pnum' are two different
> > pointers pointing to same chunk of memory and with -fstrict-aliasing the
> > pointers are assumed to be pointing to different memory and compiler
> > reorders instructions that depend on pnum and pn. This breaks port
> > grouping algorithm.
> >
> > This patch eliminates the usage of union and uses memcpy for copying
> > gptbl[v].pnum to pn. memcpy when applied on built_in constant size does
> > not call its library implementation but uses appropriate LD and ST
> > instructions directly and hence no performance overhead.
> >
> > Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")
> > Fixes: af1694d94bf1 ("examples/l3fwd: fix crash with gcc 5")
> > Signed-off-by: Guduri Prathyusha
> > ---
> >  examples/l3fwd/l3fwd_neon.h | 11 +++-------
> >  examples/l3fwd/l3fwd_sse.h  | 11 +++-------
> >  2 files changed, 6 insertions(+), 16 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> > index 4bc161394..10a602a04 100644
> > --- a/examples/l3fwd/l3fwd_neon.h
> > +++ b/examples/l3fwd/l3fwd_neon.h
> > @@ -100,11 +100,6 @@ static inline uint16_t *
> >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> >  		  uint16x8_t dp2)
> >  {
> > -	union {
> > -		uint16_t u16[FWDSTEP + 1];
> > -		uint64_t u64;
> > -	} *pnum = (void *)pn;
> > -
> >  	int32_t v;
> >  	uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
> >
> > @@ -117,9 +112,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
> >
> >  	/* if dest port value has changed. */
> >  	if (v != GRPMSK) {
> > -		pnum->u64 = gptbl[v].pnum;
> > -		pnum->u16[FWDSTEP] = 1;
> > -		lp = pnum->u16 + gptbl[v].idx;
> > +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > +		pn[FWDSTEP] = 1;
> > +		lp = pn + gptbl[v].idx;
> >  	}
> >
> >  	return lp;
> > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > index 831760f02..79a71d77e 100644
> > --- a/examples/l3fwd/l3fwd_sse.h
> > +++ b/examples/l3fwd/l3fwd_sse.h
> > @@ -98,11 +98,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
> >  static inline uint16_t *
> >  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> >  {
> > -	union {
> > -		uint16_t u16[FWDSTEP + 1];
> > -		uint64_t u64;
> > -	} *pnum = (void *)pn;
> > -
> >  	int32_t v;
> >
> >  	dp1 = _mm_cmpeq_epi16(dp1, dp2);
> > @@ -114,9 +109,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> >
> >  	/* if dest port value has changed. */
> >  	if (v != GRPMSK) {
> > -		pnum->u64 = gptbl[v].pnum;
> > -		pnum->u16[FWDSTEP] = 1;
> > -		lp = pnum->u16 + gptbl[v].idx;
> > +		rte_memcpy(pn, &gptbl[v].pnum, sizeof(gptbl[v].pnum));
> > +		pn[FWDSTEP] = 1;
> > +		lp = pn + gptbl[v].idx;
>
> Could you explain a bit more here - which exactly instructions were reordered
> and what kind of problems did it cause?
> Specially on IA?

This issue is observed on ARM, since ARM gcc is more aggressive in reordering
than x86 gcc.

On ARM, when v != GRPMSK, the ordering of the following instructions is not
guaranteed because of strict aliasing:

	lp[0] += gptbl[v].lpv;
	pnum->u64 = gptbl[v].pnum;
	pnum->u16[FWDSTEP] = 1;
	lp = pnum->u16 + gptbl[v].idx;

That results in a wrong lp[0] update. Using memcpy avoids this problem.

> In any case I don't think using rte_memcpy is a good thing to use here:
> it is a huge inline function - way too much to copy just 64 bit variable.

I agree that rte_memcpy is overhead in this case, but how about using plain
memcpy, which does not call the library implementation when the size is
constant? memcpy with a constant size uses the compiler's built-in memcpy
(__builtin_memcpy in gcc), which does not add performance overhead.

Thoughts?

> Konstantin
>
> > 	}
> >
> > 	return lp;
> > --
> > 2.14.1
>
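
For reference, a minimal sketch of the plain-memcpy variant proposed above.
It is not the l3fwd code: struct group_entry, pack_counts() and apply_group()
are hypothetical names, the table entry is reduced to the three fields used in
this thread (pnum, idx, lpv), and FWDSTEP is assumed to be 4. The only point is
that a constant-size memcpy replaces the union store without type punning.
Compilers typically expand such a copy inline, but that depends on the compiler
and the optimization level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4 /* assumed value for this sketch */

/* Hypothetical stand-in for one gptbl[] entry (illustration only). */
struct group_entry {
	uint64_t pnum; /* four uint16_t group counters packed together */
	int32_t idx;   /* index in pn[] of the last, still open group */
	uint16_t lpv;  /* value added to the currently open group */
};

/* Pack four counters in memory order, independent of endianness. */
static inline uint64_t
pack_counts(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
	const uint16_t src[FWDSTEP] = {a, b, c, d};
	uint64_t out;

	memcpy(&out, src, sizeof(out));
	return out;
}

/* Update the open group, then store the packed counters into pn[]. */
static inline uint16_t *
apply_group(uint16_t pn[FWDSTEP + 1], uint16_t *lp,
	    const struct group_entry *e)
{
	assert(pn != NULL && lp != NULL && e != NULL);

	/* Update of the last port counter, as in the sequence quoted above. */
	lp[0] += e->lpv;

	/*
	 * Constant-size copy instead of a store through a union pointer.
	 * memcpy may legally alias any object, so the compiler has to keep
	 * it ordered against the uint16_t accesses around it.
	 */
	memcpy(pn, &e->pnum, sizeof(e->pnum));
	pn[FWDSTEP] = 1;

	return pn + e->idx;
}
```

With an entry built as { .pnum = pack_counts(2, 1, 1, 3), .idx = 2, .lpv = 1 },
apply_group() adds 1 to lp[0], writes 2, 1, 1, 3 to pn[0..3], sets pn[4] to 1,
and returns &pn[2].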