From: "Ananyev, Konstantin"
To: "harish.patil@qlogic.com", "dev@dpdk.org"
Date: Fri, 13 Nov 2015 10:35:06 +0000
Subject: Re: [dpdk-dev] [PATCH] l3fwd: Fix l3fwd crash due to unaligned load/store intrinsics
Message-ID: <2601191342CEEE43887BDE71AB97725836AC87E4@irsmsx105.ger.corp.intel.com>
In-Reply-To: <1447011596-2993-1-git-send-email-harish.patil@qlogic.com>

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of harish.patil@qlogic.com
> Sent: Sunday, November 08, 2015 7:40 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] l3fwd: Fix l3fwd crash due to unaligned load/store intrinsics
>
> From: Harish Patil
>
> l3fwd app expects PMDs to return packets whose L2 header is
> 16-byte aligned due to usage of _mm_load_si128()/_mm_store_si128()
> intrinsics in the app. However, most protocol stacks expect
> packets whose IP/L3 header is aligned on a 16-byte boundary.
>
> Based on the recommendations received on dpdk-dev, we are changing
> the l3fwd app to use _mm_loadu_si128()/_mm_storeu_si128() so that the
> address need not be 16-byte aligned, thereby preventing the crash.
> We have tested that there is no performance impact due to this
> change.
>
> Signed-off-by: Harish Patil
> ---

Acked-by: Konstantin Ananyev

As a side note:
In fact, with a gcc build I do see a slight regression: ~1% for the 4 ports over 1 core test case.
Though I think the problem is not in the patch itself.
For some reason unknown to me, gcc treats aligned and unaligned load/store intrinsics
differently (at least in this particular case).
With aligned load/store in use it generates code that is pretty close to the source:
4 loads first, then 4 BLENDs, then 4 stores (with some interleaved scalar instructions, of course).
But with unaligned ones gcc starts to mix loads and blends for the same register,
so now it is:
load x0; blend x0; load x1; blend x1; ...
As if the source code was:
te[0] = _mm_loadu_si128(p[0]);
te[0] = _mm_blend_epi16(te[0], ve[0], MASK_ETH);
te[1] = _mm_loadu_si128(p[1]);
te[1] = _mm_blend_epi16(te[1], ve[1], MASK_ETH);
...
So load latency is not hidden any more.
I tried it with different versions of gcc: same story for all of them.
Clang doesn't have this issue and generates similar code for both aligned and unaligned intrinsics.
The only way to fix it that I can think of is to put rte_compiler_barrier() just before the first blend intrinsic.
That helped: now there are no noticeable differences in generated code and results before and after the patch.
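To illustrate the ordering described above, here is a minimal standalone sketch, not the actual l3fwd code. The helper name rewrite_eth_x4() is hypothetical, the MASK_ETH value of 0x3f is an assumption (low six 16-bit lanes, i.e. the two MAC addresses, taken from ve[]), and compiler_barrier() is a local stand-in for rte_compiler_barrier() so that the example builds without DPDK headers. It assumes gcc or clang on x86 with SSE4.1.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <smmintrin.h>  /* SSE4.1: _mm_blend_epi16 */

/* Assumed value: take 16-bit lanes 0..5 (12 bytes) from the second operand. */
#define MASK_ETH 0x3f

/* Local stand-in for DPDK's rte_compiler_barrier(): emits no instruction,
 * but the compiler may not move memory accesses across it. */
#define compiler_barrier() do { __asm__ volatile ("" : : : "memory"); } while (0)

/* Hypothetical helper: overwrite the first 12 bytes of four packet headers
 * with the corresponding bytes of ve[], keeping bytes 12..15 intact. */
__attribute__((target("sse4.1")))
void rewrite_eth_x4(uint8_t *p[4], const __m128i ve[4])
{
    __m128i te[4];

    /* 4 unaligned loads first ... */
    te[0] = _mm_loadu_si128((const __m128i *)p[0]);
    te[1] = _mm_loadu_si128((const __m128i *)p[1]);
    te[2] = _mm_loadu_si128((const __m128i *)p[2]);
    te[3] = _mm_loadu_si128((const __m128i *)p[3]);

    /* ... barrier just before the first blend, intended to stop the
     * compiler from pairing each load with its blend ... */
    compiler_barrier();

    /* ... then 4 blends ... */
    te[0] = _mm_blend_epi16(te[0], ve[0], MASK_ETH);
    te[1] = _mm_blend_epi16(te[1], ve[1], MASK_ETH);
    te[2] = _mm_blend_epi16(te[2], ve[2], MASK_ETH);
    te[3] = _mm_blend_epi16(te[3], ve[3], MASK_ETH);

    /* ... then 4 unaligned stores. */
    _mm_storeu_si128((__m128i *)p[0], te[0]);
    _mm_storeu_si128((__m128i *)p[1], te[1]);
    _mm_storeu_si128((__m128i *)p[2], te[2]);
    _mm_storeu_si128((__m128i *)p[3], te[3]);
}
```

The barrier does not change the result, only the instruction order the compiler is allowed to produce; whether it actually restores the load/blend/store grouping has to be checked in the generated assembly for the compiler version in use.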
So I suppose I'll have to submit a patch after yours to fix that problem.

Konstantin