From: "Ananyev, Konstantin"
To: Thomas Monjalon, "Yigit, Ferruh", "Richardson, Bruce"
CC: "stable@dpdk.org", "Wiles, Keith", Yongseok Koh, "dev@dpdk.org", Shahaf Shuler, "Burakov, Anatoly", "justin.parus@microsoft.com", "christian.ehrhardt@canonical.com", "david.coronel@canonical.com", "josh.powers@canonical.com", "jay.vosburgh@canonical.com", "dan.streetman@canonical.com"
Date: Sun, 11 Nov 2018 14:15:37 +0000
Subject: Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake

Hi Thomas,

>
> Below is my conclusion for this bug.
> An expert of x86 is required to follow-up.
>
> Summary:
>   - CPU: Intel Skylake
>   - Linux environment: Ubuntu 18.04
>   - Compiler: GCC 7 or 8
>   - Scenario: testpmd crashes when it starts forwarding
>   - Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
>   - Context: inline rte_memcpy() is called from
>              inline rte_mempool_put_bulk(), called from
>              mlx5_tx_complete() (inline or not)
>   - Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
>
> Latest status can be found in Bugzilla:
> https://bugs.dpdk.org/show_bug.cgi?id=97#c35

Looking at the disassembled output from the bug report, it seems that the problem
is not in the vmovdqu8 instruction itself, but in the wrong offsets generated by the compiler:

vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
vmovups XMMWORD PTR [rsi+0x20],xmm0
vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
vmovups XMMWORD PTR [rsi+0x40],xmm0
vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]

The first load should be:

vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]

I think. Same for the next two offsets: 0x4 and 0x6 should be 0x40 and 0x60 respectively.
Not sure what is causing the compiler to behave that way.
BTW, looking through the testpmd objdump output, it seems that only the mlx5 driver exhibits this problem
(I didn't actually enable mlx4; it probably has the same problem).
Which looks a bit weird to me.

Konstantin