From: Thomas Monjalon
To: "Ananyev, Konstantin"
Cc: "Yigit, Ferruh", "Richardson, Bruce", "stable@dpdk.org",
 "Wiles, Keith", Yongseok Koh, "dev@dpdk.org", Shahaf Shuler,
 "Burakov, Anatoly", "justin.parus@microsoft.com",
 "christian.ehrhardt@canonical.com", "david.coronel@canonical.com",
 "josh.powers@canonical.com", "jay.vosburgh@canonical.com",
 "dan.streetman@canonical.com"
Date: Sun, 11 Nov 2018 19:15:02 +0100
Message-ID: <14171327.rkP5k0YJWv@xps>
In-Reply-To: <2601191342CEEE43887BDE71AB977258010CE46DE0@IRSMSX106.ger.corp.intel.com>
References: <20181023212318.43082-1-yskoh@mellanox.com>
 <15598804.PupyfcORSR@xps>
 <2601191342CEEE43887BDE71AB977258010CE46DE0@IRSMSX106.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
List-Id: DPDK patches and discussions

11/11/2018 15:15, Ananyev, Konstantin:
> Hi Thomas,
>
> > Below is my conclusion for this bug.
> > An x86 expert is required to follow up.
> >
> > Summary:
> >   - CPU: Intel Skylake
> >   - Linux environment: Ubuntu 18.04
> >   - Compiler: GCC 7 or 8
> >   - Scenario: testpmd crashes when it starts forwarding
> >   - Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> >   - Context: inline rte_memcpy() is called from
> >     inline rte_mempool_put_bulk(), called from
> >     mlx5_tx_complete() (inline or not)
> >   - Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
> >
> > Latest status can be found in Bugzilla:
> >   https://bugs.dpdk.org/show_bug.cgi?id=97#c35
> >
>
> Looking at the disassembled output from the bug report, it seems that the
> problem is not in the vmovdqu8 instruction itself, but in the wrong offsets
> generated by the compiler:
>
> vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
> vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
> vmovups XMMWORD PTR [rsi+0x20],xmm0
> vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
> vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
> vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
> vmovups XMMWORD PTR [rsi+0x40],xmm0
> vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
> vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
>
> Should be:
> vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]
> I think.
>
> Same for the next two offsets: 0x4 and 0x6 should be 0x40 and 0x60 respectively.

Yes, you're right, I missed it, thank you!
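To make the expected offsets concrete, here is a minimal C sketch of a 128-byte copy done in 32-byte steps, each step moved as two 16-byte halves. This is only an illustration of the access pattern suggested by the vmovdqu/vinserti128 and vmovups/vextracti128 pairs in the listing. It is not the actual rte_memcpy() code, and the helper names copy_32_as_two_halves() and copy_128() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move one 32-byte block as two 16-byte halves,
 * mirroring the load pair (+0x00, +0x10) and store pair in the listing. */
static void copy_32_as_two_halves(uint8_t *dst, const uint8_t *src)
{
	uint8_t lo[16], hi[16];

	memcpy(lo, src, 16);        /* like: vmovdqu      xmm0,[src+0x00]      */
	memcpy(hi, src + 0x10, 16); /* like: vinserti128  ymm0,[src+0x10],0x1  */
	memcpy(dst, lo, 16);        /* like: vmovups      [dst+0x00],xmm0      */
	memcpy(dst + 0x10, hi, 16); /* like: vextracti128 [dst+0x10],ymm0,0x1  */
}

/* Hypothetical helper: copy 128 bytes in four 32-byte steps and record
 * the source offset used for the low half of each step. */
static void copy_128(uint8_t *dst, const uint8_t *src, size_t offsets[4])
{
	for (size_t i = 0; i < 4; i++) {
		size_t off = i * 0x20; /* expected: 0x00, 0x20, 0x40, 0x60 */

		offsets[i] = off;
		copy_32_as_two_halves(dst + off, src + off);
	}
}
```

With this pattern the low-half loads must land at 0x00, 0x20, 0x40 and 0x60 from the source base, which is why 0x2, 0x4 and 0x6 in the bad listing cannot be right.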
The full diff is below:

--- bad-avx512-enabled
+++ good-avx512-disabled
-	vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x0]
+	vmovdqu xmm0,XMMWORD PTR [rax*8+0x0]
 	vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x10],0x1
 	vmovups XMMWORD PTR [rsi],xmm0
 	vextracti128 XMMWORD PTR [rsi+0x10],ymm0,0x1
-	vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
+	vmovdqu xmm0,XMMWORD PTR [rax*8+0x20]
 	vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
 	vmovups XMMWORD PTR [rsi+0x20],xmm0
 	vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
-	vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
+	vmovdqu xmm0,XMMWORD PTR [rax*8+0x40]
 	vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
 	vmovups XMMWORD PTR [rsi+0x40],xmm0
 	vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
-	vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
+	vmovdqu xmm0,XMMWORD PTR [rax*8+0x60]
 	vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x70],0x1
 	vmovups XMMWORD PTR [rsi+0x60],xmm0
 	vextracti128 XMMWORD PTR [rsi+0x70],ymm0,0x1

> Not sure what is causing the compiler to behave that way.
> BTW, looking through the testpmd objdump output - it seems that only the mlx5 driver
> exhibits this problem (I didn't actually enable mlx4, probably the same problem there).
> Which looks a bit weird to me.

Yes, it's weird.
I don't see how the mlx5 code could influence the compiler to generate
this bad code in AVX512 mode.
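As a side check on the numbers in the diff, the sketch below compares the displacements of the low-half loads in the two listings. The values are copied from the listings; the helper name disp_scaled_by() is hypothetical. The only thing it shows is that each bad displacement equals the good one divided by 16, the size in bytes of an XMM register. Nothing in this thread establishes that as the cause, so treat it as an observation, not a diagnosis.

```c
#include <stddef.h>
#include <stdint.h>

/* Displacements of the low-half loads, copied from the two listings. */
static const uint32_t bad_disp[4]  = { 0x0, 0x2, 0x4, 0x6 };
static const uint32_t good_disp[4] = { 0x0, 0x20, 0x40, 0x60 };

/* Hypothetical helper: returns 1 if every good displacement equals the
 * corresponding bad displacement multiplied by `scale`, 0 otherwise. */
static int disp_scaled_by(uint32_t scale)
{
	for (size_t i = 0; i < 4; i++) {
		if (good_disp[i] != bad_disp[i] * scale)
			return 0;
	}
	return 1;
}
```

disp_scaled_by(16) returns 1 for these values and disp_scaled_by(8) returns 0. Note that only the vmovdqu8 loads differ; the vinserti128 displacements (0x10, 0x30, 0x50, 0x70) are identical in both listings.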