From: Christian Ehrhardt
Date: Fri, 9 Nov 2018 07:27:06 +0100
To: yskoh@mellanox.com
Cc: Ferruh Yigit, Thomas Monjalon, keith.wiles@intel.com, dev, Bruce Richardson, Shahaf Shuler, "Ananyev, Konstantin", anatoly.burakov@intel.com, stable@dpdk.org, justin.parus@microsoft.com, David Coronel, Josh Powers, Jay Vosburgh, Dan Streetman
Subject: Re: [dpdk-stable] AVX512 bug on SkyLake

On Fri, Nov 9, 2018 at 12:01 AM Yongseok Koh wrote:
>
> > On Nov 8, 2018, at 9:21 AM, Ferruh Yigit wrote:
> >
> > On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
> >> Hi,
> >>
> >> We need to gather more information about this bug.
> >> More below.
> >>

Thanks Thomas for looping us in!

> >> 07/11/2018 10:04, Wiles, Keith:
> >>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh wrote:
> >>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith wrote:
> >>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh wrote:
> >>>>>>
> >>>>>> This is a workaround to prevent a crash, which might be caused by
> >>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
> >>>>>
> >>>>> Should the code below not also test for the gcc version and
> >>>>> the Sky Lake processor, maybe I am wrong but it seems it is
> >>>>> turning AVX512 for all GCC builds
> >>>>
> >>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> >>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> >>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> >>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
> >>>
> >>> Are you not turning off all of the GCC versions for AVX512.
> >>> And you can test for range or greater then GCC version and
> >>> it just seems like we are turning off every gcc version, is that true?
> >>
> >> Do we know exactly which GCC versions are affected?
> >>
> >>>>> Also bug 97 seems a bit obscure reference, maybe you know
> >>>>> the bug report, but more details would be good?
> >>>>
> >>>> I sent out the report to dev list two month ago.
> >>>> And I created the Bug 97 in order to reference it
> >>>> in the commit message.
> >>>> I didn't want to repeat same message here and there,
> >>>> but it would've been better to have some sort of summary
> >>>> of the Bug, although v3 has a few more words.
> >>>> However, v3 has been merged.
> >>>
> >>> Still this is too obscure if nothing else give a link to
> >>> a specific bug not just 97.
> >>
> >> The URL is
> >>      https://bugs.dpdk.org/show_bug.cgi?id=97
> >> The bug is also pointing to an email:
> >>      https://mails.dpdk.org/archives/dev/2018-September/111522.html
> >>
> >> Summary:
> >>      - CPU: Intel Skylake
> >>      - Linux environment: Ubuntu 18.04
> >>      - Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> >
> > Is it possible to test a few other gcc versions to check if the issue is
> > specific to this compiler version?
>
> Nothing's impossible but even with my quick search in gcc.gnu.org,
> I could find the following documents mention mavx512f support:
>
> GCC 4.9.0
> April 22, 2014 (changes, documentation)
>
> GCC 5.1
> April 22, 2015 (changes, documentation)
>
> GCC 6.4
> July 4, 2017 (changes, documentation)
>
> GCC 7.1
> May 2, 2017 (changes, documentation)
>
> GCC 8.1
> May 2, 2018 (changes, documentation)
>
> We altogether have to put quite large resource to verify all of the versions.
>
> I assumed older than gcc 7 would have the same issue. I know it was a speculation
> but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
> For two months, we couldn't have any tangible solution (actually nobody cared including myself),
> so I submitted the patch to temporarily disable mavx512f.
>
> I'm still not sure what the best option is...
>

There is one part of all this that I don't understand yet.
I assume you are building on Ubuntu, as that is your gcc reference.
FYI: as people asked for bug references, there is also [1], which seems
to be pretty much the same issue.

It builds with mostly defaults, which means that, per
mk/machine/default/rte.vars.mk and similar, it sets
  -march=corei7

But when I look at what that implies, all of avx512 is disabled:
  $ gcc -Q --help=target -m64 -march=corei7 | grep avx512f
    -mavx512f                   [disabled]

So I wonder why -mno-avx512f should help at all.

I used the full list of gcc args we have for the build (e.g. [2] of an
18.05 build), but that doesn't change the result (they are mostly -W, -I
and -D).

So I wonder: did people do a custom build and bump up -march, or enable
-mavx512f on their own, to hit that?
Or are we facing a real gcc issue where "-mavx512f [disabled]" is not
the same as -mno-avx512f?

Maybe someone who hit the bug could clarify that, please?
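
To see whether the option state GCC reports differs at all between the default build flags and an explicit -mno-avx512f, the two outputs of `gcc -Q --help=target` could be compared flag by flag. The following is only a minimal sketch in Python under some assumptions: the helper names (`parse_target_flags`, `diff_target_flags`, `query_gcc`) are hypothetical and not part of DPDK or GCC, and the parser assumes the "-mflag [state]" line layout seen in the grep output above.

```python
import subprocess


def parse_target_flags(help_text):
    """Map each '-m...' option to its reported state ('enabled'/'disabled').

    Assumes lines shaped like '  -mavx512f    [disabled]'; lines that do
    not match (headers, '-march=' values, etc.) are skipped.
    """
    flags = {}
    for line in help_text.splitlines():
        parts = line.split()
        if (len(parts) == 2 and parts[0].startswith("-m")
                and parts[1].startswith("[") and parts[1].endswith("]")):
            flags[parts[0]] = parts[1].strip("[]")
    return flags


def diff_target_flags(base_text, other_text):
    """Return {flag: (state_in_base, state_in_other)} for flags that differ."""
    base = parse_target_flags(base_text)
    other = parse_target_flags(other_text)
    return {
        name: (base.get(name), other.get(name))
        for name in sorted(set(base) | set(other))
        if base.get(name) != other.get(name)
    }


def query_gcc(extra_args, gcc="gcc"):
    """Run the same query as above, plus extra_args (hypothetical helper).

    The base flags (-m64 -march=corei7) are taken from the command quoted
    in this mail; adjust them to match the build being investigated.
    """
    cmd = [gcc, "-Q", "--help=target", "-m64", "-march=corei7", *extra_args]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

On a machine with the affected compiler, `diff_target_flags(query_gcc([]), query_gcc(["-mno-avx512f"]))` would return an empty dict if GCC reports identical flag states in both cases. That would only show that the reported option state is the same; it would not rule out per-file flags or target attributes elsewhere in the build.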

BTW: per the reports I've seen, it also seems to apply to the latest
compiler update of the same series (at least the system was said to be
fully updated), which would be 7.3.0-27ubuntu1~18.04.
But this is second-hand information, as I don't have a system with the
right MLX5 + Skylake combination available at the moment, so I can't
confirm it for sure :-/

[1]: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1799397
[2]: https://launchpadlibrarian.net/373589345/buildlog_ubuntu-bionic-amd64.dpdk_18.05-1~ubuntu0.18.04.1_BUILDING.txt.gz