Date: Fri, 10 Jun 2022 14:12:13 +0100
From: Bruce Richardson
To: Timothy McDaniel
Cc: jerinj@marvell.com, dev@dpdk.org, Kent Wires
Subject: Re: [PATCH v5] event/dlb2: add support for single 512B write of 4 QEs
References: <20220610123544.2332492-1-timothy.mcdaniel@intel.com>
In-Reply-To: <20220610123544.2332492-1-timothy.mcdaniel@intel.com>

On Fri, Jun 10, 2022 at 07:35:44AM -0500, Timothy McDaniel wrote:
> On Xeon, 512b accesses are available, so the movdir64 instruction is able
> to perform 512b reads and writes to the DLB producer port. In order for
> movdir64 to be able to pull its data from store buffers
> (store-buffer-forwarding) before the actual write, the data should be in
> single 512b write format. When the code is built for Xeon with 512b AVX
> support, this commit makes a single 512b write of all 4 QEs instead of
> 4x64b writes.
>
> Signed-off-by: Timothy McDaniel
> Acked-by: Kent Wires
> ===
>
> Changes since V4:
> 1) Add build-time control for avx512 support to meson.build, based
>    on implementation found in lib/acl/meson.build
> 2) Add rte_vect_get_max_simd_bitwidth runtime check before using
>    avx512 instructions
>

Thanks, these changes look better for runtime support. Some further, more
minor comments inline below.

/Bruce

> Changes since V3:
> 1) Renamed dlb2_noavx512.c to dlb2_sve.c, and fixed up meson.build
>    for new file name.
>
> Changes since V1:
> 1) Split out dlb2_event_build_hcws into two implementations, one
>    that uses AVX512 instructions, and one that does not. Each implementation
>    is in its own source file in order to avoid build errors if the compiler
>    does not support the newer AVX512 instructions.
> 2) Update meson.build to pull in the appropriate source file based on
>    whether the compiler supports AVX512VL
> 3) Check if target supports AVX512VL, and use appropriate implementation
>    based on this runtime check.
> ---
>  drivers/event/dlb2/dlb2.c        | 208 +-----------------------
>  drivers/event/dlb2/dlb2_avx512.c | 267 +++++++++++++++++++++++++++++++
>  drivers/event/dlb2/dlb2_priv.h   |  10 ++
>  drivers/event/dlb2/dlb2_sve.c    | 219 +++++++++++++++++++++++++
>  drivers/event/dlb2/meson.build   |  53 ++++++
>  5 files changed, 556 insertions(+), 201 deletions(-)
>  create mode 100644 drivers/event/dlb2/dlb2_avx512.c
>  create mode 100644 drivers/event/dlb2/dlb2_sve.c
>
> diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build
> index f963589fd3..58146e8aef 100644
> --- a/drivers/event/dlb2/meson.build
> +++ b/drivers/event/dlb2/meson.build
> @@ -19,6 +19,59 @@ sources = files(
>          'dlb2_selftest.c',
>  )
>
> +# compile AVX512 version if:
> +# we are building 64-bit binary (checked above) AND binutils
> +# can generate proper code
> +
> +if binutils_ok
> +
> +    # compile AVX512 version if either:
> +    # a. we have AVX512 supported in minimum instruction set
> +    #    baseline
> +    # b. it's not minimum instruction set, but supported by
> +    #    compiler
> +    #
> +    # in former case, just add avx512 C file to files list
> +    # in latter case, compile c file to static lib, using correct
> +    # compiler flags, and then have the .o file from static lib
> +    # linked into main lib.
> +
> +    # check if all required flags already enabled (variant a).
> +    dlb2_avx512_flags = ['__AVX512F__', '__AVX512VL__',
> +                         '__AVX512CD__', '__AVX512BW__']

Minor nit: are all 4 of these really necessary? The runtime portion only
seems to check for VL.

> +
> +    dlb2_avx512_on = true
> +    foreach f:dlb2_avx512_flags
> +
> +        if cc.get_define(f, args: machine_args) == ''
> +            dlb2_avx512_on = false
> +        endif
> +    endforeach
> +
> +    if dlb2_avx512_on == true
> +
> +        sources += files('dlb2_avx512.c')
> +        cflags += '-DCC_AVX512_SUPPORT'
> +
> +    elif cc.has_multi_arguments('-mavx512f', '-mavx512vl',
> +                                '-mavx512cd', '-mavx512bw')
> +
> +        cflags += '-DCC_AVX512_SUPPORT'
> +        avx512_tmplib = static_library('avx512_tmp',
> +                'dlb2_avx512.c',
> +                dependencies: [static_rte_eal,
> +                               static_rte_eventdev],
> +                c_args: cflags +
> +                        ['-mavx512f', '-mavx512vl',
> +                         '-mavx512cd', '-mavx512bw'])
> +        objs += avx512_tmplib.extract_objects('dlb2_avx512.c')
> +    else
> +        sources += files('dlb2_sve.c')
> +    endif
> +else
> +    sources += files('dlb2_sve.c')

Since this is x86 only, do you mean SSE rather than SVE?

Also, rather than adding this in the "else" legs, does the SSE version not
need to always be compiled in? If the build takes the second leg, i.e. the
build does not mandate AVX-512 but the compiler supports it, is the SSE
code path not still necessary for the case where the runtime machine does
not support AVX-512?

> +endif
> +
>  headers = files('rte_pmd_dlb2.h')
>
>  deps += ['mbuf', 'mempool', 'ring', 'pci', 'bus_pci']
> --
> 2.25.1
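
To make the last question concrete, here is a minimal standalone sketch of
the kind of runtime selection I have in mind. It is an illustration under
stated assumptions, not the driver's code: select_hcw_path, struct cpu_caps
and enum hcw_path are hypothetical names, and the two capability fields
stand in for values that in DPDK would typically come from
rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) and
rte_vect_get_max_simd_bitwidth().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the two dlb2_event_build_hcws variants. */
enum hcw_path {
	HCW_PATH_SSE,     /* baseline path, always compiled in */
	HCW_PATH_AVX512   /* only present if the compiler could build it */
};

/* Hypothetical snapshot of what the running machine allows. */
struct cpu_caps {
	bool has_avx512vl;          /* assumed: CPU flag query result */
	unsigned int max_simd_bits; /* assumed: configured max SIMD bitwidth */
};

/*
 * Pick the implementation at runtime.
 * built_with_avx512 mirrors the CC_AVX512_SUPPORT define from the patch.
 * The SSE path is the fallback in every case, which is why it has to be
 * part of the build even when the AVX512 object is also linked in.
 */
static enum hcw_path
select_hcw_path(bool built_with_avx512, const struct cpu_caps *caps)
{
	assert(caps != NULL);

	if (built_with_avx512 &&
	    caps->has_avx512vl &&
	    caps->max_simd_bits >= 512)
		return HCW_PATH_AVX512;

	return HCW_PATH_SSE;
}
```

With a helper shaped like this, a binary built via the second meson leg
still ends up on the SSE path when it runs on a machine without AVX512VL,
or when the SIMD bitwidth is capped below 512, so the fallback file cannot
be restricted to the "else" legs.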