From: "McDaniel, Timothy"
To: "Richardson, Bruce"
CC: "jerinj@marvell.com", "dev@dpdk.org", "Wires, Kent"
Subject: RE: [PATCH v5] event/dlb2: add support for single 512B write of 4 QEs
Date: Fri, 10 Jun 2022 15:51:00 +0000
References: <20220610123544.2332492-1-timothy.mcdaniel@intel.com>
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Friday, June 10, 2022 10:43 AM
> To: McDaniel, Timothy
> Cc: jerinj@marvell.com; dev@dpdk.org; Wires, Kent
> Subject: Re: [PATCH v5] event/dlb2: add support for single 512B write of 4 QEs
>
> On Fri, Jun 10, 2022 at 03:41:00PM +0100, McDaniel, Timothy wrote:
> >
> >
> > > -----Original Message-----
> > > From: Richardson, Bruce
> > > Sent: Friday, June 10, 2022 8:12 AM
> > > To: McDaniel, Timothy
> > > Cc: jerinj@marvell.com; dev@dpdk.org; Wires, Kent
> > > Subject: Re: [PATCH v5] event/dlb2: add support for single 512B write of 4 QEs
> > >
> > > On Fri, Jun 10, 2022 at 07:35:44AM -0500, Timothy McDaniel wrote:
> > > > On Xeon, 512b accesses are available, so movdir64 instruction is able to
> > > > perform 512b read and write to DLB producer port. In order for movdir64
> > > > to be able to pull its data from store buffers (store-buffer-forwarding)
> > > > (before actual write), data should be in single 512b write format.
> > > > This commit add change when code is built for Xeon with 512b AVX support
> > > > to make single 512b write of all 4 QEs instead of 4x64b writes.
> > > >
> > > > Signed-off-by: Timothy McDaniel
> > > > Acked-by: Kent Wires
> > > > ===
> > > >
> > > > Changes since V4:
> > > > 1) Add build-time control for avx512 support to meson.build, based
> > > > on implementation found in lib/acl/meson.build
> > > > 2) Add rte_vect_get_max_simd_bitwidth runtime check before using
> > > > avx512 instructions
> > > >
> > >
> > > Thanks, these changes look better for runtime support. Some further more
> > > minor comments inline below.
> > >
> > > /Bruce
> > >
> > > > Changes since V3:
> > > > 1) Renamed dlb2_noavx512.c to dlb2_sve.c, and fixed up meson.build
> > > > for new file name.
> > > >
> > > > Changes since V1:
> > > > 1) Split out dlb2_event_build_hcws into two implementations, one
> > > > that uses AVX512 instructions, and one that does not. Each implementation
> > > > is in its own source file in order to avoid build errors if the compiler
> > > > does not support the newer AVX512 instructions.
> > > > 2) Update meson.build to and pull in appropriate source file based on
> > > > whether the compiler supports AVX512VL
> > > > 3) Check if target supports AVX512VL, and use appropriate implementation
> > > > based on this runtime check.
> > > > ---
> > > >  drivers/event/dlb2/dlb2.c        | 208 +-----------------------
> > > >  drivers/event/dlb2/dlb2_avx512.c | 267 +++++++++++++++++++++++++++++++
> > > >  drivers/event/dlb2/dlb2_priv.h   |  10 ++
> > > >  drivers/event/dlb2/dlb2_sve.c    | 219 +++++++++++++++++++++++++
> > > >  drivers/event/dlb2/meson.build   |  53 ++++++
> > > >  5 files changed, 556 insertions(+), 201 deletions(-)
> > > >  create mode 100644 drivers/event/dlb2/dlb2_avx512.c
> > > >  create mode 100644 drivers/event/dlb2/dlb2_sve.c
> > > >
> > >
> > >
> > > > diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build
> > > > index f963589fd3..58146e8aef 100644
> > > > --- a/drivers/event/dlb2/meson.build
> > > > +++ b/drivers/event/dlb2/meson.build
> > > > @@ -19,6 +19,59 @@ sources = files(
> > > >          'dlb2_selftest.c',
> > > >  )
> > > >
> > > > +# compile AVX512 version if:
> > > > +# we are building 64-bit binary (checked above) AND binutils
> > > > +# can generate proper code
> > > > +
> > > > +if binutils_ok
> > > > +
> > > > +    # compile AVX512 version if either:
> > > > +    # a. we have AVX512 supported in minimum instruction set
> > > > +    #    baseline
> > > > +    # b. it's not minimum instruction set, but supported by
> > > > +    #    compiler
> > > > +    #
> > > > +    # in former case, just add avx512 C file to files list
> > > > +    # in latter case, compile c file to static lib, using correct
> > > > +    # compiler flags, and then have the .o file from static lib
> > > > +    # linked into main lib.
> > > > +
> > > > +    # check if all required flags already enabled (variant a).
> > > > +    dlb2_avx512_flags = ['__AVX512F__', '__AVX512VL__',
> > > > +                         '__AVX512CD__', '__AVX512BW__']
> > >
> > > Minor nit: are all 4 of these really necessary? I see the runtime portion
> > > only seems to check for VL?
> > >
> >
> > I will update to check for just VL
> >
> > > > +
> > > > +    dlb2_avx512_on = true
> > > > +    foreach f:dlb2_avx512_flags
> > > > +
> > > > +        if cc.get_define(f, args: machine_args) == ''
> > > > +            dlb2_avx512_on = false
> > > > +        endif
> > > > +    endforeach
> > > > +
> > > > +    if dlb2_avx512_on == true
> > > > +
> > > > +        sources += files('dlb2_avx512.c')
> > > > +        cflags += '-DCC_AVX512_SUPPORT'
> > > > +
> > > > +    elif cc.has_multi_arguments('-mavx512f', '-mavx512vl',
> > > > +                                '-mavx512cd', '-mavx512bw')
> > > > +
> > > > +        cflags += '-DCC_AVX512_SUPPORT'
> > > > +        avx512_tmplib = static_library('avx512_tmp',
> > > > +                'dlb2_avx512.c',
> > > > +                dependencies: [static_rte_eal,
> > > > +                               static_rte_eventdev],
> > > > +                c_args: cflags +
> > > > +                        ['-mavx512f', '-mavx512vl',
> > > > +                         '-mavx512cd', '-mavx512bw'])
> > > > +        objs += avx512_tmplib.extract_objects('dlb2_avx512.c')
> > > > +    else
> > > > +        sources += files('dlb2_sve.c')
> > > > +    endif
> > > > +else
> > > > +    sources += files('dlb2_sve.c')
> > >
> > > Since this is x86 only, do you mean SSE rather than SVE?
> > >
> > > Also, rather than adding this in the "else" legs, does the SSE version not
> > > need to always be compiled in? If the build takes the second leg, i.e.
> > > build is not mandating AVX-512, but supports it if not available, is the
> > > SSE code path not necessary for the case where the runtime machine does not
> > > support AVX-512?
> > >
> >
> > I'll update the name, but it's an "either or" situation. They cannot both be built
> > as currently coded.
> >
> If only the AVX-512 path is built, what happens when the runtime check for
> AVX-512 fails? Is there a scalar path that is used as fallback?
>
> /Bruce

The file dlb2_avx512.c contains a runtime check that controls whether AVX512
instructions are used. That file contains both implementations, so the
original path remains available as the fallback when the check fails.
The dlb2_sse.c file contains only the original (SSE) implementation.

Thanks,
Tim
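
For readers following the thread, the arrangement described above (one source
file holding both implementations, selected by a runtime check) can be sketched
roughly as follows. This is an illustration only, not the driver code:
sketch_port, build_qes, build_qes_sse, build_qes_avx512 and the use_avx512 flag
are hypothetical names, memcpy stands in for the real vector stores, and the
16-byte QE size is inferred from the subject line (512 bits / 4 QEs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QE_BYTES   16  /* assumed: 512 bits / 4 QEs = 128 bits per QE */
#define QES_PER_CL 4   /* 4 QEs fill one 64-byte (512-bit) line */

/* Hypothetical port state. In the real driver the flag would be set once,
 * from the runtime checks mentioned in the thread (AVX512VL support and
 * rte_vect_get_max_simd_bitwidth()). Here it is just a plain bool. */
struct sketch_port {
	bool use_avx512;
	uint8_t qe4[QES_PER_CL * QE_BYTES]; /* staging area for 4 QEs */
};

/* Original-style path: one 16-byte store per QE. Returns stores issued. */
static int build_qes_sse(struct sketch_port *p, const uint8_t *src)
{
	for (int i = 0; i < QES_PER_CL; i++)
		memcpy(&p->qe4[i * QE_BYTES], &src[i * QE_BYTES], QE_BYTES);
	return QES_PER_CL;
}

/* Wide path: a single 64-byte memcpy stands in for one 512-bit store. */
static int build_qes_avx512(struct sketch_port *p, const uint8_t *src)
{
	memcpy(p->qe4, src, sizeof(p->qe4));
	return 1;
}

/* Both paths live in the same translation unit; the flag picks one at
 * run time, so the narrow path is always there as the fallback. */
static int build_qes(struct sketch_port *p, const uint8_t *src)
{
	assert(p != NULL && src != NULL);
	return p->use_avx512 ? build_qes_avx512(p, src)
			     : build_qes_sse(p, src);
}
```

Either path leaves the same 64 bytes in the staging area; only the number of
stores differs, which is the property the patch description relies on for
store-buffer forwarding.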