From mboxrd@z Thu Jan 1 00:00:00 1970
From: "McDaniel, Timothy"
To: "Richardson, Bruce"
CC: "jerinj@marvell.com", "dev@dpdk.org", "Wires, Kent"
Subject: RE: [PATCH v5] event/dlb2: add support for single 512B write of 4 QEs
Date: Fri, 10 Jun 2022 14:41:00 +0000
References: <20220610123544.2332492-1-timothy.mcdaniel@intel.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Friday, June 10, 2022 8:12 AM
> To: McDaniel, Timothy
> Cc: jerinj@marvell.com; dev@dpdk.org; Wires, Kent
> Subject: Re: [PATCH v5] event/dlb2: add support for single 512B write of 4 QEs
>
> On Fri, Jun 10, 2022 at 07:35:44AM -0500, Timothy McDaniel wrote:
> > On Xeon, 512b accesses are available, so movdir64 instruction is able to
> > perform 512b read and write to DLB producer port. In order for movdir64
> > to be able to pull its data from store buffers (store-buffer-forwarding)
> > (before actual write), data should be in single 512b write format.
> > This commit adds a change when code is built for Xeon with 512b AVX support
> > to make single 512b write of all 4 QEs instead of 4x64b writes.
> >
> > Signed-off-by: Timothy McDaniel
> > Acked-by: Kent Wires
> > ===
> >
> > Changes since V4:
> > 1) Add build-time control for avx512 support to meson.build, based
> > on implementation found in lib/acl/meson.build
> > 2) Add rte_vect_get_max_simd_bitwidth runtime check before using
> > avx512 instructions
> >
>
> Thanks, these changes look better for runtime support. Some further more
> minor comments inline below.
>
> /Bruce
>
> > Changes since V3:
> > 1) Renamed dlb2_noavx512.c to dlb2_sve.c, and fixed up meson.build
> > for new file name.
> >
> > Changes since V1:
> > 1) Split out dlb2_event_build_hcws into two implementations, one
> > that uses AVX512 instructions, and one that does not. Each implementation
> > is in its own source file in order to avoid build errors if the compiler
> > does not support the newer AVX512 instructions.
> > 2) Update meson.build to pull in appropriate source file based on
> > whether the compiler supports AVX512VL
> > 3) Check if target supports AVX512VL, and use appropriate implementation
> > based on this runtime check.
> > ---
> >  drivers/event/dlb2/dlb2.c        | 208 +----------------------
> >  drivers/event/dlb2/dlb2_avx512.c | 267 +++++++++++++++++++++++++++++++
> >  drivers/event/dlb2/dlb2_priv.h   |  10 ++
> >  drivers/event/dlb2/dlb2_sve.c    | 219 +++++++++++++++++++++++++
> >  drivers/event/dlb2/meson.build   |  53 ++++++
> >  5 files changed, 556 insertions(+), 201 deletions(-)
> >  create mode 100644 drivers/event/dlb2/dlb2_avx512.c
> >  create mode 100644 drivers/event/dlb2/dlb2_sve.c
> >
>
> > diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build
> > index f963589fd3..58146e8aef 100644
> > --- a/drivers/event/dlb2/meson.build
> > +++ b/drivers/event/dlb2/meson.build
> > @@ -19,6 +19,59 @@ sources = files(
> >          'dlb2_selftest.c',
> >  )
> >
> > +# compile AVX512 version if:
> > +# we are building 64-bit binary (checked above) AND binutils
> > +# can generate proper code
> > +
> > +if binutils_ok
> > +
> > +    # compile AVX512 version if either:
> > +    # a. we have AVX512 supported in minimum instruction set
> > +    #    baseline
> > +    # b. it's not minimum instruction set, but supported by
> > +    #    compiler
> > +    #
> > +    # in former case, just add avx512 C file to files list
> > +    # in latter case, compile c file to static lib, using correct
> > +    # compiler flags, and then have the .o file from static lib
> > +    # linked into main lib.
> > +
> > +    # check if all required flags already enabled (variant a).
> > +    dlb2_avx512_flags = ['__AVX512F__', '__AVX512VL__',
> > +                         '__AVX512CD__', '__AVX512BW__']
>
> Minor nit: are all 4 of these really necessary? I see the runtime portion
> only seems to check for VL?
>

I will update to check for just VL

> > +
> > +    dlb2_avx512_on = true
> > +    foreach f:dlb2_avx512_flags
> > +
> > +        if cc.get_define(f, args: machine_args) == ''
> > +            dlb2_avx512_on = false
> > +        endif
> > +    endforeach
> > +
> > +    if dlb2_avx512_on == true
> > +
> > +        sources += files('dlb2_avx512.c')
> > +        cflags += '-DCC_AVX512_SUPPORT'
> > +
> > +    elif cc.has_multi_arguments('-mavx512f', '-mavx512vl',
> > +                                '-mavx512cd', '-mavx512bw')
> > +
> > +        cflags += '-DCC_AVX512_SUPPORT'
> > +        avx512_tmplib = static_library('avx512_tmp',
> > +                'dlb2_avx512.c',
> > +                dependencies: [static_rte_eal,
> > +                        static_rte_eventdev],
> > +                c_args: cflags +
> > +                        ['-mavx512f', '-mavx512vl',
> > +                         '-mavx512cd', '-mavx512bw'])
> > +        objs += avx512_tmplib.extract_objects('dlb2_avx512.c')
> > +    else
> > +        sources += files('dlb2_sve.c')
> > +    endif
> > +else
> > +    sources += files('dlb2_sve.c')
>
> Since this is x86 only, do you mean SSE rather than SVE?
>
> Also, rather than adding this in the "else" legs, does the SSE version not
> need to always be compiled in? If the build takes the second leg, i.e.
> build is not mandating AVX-512, but supports it if not available, is the
> SSE code path not necessary for the case where the runtime machine does not
> support AVX-512?
>

I'll update the name, but it's an "either or" situation.
They cannot both be built as currently coded.

> > +endif
> > +
> >  headers = files('rte_pmd_dlb2.h')
> >
> >  deps += ['mbuf', 'mempool', 'ring', 'pci', 'bus_pci']
> > --
> > 2.25.1
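
For reference, one possible way to let both code paths coexist, as raised in the review above, is to give the two implementations distinct names and pick one through a function pointer at init time. The sketch below is only an illustration under stated assumptions, not the driver's actual code: build_hcws_baseline, build_hcws_avx512, select_build_hcws and the QE_SIZE/QES_PER_WRITE constants are hypothetical names, the 16-byte QE size is inferred from "4 QEs in a single 512b write", and memcpy stands in for the real vector stores. Only CC_AVX512_SUPPORT is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 QEs fill one 512-bit (64-byte) write, so each QE is 16 bytes. */
#define QE_SIZE       16
#define QES_PER_WRITE 4

/* Hypothetical common signature shared by both implementations. */
typedef void (*build_hcws_fn)(uint8_t *dst, const uint8_t *src);

/* Baseline path, always compiled in: four separate 16-byte copies. */
static void build_hcws_baseline(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < QES_PER_WRITE; i++)
        memcpy(dst + i * QE_SIZE, src + i * QE_SIZE, QE_SIZE);
}

#ifdef CC_AVX512_SUPPORT
/*
 * Stand-in for the AVX-512 path: one 64-byte copy. Real code would
 * issue a single 512-bit vector store here instead of memcpy.
 */
static void build_hcws_avx512(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, QES_PER_WRITE * QE_SIZE);
}
#endif

/*
 * Pick an implementation once, at init time. The wide path is chosen
 * only if it was compiled in AND the caller reports that the CPU has
 * AVX512VL AND the allowed SIMD width is at least 512 bits.
 */
static build_hcws_fn select_build_hcws(bool cpu_has_avx512vl,
                                       unsigned int max_simd_bitwidth)
{
#ifdef CC_AVX512_SUPPORT
    if (cpu_has_avx512vl && max_simd_bitwidth >= 512)
        return build_hcws_avx512;
#else
    (void)cpu_has_avx512vl;
    (void)max_simd_bitwidth;
#endif
    return build_hcws_baseline;
}
```

In a DPDK driver the two arguments would presumably come from a CPU flag query and from rte_vect_get_max_simd_bitwidth(), and the returned pointer would be stored once per port. With this arrangement the baseline file can always be listed in sources, and the AVX-512 file is added only when the compiler supports it.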