From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 16 May 2022 09:42:50 +0100
From: Bruce Richardson
To: Jerin Jacob
Cc: Timothy McDaniel, konstantin.v.ananyev@yandex.ru, Jerin Jacob, dpdk-dev
Subject: Re: [PATCH] event/dlb2: add support for single 512B write of 4 QEs
References: <20220409151849.1007602-1-timothy.mcdaniel@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id: DPDK patches and discussions

On Sat, May 14, 2022 at 05:37:39PM +0530, Jerin Jacob wrote:
> On Sat, Apr 9, 2022 at 8:48 PM Timothy McDaniel wrote:
> >
> > On Xeon, as 512b accesses are available, movdir64 instruction is able
> > to perform 512b read and write to DLB producer port. In order for
> > movdir64 to be able to pull its data from store buffers
> > (store-buffer-forwarding) (before actual write), data should be in
> > single 512b write format. This commit add change when code is built
> > for Xeon with 512b AVX support to make single 512b write of all 4 QEs
> > instead of 4x64b writes.
> >
> > Signed-off-by: Timothy McDaniel
> > ---
> >  drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++---------
> >  1 file changed, 67 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> > index 36f07d0061..e2a5303310 100644
> > --- a/drivers/event/dlb2/dlb2.c
> > +++ b/drivers/event/dlb2/dlb2.c
> > @@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port,
> >  					 ev[3].event_type,
> >  					 DLB2_QE_EV_TYPE_WORD + 4);
> >
> > -		/* Store the metadata to memory (use the double-precision
> > -		 * _mm_storeh_pd because there is no integer function for
> > -		 * storing the upper 64b):
> > -		 * qe[0] metadata = sse_qe[0][63:0]
> > -		 * qe[1] metadata = sse_qe[0][127:64]
> > -		 * qe[2] metadata = sse_qe[1][63:0]
> > -		 * qe[3] metadata = sse_qe[1][127:64]
> > -		 */
> > -		_mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, sse_qe[0]);
> > -		_mm_storeh_pd((double *)&qe[1].u.opaque_data,
> > -			      (__m128d)sse_qe[0]);
> > -		_mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, sse_qe[1]);
> > -		_mm_storeh_pd((double *)&qe[3].u.opaque_data,
> > -			      (__m128d)sse_qe[1]);
> > -
> > -		qe[0].data = ev[0].u64;
> > -		qe[1].data = ev[1].u64;
> > -		qe[2].data = ev[2].u64;
> > -		qe[3].data = ev[3].u64;
> > +	#ifdef __AVX512VL__
>
> + x86 maintainers
>
> We need a runtime check based on CPU flags. Right? As the build and run
> machine can be different?
>

Ideally, yes, this should be a run-time decision. There are quite a number of
examples of this in DPDK. However, most uses of runtime decisions are in
functions called via function pointer, so not sure if those schemes apply
here. It's certainly worth investigating, though.

/Bruce
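
For illustration only, a rough sketch of the function-pointer style of run-time
selection mentioned above. This is not the dlb2 code: the `struct qe` layout,
the `write_qes_*` functions and `select_write_qes()` are hypothetical names,
and the "wide" path is just a memcpy stand-in for a real single 512b store.
It assumes GCC or Clang, using the compiler builtin `__builtin_cpu_supports()`;
inside DPDK one would presumably use `rte_cpu_get_flag_enabled()` with
`RTE_CPUFLAG_AVX512VL` instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte queue entry; the real DLB2 QE layout is not shown. */
struct qe {
	uint64_t data;
	uint64_t metadata;
};

typedef void (*write_qes_fn)(struct qe *dst, const struct qe *src);

/* Portable path: four separate 16-byte copies. */
static void write_qes_scalar(struct qe *dst, const struct qe *src)
{
	for (int i = 0; i < 4; i++)
		dst[i] = src[i];
}

/* Stand-in for a path that would emit all four QEs as one 64-byte store. */
static void write_qes_wide(struct qe *dst, const struct qe *src)
{
	memcpy(dst, src, 4 * sizeof(*src));
}

/*
 * Pick the implementation from the CPU the code is running on, not from
 * the flags the binary was built with. Meant to be called once at setup.
 */
static write_qes_fn select_write_qes(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx512vl"))
		return write_qes_wide;
#endif
	return write_qes_scalar;
}
```

The pointer returned by `select_write_qes()` would be stored once (per port,
say) and called on the enqueue path, so the CPU check itself stays off the
fast path. Whether the extra indirect call is acceptable there is exactly the
open question.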