From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C81AFA00C3; Sat, 14 May 2022 14:08:07 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 67A8F40683; Sat, 14 May 2022 14:08:07 +0200 (CEST) Received: from mail-il1-f182.google.com (mail-il1-f182.google.com [209.85.166.182]) by mails.dpdk.org (Postfix) with ESMTP id C01B940395 for ; Sat, 14 May 2022 14:08:05 +0200 (CEST) Received: by mail-il1-f182.google.com with SMTP id d3so7457397ilr.10 for ; Sat, 14 May 2022 05:08:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=fWykteoKk258lrcZhJIQTYKPVecNQudCU0ZhsS1m4YA=; b=oYHjDmbrRWGDm/3LG51QkErsohiwOu26Xbsy/olbLC3nRROyio66SrBkPKNVgXb+Qs 9YrO62rS6TnajpEw2w0vQr1LY7fOFg80JiYY01Gv95ZB2G8he9VzjFul/9IXYUx1Uzuu oi50eEJ+5SCjo/qp5zYLGIBusxgy0rAg8UdjvIqDBNdg2ySJkqWshptbdSGLugePvw4d 6mxbYy9HcFqqpRc2IfmQfdPbSFA/L3gwKeA3JJyCxhZbCQX0I9cEXgH5cueFN3mUkUp4 NtwWN0DAC2ibc8oID1UIBS5pT5KzC00BXO/bljtsBR9f2IhITKR7/uI+JSY9OFOB7MX0 kosg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=fWykteoKk258lrcZhJIQTYKPVecNQudCU0ZhsS1m4YA=; b=xRuujIUrytZzaT0yR0nVuz1xE/SWUTkA+gec7aVgtR564LwzhjzTXasniqY4mqvYIv ylleytHeK73053+DGRDxkk8azvFDXJgna+WdJvxV0HIeajixEh5Ls7R0z7AcZsHknHEQ zFmEtf9d0sVvEdXi76BU25DU8TQt01gO635l7hzom7GortHdj9xd605M47UxQ9OZsb8J Yrpzo8PzgHkthlEyjaBS1avDWNLM3cZ7Na5rKi55bnY0eQ4DuxdEWGGE3zw21basW8NK s5sP87AdX3wOa7Nrv/SUQc8lhykLZFCosTchWPu/5zCfbBjZACMyJ5DZ/UIlVs+MFmry fYxA== X-Gm-Message-State: AOAM533j+D5bduGuaq8dHfafsC6RDCnsCsPQHw+V6NY25OeNUWO7RclM OjUx8UzvS2OMqileDOT85olJooQmT+BHHJ8hXyU= X-Google-Smtp-Source: ABdhPJw34RuyYzYOsjlTUXY5//twSCoa0qLCbjNzc6yY3crLIUWa5vPOaMMSfh3tgyKSSoM8ipK5HQY37+ZSWJcQo70= X-Received: by 2002:a05:6e02:1d8a:b0:2cd:fa75:6395 with SMTP id h10-20020a056e021d8a00b002cdfa756395mr4779835ila.294.1652530085043; Sat, 14 May 2022 05:08:05 -0700 (PDT) MIME-Version: 1.0 References: <20220409151849.1007602-1-timothy.mcdaniel@intel.com> In-Reply-To: <20220409151849.1007602-1-timothy.mcdaniel@intel.com> From: Jerin Jacob Date: Sat, 14 May 2022 17:37:39 +0530 Message-ID: Subject: Re: [PATCH] event/dlb2: add support for single 512B write of 4 QEs To: Timothy McDaniel , "Richardson, Bruce" , konstantin.v.ananyev@yandex.ru Cc: Jerin Jacob , dpdk-dev Content-Type: text/plain; charset="UTF-8" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Sat, Apr 9, 2022 at 8:48 PM Timothy McDaniel wrote: > > On Xeon, as 512b accesses are available, movdir64 instruction is able to > perform 512b read and write to DLB producer port. In order for movdir64 > to be able to pull its data from store buffers (store-buffer-forwarding) > (before actual write), data should be in single 512b write format. > This commit add change when code is built for Xeon with 512b AVX support > to make single 512b write of all 4 QEs instead of 4x64b writes. > > Signed-off-by: Timothy McDaniel > --- > drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++--------- > 1 file changed, 67 insertions(+), 19 deletions(-) > > diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c > index 36f07d0061..e2a5303310 100644 > --- a/drivers/event/dlb2/dlb2.c > +++ b/drivers/event/dlb2/dlb2.c > @@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port, > ev[3].event_type, > DLB2_QE_EV_TYPE_WORD + 4); > > - /* Store the metadata to memory (use the double-precision > - * _mm_storeh_pd because there is no integer function for > - * storing the upper 64b): > - * qe[0] metadata = sse_qe[0][63:0] > - * qe[1] metadata = sse_qe[0][127:64] > - * qe[2] metadata = sse_qe[1][63:0] > - * qe[3] metadata = sse_qe[1][127:64] > - */ > - _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, sse_qe[0]); > - _mm_storeh_pd((double *)&qe[1].u.opaque_data, > - (__m128d)sse_qe[0]); > - _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, sse_qe[1]); > - _mm_storeh_pd((double *)&qe[3].u.opaque_data, > - (__m128d)sse_qe[1]); > - > - qe[0].data = ev[0].u64; > - qe[1].data = ev[1].u64; > - qe[2].data = ev[2].u64; > - qe[3].data = ev[3].u64; > + #ifdef __AVX512VL__ + x86 maintainers We need a runtime check based on CPU flags. Right? As the build and run machine can be different? > + > + /* > + * 1) Build avx512 QE store and build each > + * QE individually as XMM register > + * 2) Merge the 4 XMM registers/QEs into single AVX512 > + * register > + * 3) Store single avx512 register to &qe[0] (4x QEs > + * stored in 1x store) > + */ > + > + __m128i v_qe0 = _mm_setzero_si128(); > + uint64_t meta = _mm_extract_epi64(sse_qe[0], 0); > + v_qe0 = _mm_insert_epi64(v_qe0, ev[0].u64, 0); > + v_qe0 = _mm_insert_epi64(v_qe0, meta, 1); > + > + __m128i v_qe1 = _mm_setzero_si128(); > + meta = _mm_extract_epi64(sse_qe[0], 1); > + v_qe1 = _mm_insert_epi64(v_qe1, ev[1].u64, 0); > + v_qe1 = _mm_insert_epi64(v_qe1, meta, 1); > + > + __m128i v_qe2 = _mm_setzero_si128(); > + meta = _mm_extract_epi64(sse_qe[1], 0); > + v_qe2 = _mm_insert_epi64(v_qe2, ev[2].u64, 0); > + v_qe2 = _mm_insert_epi64(v_qe2, meta, 1); > + > + __m128i v_qe3 = _mm_setzero_si128(); > + meta = _mm_extract_epi64(sse_qe[1], 1); > + v_qe3 = _mm_insert_epi64(v_qe3, ev[3].u64, 0); > + v_qe3 = _mm_insert_epi64(v_qe3, meta, 1); > + > + /* we have 4x XMM registers, one per QE. */ > + __m512i v_all_qes = _mm512_setzero_si512(); > + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe0, 0); > + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe1, 1); > + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe2, 2); > + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe3, 3); > + > + /* > + * store the 4x QEs in a single register to the scratch > + * space of the PMD > + */ > + _mm512_store_si512(&qe[0], v_all_qes); > +#else > + /* > + * Store the metadata to memory (use the double-precision > + * _mm_storeh_pd because there is no integer function for > + * storing the upper 64b): > + * qe[0] metadata = sse_qe[0][63:0] > + * qe[1] metadata = sse_qe[0][127:64] > + * qe[2] metadata = sse_qe[1][63:0] > + * qe[3] metadata = sse_qe[1][127:64] > + */ > + _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, > + sse_qe[0]); > + _mm_storeh_pd((double *)&qe[1].u.opaque_data, > + (__m128d)sse_qe[0]); > + _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, > + sse_qe[1]); > + _mm_storeh_pd((double *)&qe[3].u.opaque_data, > + (__m128d)sse_qe[1]); > + > + qe[0].data = ev[0].u64; > + qe[1].data = ev[1].u64; > + qe[2].data = ev[2].u64; > + qe[3].data = ev[3].u64; > +#endif > > break; > case 3: > -- > 2.25.1 >