Date: Thu, 17 Sep 2020 10:13:03 +0100
From: Bruce Richardson
To: Morten Brørup
Cc: Wenzhuo Lu, Leyi Rong, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 0/3] enable AVX512 for iavf
Message-ID: <20200917091303.GB1568@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C612F7@smartserver.smartshare.dk>
References: <1599717545-106571-1-git-send-email-wenzhuo.lu@intel.com>
 <1600306778-46470-1-git-send-email-wenzhuo.lu@intel.com>
 <98CBD80474FA8B44BF855DF32C47DC35C612F7@smartserver.smartshare.dk>

On Thu, Sep 17, 2020 at 09:37:29AM +0200, Morten Brørup wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wenzhuo Lu
> > Sent: Thursday, September 17, 2020 3:40 AM
> >
> > AVX512 instructions are supported by more and more platforms. These
> > instructions can be used in the data path to enhance the per-core
> > performance of packet processing.
> > Compared with the existing implementation, this patch set introduces
> > some AVX512 instructions into the iavf data path, and we get a better
> > per-core throughput.
> >
> > v2:
> > Update meson.build.
> > Replace the deprecated 'buf_physaddr' by 'buf_iova'.
> >
> > Wenzhuo Lu (3):
> >   net/iavf: enable AVX512 for legacy RX
> >   net/iavf: enable AVX512 for flexible RX
> >   net/iavf: enable AVX512 for TX
> >
> >  doc/guides/rel_notes/release_20_11.rst  |    3 +
> >  drivers/net/iavf/iavf_ethdev.c          |    3 +-
> >  drivers/net/iavf/iavf_rxtx.c            |   69 +-
> >  drivers/net/iavf/iavf_rxtx.h            |   18 +
> >  drivers/net/iavf/iavf_rxtx_vec_avx512.c | 1720 ++++++++++++++++++++++++++++
> >  drivers/net/iavf/meson.build            |   17 +
> >  6 files changed, 1818 insertions(+), 12 deletions(-)
> >  create mode 100644 drivers/net/iavf/iavf_rxtx_vec_avx512.c
> >
> > --
> > 1.9.3
> >
>
> I am not sure I understand the full context here, so please bear with me
> if I'm completely off...
>
> With this patch set, it looks like the driver manipulates the mempool
> cache directly, bypassing the libraries encapsulating it.
>
> Isn't that going deeper into a library than expected... What if the
> implementation of the mempool library changes radically?
>
> And if there are performance gains to be achieved by using vector
> instructions for manipulating the mempool, perhaps your vector
> optimizations should go into the mempool library instead?
>

Looking specifically at the descriptor re-arm code, the benefit from
working off the mempool cache directly comes from saving loads by merging
the code blocks, rather than directly from the vectorization itself -
though the vectorization doesn't hurt. The original code, with a separate
mempool function, worked roughly as below:

1. mempool code loads mbuf pointers from the cache
2. mempool code writes the mbuf pointers to the SW ring for the NIC
3. driver code loads the mbuf pointers back from the SW ring
4. driver code then does the rest of the descriptor re-arm

The benefit comes from eliminating step 3, the loads in the driver, which
are dependent upon the previous stores. By having the driver itself read
from the mempool cache (the code still uses mempool functions for every
other part, since everything beyond the cache depends on the
ring/stack/bucket implementation), we can have the stores go out, and
while they are completing, reuse the already-loaded data to do the
descriptor re-arm.

Hope this clarifies things.

/Bruce
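
P.S. To make the two orderings concrete, below is a rough, much-simplified
scalar C sketch of the idea. It is not the code from this patch set - the
real iavf re-arm is written with AVX512 intrinsics and also handles cache
refill and ring wrap-around - and the rx_queue, sw_ring_entry and rx_desc
types, plus the two function names, are made up for illustration. It only
touches mempool pieces visible in the public headers:
rte_mempool_get_bulk(), rte_mempool_default_cache() and the cache's
objs/len fields.

#include <errno.h>
#include <stdint.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical, simplified stand-ins for the driver's real structures. */
struct sw_ring_entry { struct rte_mbuf *mbuf; };
struct rx_desc { uint64_t pkt_addr; uint64_t hdr_addr; };
struct rx_queue {
    struct rte_mempool *mp;
    struct sw_ring_entry *sw_ring;
    volatile struct rx_desc *rx_ring;
    uint16_t rxrearm_start;            /* wrap-around handling omitted */
};

/* Original ordering: the mempool fills the SW ring (steps 1+2), then the
 * driver reloads those just-stored pointers to build descriptors (3+4). */
static int
rearm_generic(struct rx_queue *rxq, uint16_t n)
{
    struct sw_ring_entry *ring = &rxq->sw_ring[rxq->rxrearm_start];
    uint16_t i;

    if (rte_mempool_get_bulk(rxq->mp, (void **)ring, n) != 0)
        return -ENOMEM;                            /* steps 1 + 2 */

    for (i = 0; i < n; i++) {
        struct rte_mbuf *mb = ring[i].mbuf;        /* step 3: dependent load */
        rxq->rx_ring[rxq->rxrearm_start + i].pkt_addr =
                mb->buf_iova + RTE_PKTMBUF_HEADROOM;   /* step 4 */
        rxq->rx_ring[rxq->rxrearm_start + i].hdr_addr = 0;
    }
    rxq->rxrearm_start += n;
    return 0;
}

/* Merged ordering: read each mbuf pointer once from the per-lcore cache
 * and use that already-loaded value for both the SW-ring store and the
 * descriptor write, so step 3 disappears. */
static int
rearm_from_cache(struct rx_queue *rxq, uint16_t n)
{
    struct rte_mempool_cache *cache =
            rte_mempool_default_cache(rxq->mp, rte_lcore_id());
    void **objs;
    uint16_t i;

    /* No cache, or not enough mbufs in it: fall back to the generic path. */
    if (cache == NULL || cache->len < n)
        return rearm_generic(rxq, n);

    /* The cache is used as a stack: take the n most recently freed mbufs. */
    objs = &cache->objs[cache->len - n];
    cache->len -= n;

    for (i = 0; i < n; i++) {
        struct rte_mbuf *mb = objs[i];
        rxq->sw_ring[rxq->rxrearm_start + i].mbuf = mb;     /* store */
        rxq->rx_ring[rxq->rxrearm_start + i].pkt_addr =     /* reuse mb */
                mb->buf_iova + RTE_PKTMBUF_HEADROOM;
        rxq->rx_ring[rxq->rxrearm_start + i].hdr_addr = 0;
    }
    rxq->rxrearm_start += n;
    return 0;
}

In the second version the stores to sw_ring and rx_ring can drain while
the loop keeps working on pointers that are already in registers, which
is where the saving described above comes from.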