Date: Thu, 17 Sep 2020 10:13:03 +0100
From: Bruce Richardson
To: Morten Brørup
Cc: Wenzhuo Lu, Leyi Rong, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 0/3] enable AVX512 for iavf
Message-ID: <20200917091303.GB1568@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C612F7@smartserver.smartshare.dk>
References: <1599717545-106571-1-git-send-email-wenzhuo.lu@intel.com>
 <1600306778-46470-1-git-send-email-wenzhuo.lu@intel.com>
 <98CBD80474FA8B44BF855DF32C47DC35C612F7@smartserver.smartshare.dk>

On Thu, Sep 17, 2020 at 09:37:29AM +0200, Morten Brørup wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wenzhuo Lu
> > Sent: Thursday, September 17, 2020 3:40 AM
> >
> > AVX512 instructions are supported by more and more platforms. These
> > instructions can be used in the data path to enhance the per-core
> > performance of packet processing.
> > Compared with the existing implementation, this patch set introduces
> > some AVX512 instructions into the iavf data path, and we get a better
> > per-core throughput.
> >
> > v2:
> > Update meson.build.
> > Replace the deprecated 'buf_physaddr' by 'buf_iova'.
> >
> > Wenzhuo Lu (3):
> >   net/iavf: enable AVX512 for legacy RX
> >   net/iavf: enable AVX512 for flexible RX
> >   net/iavf: enable AVX512 for TX
> >
> >  doc/guides/rel_notes/release_20_11.rst  |    3 +
> >  drivers/net/iavf/iavf_ethdev.c          |    3 +-
> >  drivers/net/iavf/iavf_rxtx.c            |   69 +-
> >  drivers/net/iavf/iavf_rxtx.h            |   18 +
> >  drivers/net/iavf/iavf_rxtx_vec_avx512.c | 1720 ++++++++++++++++++++++++++++
> >  drivers/net/iavf/meson.build            |   17 +
> >  6 files changed, 1818 insertions(+), 12 deletions(-)
> >  create mode 100644 drivers/net/iavf/iavf_rxtx_vec_avx512.c
> >
> > --
> > 1.9.3
> >
>
> I am not sure I understand the full context here, so please bear with me
> if I'm completely off...
>
> With this patch set, it looks like the driver manipulates the mempool
> cache directly, bypassing the libraries encapsulating it.
>
> Isn't that going deeper into a library than expected... What if the
> implementation of the mempool library changes radically?
>
> And if there are performance gains to be achieved by using vector
> instructions for manipulating the mempool, perhaps your vector
> optimizations should go into the mempool library instead?
>

Looking specifically at the descriptor re-arm code, the benefit from
working off the mempool cache directly comes from saving loads by merging
the code blocks, rather than directly from the vectorization itself -
though the vectorization doesn't hurt. The original code, with a separate
mempool function, worked roughly as below:

1. mempool code loads mbuf pointers from the cache
2. mempool code writes the mbuf pointers to the SW ring for the NIC
3. driver code loads the mbuf pointers back from the SW ring
4. driver code then does the rest of the descriptor re-arm

The benefit comes from eliminating step 3, the loads in the driver, which
are dependent upon the previous stores. By having the driver itself read
from the mempool cache (the code still uses mempool functions for every
other part, since everything beyond the cache depends on the
ring/stack/bucket implementation), we can have the stores go out, and
while they are completing, reuse the already-loaded data to do the
descriptor re-arm.

Hope this clarifies things.

/Bruce
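
P.S. To make the two orderings concrete, below is a rough, much-simplified
scalar C sketch of the idea. It is not the code from this patch set - the
real iavf re-arm is written with AVX512 intrinsics and also handles cache
refill and ring wrap-around - and the rx_queue, sw_ring_entry and rx_desc
types, plus the two function names, are made up for illustration. It only
touches mempool pieces visible in the public headers:
rte_mempool_get_bulk(), rte_mempool_default_cache() and the cache's
objs/len fields.

#include <errno.h>
#include <stdint.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical, simplified stand-ins for the driver's real structures. */
struct sw_ring_entry { struct rte_mbuf *mbuf; };
struct rx_desc { uint64_t pkt_addr; uint64_t hdr_addr; };
struct rx_queue {
    struct rte_mempool *mp;
    struct sw_ring_entry *sw_ring;
    volatile struct rx_desc *rx_ring;
    uint16_t rxrearm_start;            /* wrap-around handling omitted */
};

/* Original ordering: the mempool fills the SW ring (steps 1+2), then the
 * driver reloads those just-stored pointers to build descriptors (3+4). */
static int
rearm_generic(struct rx_queue *rxq, uint16_t n)
{
    struct sw_ring_entry *ring = &rxq->sw_ring[rxq->rxrearm_start];
    uint16_t i;

    if (rte_mempool_get_bulk(rxq->mp, (void **)ring, n) != 0)
        return -ENOMEM;                            /* steps 1 + 2 */

    for (i = 0; i < n; i++) {
        struct rte_mbuf *mb = ring[i].mbuf;        /* step 3: dependent load */
        rxq->rx_ring[rxq->rxrearm_start + i].pkt_addr =
                mb->buf_iova + RTE_PKTMBUF_HEADROOM;   /* step 4 */
        rxq->rx_ring[rxq->rxrearm_start + i].hdr_addr = 0;
    }
    rxq->rxrearm_start += n;
    return 0;
}

/* Merged ordering: read each mbuf pointer once from the per-lcore cache
 * and use that already-loaded value for both the SW-ring store and the
 * descriptor write, so step 3 disappears. */
static int
rearm_from_cache(struct rx_queue *rxq, uint16_t n)
{
    struct rte_mempool_cache *cache =
            rte_mempool_default_cache(rxq->mp, rte_lcore_id());
    void **objs;
    uint16_t i;

    /* No cache, or not enough mbufs in it: fall back to the generic path. */
    if (cache == NULL || cache->len < n)
        return rearm_generic(rxq, n);

    /* The cache is used as a stack: take the n most recently freed mbufs. */
    objs = &cache->objs[cache->len - n];
    cache->len -= n;

    for (i = 0; i < n; i++) {
        struct rte_mbuf *mb = objs[i];
        rxq->sw_ring[rxq->rxrearm_start + i].mbuf = mb;     /* store */
        rxq->rx_ring[rxq->rxrearm_start + i].pkt_addr =     /* reuse mb */
                mb->buf_iova + RTE_PKTMBUF_HEADROOM;
        rxq->rx_ring[rxq->rxrearm_start + i].hdr_addr = 0;
    }
    rxq->rxrearm_start += n;
    return 0;
}

In the second version the stores to sw_ring and rx_ring can drain while
the loop keeps working on pointers that are already in registers, which
is where the saving described above comes from.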