From: Neil Horman
To: "Richardson, Bruce"
Cc: "dev@dpdk.org"
Date: Wed, 17 Sep 2014 13:59:36 -0400
Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx perf
Message-ID: <20140917175936.GA13492@hmsreliant.think-freely.org>
In-Reply-To: <59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>
References: <1410948102-12740-1-git-send-email-bruce.richardson@intel.com>
 <1410948102-12740-3-git-send-email-bruce.richardson@intel.com>
 <20140917152103.GE4213@localhost.localdomain>
 <59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>

On Wed, Sep 17, 2014 at 03:35:19PM +0000, Richardson, Bruce wrote:
>
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Wednesday, September 17, 2014 4:21 PM
> > To: Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx perf
> >
> > On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:
> > > Make a small improvement to slow path TX performance by adding in a
> > > prefetch for the second mbuf cache line.
> > > Also move assignment of l2/l3 length values only when needed.
> > >
> > > Signed-off-by: Bruce Richardson
> > > ---
> > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > index 6f702b3..c0bb49f 100644
> > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> > >  			ixgbe_xmit_cleanup(txq);
> > >  		}
> > >  
> > > +		rte_prefetch0(&txe->mbuf->pool);
> > > +
> >
> > Can you explain what all of these prefetches are doing? It looks to me like
> > they're just fetching the first cache line of the mempool structure, which it
> > appears amounts to the pool's name. I don't see that having any use here.
> >
> This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd).
>
> What I've done with the prefetches is two-fold:
> 1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line), so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.

OK, this makes some sense, but you're not guaranteed that the prefetch will be needed, nor are you certain the data will still be in cache by the time you get to the free call. It seems like it might be preferable to prefetch the data pointed to by tx_pkt, as you're sure to use that on every loop iteration.

> 2) changed the code to prefetch earlier - in effect, to prefetch one mbuf ahead. The original code prefetched the mbuf to be freed as soon as it started processing the mbuf to replace it.
> Instead now, every time we calculate what the next mbuf position is going to be, we prefetch the mbuf in that position (i.e. the mbuf pool pointer we are going to free the mbuf to), even while we are still updating the previous mbuf slot on the ring. This gives the prefetch much more time to resolve and get the data we need into the cache before we need it.
>

Again, early isn't necessarily better, as it just means more time for the data in cache to get victimized. It seems like it would be better to prefetch the tx_pkts data a few cache lines ahead.

Neil

> Hope this clarifies things.
>
> /Bruce
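The two prefetch placements debated in this thread can be compared in a small, self-contained C sketch. Everything in it is hypothetical: struct fake_mbuf only mimics the property Bruce describes (the pool pointer sitting in the mbuf's second 64-byte cache line), struct fake_txq is a plain array rather than the ixgbe sw_ring, fake_free() stands in for returning an mbuf to its pool, GCC/Clang's __builtin_prefetch stands in for DPDK's rte_prefetch0, and PREFETCH_AHEAD is an arbitrary look-ahead distance, not a tuned value. It shows where each prefetch is issued, not which one is faster.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u          /* power of two, so indices wrap with a mask */
#define PREFETCH_AHEAD 4u     /* HYPOTHETICAL look-ahead distance */

/* HYPOTHETICAL stand-ins, not the real DPDK structures. */
struct fake_pool {
    unsigned nb_free;                           /* counts "freed" mbufs */
};

struct fake_mbuf {
    unsigned pkt_len;                           /* 1st cache line: read on every send */
    char pad[64 - sizeof(unsigned)];
    struct fake_pool *pool;                     /* 2nd cache line: read only on free */
    char rest[64 - sizeof(struct fake_pool *)];
};

static_assert(offsetof(struct fake_mbuf, pool) == 64,
              "pool pointer should start the second cache line");

struct fake_txq {
    struct fake_mbuf *sw_ring[RING_SIZE];       /* mbuf last sent from each slot, or NULL */
    unsigned tx_id;                             /* next slot to fill */
    unsigned long bytes;                        /* stands in for writing descriptors */
};

static void fake_free(struct fake_mbuf *m)
{
    m->pool->nb_free++;                         /* touches only the second cache line */
}

/* Patch-style placement: once the next slot index is known, prefetch the
 * pool pointer of the mbuf that slot will free on the next iteration. */
static void xmit_prefetch_next_slot(struct fake_txq *q, struct fake_mbuf **pkts, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned cur = q->tx_id;
        unsigned next = (cur + 1) & (RING_SIZE - 1);

        if (q->sw_ring[next] != NULL)
            __builtin_prefetch(&q->sw_ring[next]->pool, 0, 3);

        if (q->sw_ring[cur] != NULL)
            fake_free(q->sw_ring[cur]);
        q->bytes += pkts[i]->pkt_len;
        q->sw_ring[cur] = pkts[i];
        q->tx_id = next;
    }
}

/* Review-style placement: prefetch the packets being sent a few entries
 * ahead, since every iteration is guaranteed to read them. */
static void xmit_prefetch_pkts_ahead(struct fake_txq *q, struct fake_mbuf **pkts, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned cur = q->tx_id;

        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(pkts[i + PREFETCH_AHEAD], 0, 3);

        if (q->sw_ring[cur] != NULL)
            fake_free(q->sw_ring[cur]);
        q->bytes += pkts[i]->pkt_len;
        q->sw_ring[cur] = pkts[i];
        q->tx_id = (cur + 1) & (RING_SIZE - 1);
    }
}
```

Both loops do identical work and differ only in which cache line they ask for early: the first bets that the next slot's mbuf will be freed soon, the second bets on the packets it is certain to read. Which bet pays off depends on ring occupancy and on how long a prefetched line survives before use, which is exactly the point in dispute; only measurement (e.g. with testpmd) can settle it.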