From: "Richardson, Bruce"
To: Neil Horman
Cc: "dev@dpdk.org"
Date: Wed, 17 Sep 2014 15:35:19 +0000
Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx perf
Message-ID: <59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>
In-Reply-To: <20140917152103.GE4213@localhost.localdomain>
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Wednesday, September 17, 2014 4:21 PM
> To: Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx
> perf
>
> On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:
> > Make a small improvement to slow path TX performance by adding in a
> > prefetch for the second mbuf cache line.
> > Also move assignment of l2/l3 length values only when needed.
> >
> > Signed-off-by: Bruce Richardson
> > ---
> >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > index 6f702b3..c0bb49f 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >  ixgbe_xmit_cleanup(txq);
> >  }
> >
> > + rte_prefetch0(&txe->mbuf->pool);
> > +
>
> Can you explain what all of these prefetches are doing? It looks to me like
> they're just fetching the first cacheline of the mempool structure, which it
> appears amounts to the pool's name. I don't see that having any use here.
>

This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd).

What I've done with the prefetches is two-fold:
1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.
2) changed the code to prefetch earlier - in effect to prefetch one mbuf ahead.
The original code prefetched the mbuf to be freed as soon as it started processing the mbuf to replace it. Instead now, every time we calculate what the next mbuf position is going to be we prefetch the mbuf in that position (i.e. the mbuf pool pointer we are going to free the mbuf to), even while we are still updating the previous mbuf slot on the ring. This gives the prefetch much more time to resolve and get the data we need in the cache before we need it.

Hope this clarifies things.

/Bruce
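
For illustration only, the two changes described above can be sketched in a standalone form. This is not the ixgbe driver code: `toy_mbuf`, `toy_pool`, `toy_entry`, `toy_xmit`, `prefetch0`, the 64-byte cache line and the 8-entry ring are all hypothetical stand-ins, and the GCC/Clang builtin `__builtin_prefetch` is used in place of `rte_prefetch0` so the sketch builds without DPDK. It assumes a layout where the pool pointer sits in the second cache line of the buffer, as in the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache line size */
#define RING_SIZE  8   /* hypothetical ring size; power of two so wrap is a mask */

/* Hypothetical stand-in for the mempool: just counts returned buffers. */
struct toy_pool {
    unsigned freed;
};

/* Hypothetical stand-in for the mbuf: cache line 0 holds fields that the
 * free path never touches, cache line 1 holds the pool pointer. */
struct toy_mbuf {
    uint8_t hot[CACHE_LINE];
    struct toy_pool *pool;
} __attribute__((aligned(CACHE_LINE)));

/* Hypothetical stand-in for a software TX ring entry. */
struct toy_entry {
    struct toy_mbuf *mbuf;
};

/* Stand-in for rte_prefetch0(): read prefetch with high temporal locality. */
static inline void prefetch0(const void *p)
{
    __builtin_prefetch(p, 0, 3);
}

/* Place n packets on the ring starting at idx, freeing whatever each slot
 * held before. Returns the index of the slot after the last one written. */
static unsigned
toy_xmit(struct toy_entry *ring, unsigned idx, struct toy_mbuf **pkts, unsigned n)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);

    /* Change 1: prefetch the pool pointer (second cache line), not the
     * start of the mbuf, because freeing only needs the pool pointer. */
    if (ring[idx].mbuf != NULL)
        prefetch0(&ring[idx].mbuf->pool);

    for (unsigned i = 0; i < n; i++) {
        struct toy_entry *txe = &ring[idx];
        unsigned next = (idx + 1) & (RING_SIZE - 1);

        /* Change 2: as soon as the next position is known, prefetch the
         * pool pointer of the mbuf sitting there, one slot ahead of use. */
        if (ring[next].mbuf != NULL)
            prefetch0(&ring[next].mbuf->pool);

        /* "Free" the old mbuf: only its pool pointer is read. */
        if (txe->mbuf != NULL)
            txe->mbuf->pool->freed++;

        txe->mbuf = pkts[i];
        idx = next;
    }
    return idx;
}
```

The prefetch for slot `next` is issued while slot `idx` is still being updated, so the load has roughly one iteration of work in which to complete. Whether that is enough, or whether prefetching further ahead would help, depends on the hardware and workload and would have to be measured.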