From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bruce.richardson@intel.com>
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by dpdk.org (Postfix) with ESMTP id 08DA968C3
 for <dev@dpdk.org>; Wed, 17 Sep 2014 17:33:34 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP; 17 Sep 2014 08:36:03 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.97,862,1389772800"; d="scan'208";a="387455441"
Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157])
 by FMSMGA003.fm.intel.com with ESMTP; 17 Sep 2014 08:30:32 -0700
Received: from irsmsx105.ger.corp.intel.com (163.33.3.28) by
 IRSMSX103.ger.corp.intel.com (163.33.3.157) with Microsoft SMTP Server (TLS)
 id 14.3.195.1; Wed, 17 Sep 2014 16:35:20 +0100
Received: from irsmsx103.ger.corp.intel.com ([169.254.3.112]) by
 IRSMSX105.ger.corp.intel.com ([169.254.7.158]) with mapi id 14.03.0195.001;
 Wed, 17 Sep 2014 16:35:20 +0100
From: "Richardson, Bruce" <bruce.richardson@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>
Thread-Topic: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve
 slow-path tx perf
Thread-Index: AQHP0l6UcfMND56eKE+7WeGe29qoUZwFYHuAgAAS+mA=
Date: Wed, 17 Sep 2014 15:35:19 +0000
Message-ID: <59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>
References: <1410948102-12740-1-git-send-email-bruce.richardson@intel.com>
 <1410948102-12740-3-git-send-email-bruce.richardson@intel.com>
 <20140917152103.GE4213@localhost.localdomain>
In-Reply-To: <20140917152103.GE4213@localhost.localdomain>
Accept-Language: en-GB, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path
 tx perf
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Sep 2014 15:33:35 -0000


> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Wednesday, September 17, 2014 4:21 PM
> To: Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path
> tx perf
> 
> On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:
> > Make a small improvement to slow path TX performance by adding in a
> > prefetch for the second mbuf cache line.
> > Also move assignment of l2/l3 length values only when needed.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > index 6f702b3..c0bb49f 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >  		ixgbe_xmit_cleanup(txq);
> >  	}
> >
> > +	rte_prefetch0(&txe->mbuf->pool);
> > +
> 
> Can you explain what all of these prefetches are doing?  It looks to me like
> they're just fetching the first cache line of the mempool structure, which it
> appears amounts to the pool's name.  I don't see that having any use here.
> 
This does make a decent enough performance difference in my tests (the amount
varies depending on the RX path being used by testpmd).

What I've done with the prefetches is two-fold:
1) changed it from prefetching the mbuf (first cache line) to prefetching
the mbuf pool pointer (second cache line) so that when we go to access the
pool pointer to free transmitted mbufs we don't get a cache miss. When
clearing the ring and freeing mbufs, the pool pointer is the only mbuf
field used, so we don't need that first cache line.
2) changed the code to prefetch earlier - in effect to prefetch one mbuf
ahead. The original code prefetched the mbuf to be freed as soon as it
started processing the mbuf to replace it. Instead now, every time we
calculate what the next mbuf position is going to be we prefetch the mbuf
in that position (i.e. the pool pointer of the mbuf we are going to free),
even while we are still updating the previous mbuf slot on the ring. This
gives the prefetch much more time to resolve and get the data we need in
the cache before we need it (rough sketch of the pattern below).

Hope this clarifies things.

/Bruce