From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 6 Apr 2017 10:54:05 +0100
From: Bruce Richardson
To: "Ananyev, Konstantin"
Cc: "Pei, Yulong", Vladyslav Buslov, "Zhang, Helin", "Wu, Jingjing", "Yigit, Ferruh", "dev@dpdk.org"
Message-ID: <20170406095405.GA3564@bricha3-MOBL3.ger.corp.intel.com>
References: <1488365813-12442-1-git-send-email-vladyslav.buslov@harmonicinc.com> <188971FCDA171749BED5DA74ABF3E6F03B6ACF0D@shsmsx102.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB9772583FAE4246@IRSMSX109.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772583FAE4246@IRSMSX109.ger.corp.intel.com>
Organization: Intel Research and Development Ireland Ltd.
Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
List-Id: DPDK patches and discussions

On Mon, Apr 03, 2017 at 10:47:20AM +0000, Ananyev, Konstantin wrote:
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pei, Yulong
> > Sent: Saturday, April 1, 2017 3:02 AM
> > To: Vladyslav Buslov; Zhang, Helin; Wu, Jingjing; Yigit, Ferruh
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> >
> > Hi All,
> >
> > In non-vector mode, without this patch, single-core performance can reach
> > 37.576 Mpps with 64-byte packets, but after applying this patch,
> > single-core performance drops to 34.343 Mpps with 64-byte packets.
> >
> > Best Regards
> > Yulong Pei
> >
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> > Sent: Wednesday, March 1, 2017 6:57 PM
> > To: Zhang, Helin; Wu, Jingjing; Yigit, Ferruh
> > Cc: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> >
> > Prefetch both cache lines of the mbuf, and the first cache line of the
> > payload, if CONFIG_RTE_PMD_PACKET_PREFETCH is set.
> >
> > Signed-off-by: Vladyslav Buslov
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > index 48429cc..2b4e5c9 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -100,6 +100,12 @@
> >  #define I40E_TX_OFFLOAD_NOTSUP_MASK \
> >  		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
> >
> > +#ifdef RTE_PMD_PACKET_PREFETCH
> > +#define rte_packet_prefetch(p) rte_prefetch0(p)
> > +#else
> > +#define rte_packet_prefetch(p) do {} while (0)
> > +#endif
> > +
> >  static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
> >  				      struct rte_mbuf **tx_pkts,
> >  				      uint16_t nb_pkts);
> > @@ -495,6 +501,9 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> >  		/* Translate descriptor info to mbuf parameters */
> >  		for (j = 0; j < nb_dd; j++) {
> >  			mb = rxep[j].mbuf;
> > +			rte_packet_prefetch(
> > +				RTE_PTR_ADD(mb->buf_addr,
> > +					RTE_PKTMBUF_HEADROOM));
> >  			qword1 = rte_le_to_cpu_64(\
> >  				rxdp[j].wb.qword1.status_error_len);
> >  			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > @@ -578,9 +587,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> >
> >  	rxdp = &rxq->rx_ring[alloc_idx];
> >  	for (i = 0; i < rxq->rx_free_thresh; i++) {
> > -		if (likely(i < (rxq->rx_free_thresh - 1)))
> > +		if (likely(i < (rxq->rx_free_thresh - 1))) {
> >  			/* Prefetch next mbuf */
> > -			rte_prefetch0(rxep[i + 1].mbuf);
> > +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline0);
> > +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline1);
>
> As I can see, the line above is the only real difference in that patch.
> If that is so, might it be worth re-running the perf tests without that line?
> Konstantin
>
The prefetch of buf_addr + headroom above also looks new. Are both needed
to get the performance boost you see? We should also investigate whether
the same effect can be achieved using a runtime option, rather than a
compile-time setting.
That would give us the best of both worlds. /Bruce
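For illustration only, the runtime-option idea could look something like the
sketch below. This is not actual i40e code: the struct, field, and function
names are hypothetical, and `__builtin_prefetch` stands in for DPDK's
`rte_prefetch0()` (which expands to it on GCC/Clang) so the snippet is
self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: gate the payload prefetch on a per-queue flag set
 * once at queue setup (e.g. from a devarg), instead of the compile-time
 * CONFIG_RTE_PMD_PACKET_PREFETCH macro. */
struct rx_queue_cfg {
	bool pkt_prefetch;	/* chosen at init, constant while the queue runs */
};

static inline void
maybe_prefetch(const struct rx_queue_cfg *cfg, const void *addr)
{
	/* A prefetch is only a hint, so skipping it is always safe.  Because
	 * the flag never changes on the hot path, the branch predicts
	 * perfectly, keeping the per-packet cost close to zero either way. */
	if (cfg->pkt_prefetch)
		__builtin_prefetch(addr, 0 /* read */, 3 /* high temporal locality */);
}
```

The trade-off Bruce hints at: the compile-time macro removes even the
well-predicted branch, while the runtime flag lets users who see a regression
(as in Yulong's 64-byte test) turn the prefetch off without rebuilding.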