Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch

From: Bruce Richardson <bruce.richardson@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "Pei, Yulong" <yulong.pei@intel.com>,
	Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	"Yigit, Ferruh" <ferruh.yigit@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
Date: Thu, 6 Apr 2017 10:54:05 +0100
Message-ID: <20170406095405.GA3564@bricha3-MOBL3.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772583FAE4246@IRSMSX109.ger.corp.intel.com>

On Mon, Apr 03, 2017 at 10:47:20AM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pei, Yulong
> > Sent: Saturday, April 1, 2017 3:02 AM
> > To: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > Yigit, Ferruh <ferruh.yigit@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> > 
> > Hi All
> > 
> > In non-vector mode, without this patch, single-core performance reaches 37.576 Mpps with 64-byte packets,
> > but after applying this patch, single-core performance drops to 34.343 Mpps with 64-byte packets.
> > 
> > Best Regards
> > Yulong Pei
> > 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> > Sent: Wednesday, March 1, 2017 6:57 PM
> > To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> > Cc: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> > 
> > Prefetch both cache lines of mbuf and first cache line of payload if CONFIG_RTE_PMD_PACKET_PREFETCH is set.
> > 
> > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > index 48429cc..2b4e5c9 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -100,6 +100,12 @@
> >  #define I40E_TX_OFFLOAD_NOTSUP_MASK \
> >  		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
> > 
> > +#ifdef RTE_PMD_PACKET_PREFETCH
> > +#define rte_packet_prefetch(p)	rte_prefetch0(p)
> > +#else
> > +#define rte_packet_prefetch(p)	do {} while (0)
> > +#endif
> > +
> >  static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
> >  				      struct rte_mbuf **tx_pkts,
> >  				      uint16_t nb_pkts);
> > @@ -495,6 +501,9 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> >  		/* Translate descriptor info to mbuf parameters */
> >  		for (j = 0; j < nb_dd; j++) {
> >  			mb = rxep[j].mbuf;
> > +			rte_packet_prefetch(
> > +				RTE_PTR_ADD(mb->buf_addr,
> > +						RTE_PKTMBUF_HEADROOM));
> >  			qword1 = rte_le_to_cpu_64(\
> >  				rxdp[j].wb.qword1.status_error_len);
> >  			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > @@ -578,9 +587,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> > 
> >  	rxdp = &rxq->rx_ring[alloc_idx];
> >  	for (i = 0; i < rxq->rx_free_thresh; i++) {
> > -		if (likely(i < (rxq->rx_free_thresh - 1)))
> > +		if (likely(i < (rxq->rx_free_thresh - 1))) {
> >  			/* Prefetch next mbuf */
> > -			rte_prefetch0(rxep[i + 1].mbuf);
> > +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline0);
> > +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline1);
> 
> As far as I can see, the line above is the only real difference in that patch.
> If so, might it be worth re-running the perf tests without that line?
> Konstantin
>
The prefetch for the packet buf_addr+headroom above also looks new.
Are both needed to get the performance boost you see?
We should also investigate if the same effect can be got using a
runtime option, rather than a compile-time setting. That would give us
the best of both worlds.

/Bruce
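
A minimal sketch of the runtime-option idea raised above, for illustration
only; it is not i40e driver code. Every name prefixed sketch_ is
hypothetical, __builtin_prefetch (a GCC/Clang builtin) stands in for
rte_prefetch0(), and the 64-byte cache line and 128-byte headroom are
assumed values. The two prefetches are gated in a single loop here for
brevity, although the patch places them in different functions. The return
value exists only so that the effect of each flag can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64   /* assumed cache-line size */
#define SKETCH_HEADROOM   128  /* assumed; stands in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical stand-in for struct rte_mbuf: two cache lines of metadata. */
struct sketch_mbuf {
	void *buf_addr;
	uint8_t pad[2 * SKETCH_CACHE_LINE - sizeof(void *)];
};

/* Hypothetical per-queue runtime knobs, one per prefetch under discussion. */
struct sketch_rxq_conf {
	bool prefetch_mbuf_line1; /* second cache line of the mbuf */
	bool prefetch_payload;    /* first cache line of packet data */
};

/* Stand-in for rte_prefetch0(): read access, keep in all cache levels. */
static inline void
sketch_prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

/*
 * Issue prefetch hints for n mbufs according to the runtime flags.
 * Returns the number of hints issued.
 */
static unsigned int
sketch_prefetch_burst(const struct sketch_rxq_conf *conf,
		      struct sketch_mbuf *const *ring, unsigned int n)
{
	unsigned int i, issued = 0;

	assert(conf != NULL && ring != NULL);

	for (i = 0; i < n; i++) {
		const struct sketch_mbuf *mb = ring[i];

		/* First mbuf cache line: always prefetched, flags or not. */
		sketch_prefetch0(mb);
		issued++;

		if (conf->prefetch_mbuf_line1) {
			sketch_prefetch0((const uint8_t *)mb + SKETCH_CACHE_LINE);
			issued++;
		}
		if (conf->prefetch_payload) {
			sketch_prefetch0((const uint8_t *)mb->buf_addr +
					 SKETCH_HEADROOM);
			issued++;
		}
	}
	return issued;
}
```

With separate flags, each prefetch can be switched on or off between
benchmark runs without rebuilding, which is one way to check whether both
are needed. The per-packet branches have a cost of their own, so this
would have to be measured against the compile-time variant.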

Thread overview: 8+ messages
2017-03-01 10:56 Vladyslav Buslov
2017-03-07 10:37 ` Ferruh Yigit
2017-04-01  2:01 ` Pei, Yulong
2017-04-03 10:31   ` Ferruh Yigit
2017-04-06  9:29     ` Vladyslav Buslov
2017-04-06  9:54       ` Bruce Richardson
2017-04-03 10:47   ` Ananyev, Konstantin
2017-04-06  9:54     ` Bruce Richardson [this message]
