From: "Ananyev, Konstantin"
To: "Pei, Yulong", Vladyslav Buslov, "Zhang, Helin", "Wu, Jingjing", "Yigit, Ferruh"
CC: "dev@dpdk.org"
Date: Mon, 3 Apr 2017 10:47:20 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772583FAE4246@IRSMSX109.ger.corp.intel.com>
References: <1488365813-12442-1-git-send-email-vladyslav.buslov@harmonicinc.com> <188971FCDA171749BED5DA74ABF3E6F03B6ACF0D@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <188971FCDA171749BED5DA74ABF3E6F03B6ACF0D@shsmsx102.ccr.corp.intel.com>
List-Id: DPDK patches and discussions
Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pei, Yulong
> Sent: Saturday, April 1, 2017 3:02 AM
> To: Vladyslav Buslov; Zhang, Helin; Wu, Jingjing; Yigit, Ferruh
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
>
> Hi All
>
> In Non-vector mode, without this patch, single core performance can reach 37.576Mpps with 64Byte packet,
> But after applied this patch , single core performance downgrade to 34.343Mpps with 64Byte packet.
>
> Best Regards
> Yulong Pei
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> Sent: Wednesday, March 1, 2017 6:57 PM
> To: Zhang, Helin; Wu, Jingjing; Yigit, Ferruh
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
>
> Prefetch both cache lines of mbuf and first cache line of payload if CONFIG_RTE_PMD_PACKET_PREFETCH is set.
>
> Signed-off-by: Vladyslav Buslov
> ---
>  drivers/net/i40e/i40e_rxtx.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 48429cc..2b4e5c9 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -100,6 +100,12 @@
>  #define I40E_TX_OFFLOAD_NOTSUP_MASK \
>          (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
>
> +#ifdef RTE_PMD_PACKET_PREFETCH
> +#define rte_packet_prefetch(p) rte_prefetch0(p)
> +#else
> +#define rte_packet_prefetch(p) do {} while (0)
> +#endif
> +
>  static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
>                                        struct rte_mbuf **tx_pkts,
>                                        uint16_t nb_pkts);
> @@ -495,6 +501,9 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
>          /* Translate descriptor info to mbuf parameters */
>          for (j = 0; j < nb_dd; j++) {
>              mb = rxep[j].mbuf;
> +            rte_packet_prefetch(
> +                RTE_PTR_ADD(mb->buf_addr,
> +                        RTE_PKTMBUF_HEADROOM));
>              qword1 = rte_le_to_cpu_64(\
>                  rxdp[j].wb.qword1.status_error_len);
>              pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> @@ -578,9 +587,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
>
>      rxdp = &rxq->rx_ring[alloc_idx];
>      for (i = 0; i < rxq->rx_free_thresh; i++) {
> -        if (likely(i < (rxq->rx_free_thresh - 1)))
> +        if (likely(i < (rxq->rx_free_thresh - 1))) {
>              /* Prefetch next mbuf */
> -            rte_prefetch0(rxep[i + 1].mbuf);
> +            rte_packet_prefetch(rxep[i + 1].mbuf->cacheline0);
> +            rte_packet_prefetch(rxep[i + 1].mbuf->cacheline1);

As far as I can see, the line above is the only real difference in that patch.
If that is so, it might be worth re-running the perf tests without that line?
Konstantin

> +        }
>
>          mb = rxep[i].mbuf;
>          rte_mbuf_refcnt_set(mb, 1);
> @@ -752,7 +763,8 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>                  I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
>
>          rxm->data_off = RTE_PKTMBUF_HEADROOM;
> -        rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
> +        rte_packet_prefetch(RTE_PTR_ADD(rxm->buf_addr,
> +                        RTE_PKTMBUF_HEADROOM));
>          rxm->nb_segs = 1;
>          rxm->next = NULL;
>          rxm->pkt_len = rx_packet_len;
> @@ -939,7 +951,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
>          first_seg->ol_flags |= pkt_flags;
>
>          /* Prefetch data of first segment, if configured to do so. */
> -        rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
> +        rte_packet_prefetch(RTE_PTR_ADD(first_seg->buf_addr,
>              first_seg->data_off));
>          rx_pkts[nb_rx++] = first_seg;
>          first_seg = NULL;
> --
> 2.1.4
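
One way to run the comparison suggested above would be to put the second-cache-line prefetch behind its own build-time switch, so the same code can be measured with and without it. The following is only a minimal sketch under stated assumptions: it uses the GCC/Clang builtin __builtin_prefetch in place of rte_prefetch0, and every demo_* name and DEMO_* macro is hypothetical, not DPDK API or part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch standing in for RTE_PMD_PACKET_PREFETCH. */
#ifdef DEMO_PACKET_PREFETCH
/* __builtin_prefetch (GCC/Clang) is only a hint: read access, high locality. */
#define demo_packet_prefetch(p) __builtin_prefetch((p), 0, 3)
#else
#define demo_packet_prefetch(p) do {} while (0)
#endif

#define DEMO_CACHE_LINE 64

/* Toy stand-in for an mbuf: two cache lines of metadata plus a payload pointer. */
struct demo_mbuf {
    uint8_t cacheline0[DEMO_CACHE_LINE];
    uint8_t cacheline1[DEMO_CACHE_LINE];
    uint8_t *data;
};

/*
 * Walk a burst of packets and sum the first payload byte of each one,
 * prefetching the next packet while the current one is processed.
 * Build with -DDEMO_PREFETCH_SECOND_LINE to also prefetch cacheline1,
 * which is the single line the review comment singles out.
 */
static uint32_t demo_rx_burst(struct demo_mbuf **pkts, size_t n)
{
    uint32_t sum = 0;

    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            demo_packet_prefetch(pkts[i + 1]->cacheline0);
#ifdef DEMO_PREFETCH_SECOND_LINE
            demo_packet_prefetch(pkts[i + 1]->cacheline1);
#endif
            demo_packet_prefetch(pkts[i + 1]->data);
        }
        sum += pkts[i]->data[0];
    }
    return sum;
}
```

The result is the same in every build configuration, because a prefetch is only a cache hint; only the timing can change, which is why the with/without comparison has to come from a perf run rather than from the output.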