From: Bruce Richardson <bruce.richardson@intel.com>
To: dev@dpdk.org
Date: Tue, 23 Sep 2014 12:08:14 +0100
Message-Id: <1411470497-10209-3-git-send-email-bruce.richardson@intel.com>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <1411470497-10209-1-git-send-email-bruce.richardson@intel.com>
References: <1410948102-12740-1-git-send-email-bruce.richardson@intel.com>
 <1411470497-10209-1-git-send-email-bruce.richardson@intel.com>
Subject: [dpdk-dev] [PATCH v2 2/5] ixgbe: add prefetch to improve slow-path tx perf

Make a small improvement to slow-path TX performance by adding a prefetch
for the second mbuf cache line. Also, assign the l2/l3 length values only
when they are needed.

What I've done with the prefetches is two-fold:
1) changed it from prefetching the mbuf (first cache line) to prefetching
the mbuf pool pointer (second cache line), so that when we go to access
the pool pointer to free transmitted mbufs we don't get a cache miss. When
clearing the ring and freeing mbufs, the pool pointer is the only mbuf
field used, so we don't need that first cache line.
2) changed the code to prefetch earlier - in effect to prefetch one mbuf
ahead. The original code prefetched the mbuf to be freed as soon as it
started processing the mbuf to replace it. Now, every time we calculate
what the next mbuf position is going to be, we prefetch the mbuf in that
position (i.e. the pool pointer of the mbuf we are going to free), even
while we are still updating the previous mbuf slot on the ring. This gives
the prefetch much more time to resolve and get the data we need into the
cache before we need it.

In terms of performance difference, a quick sanity test using testpmd on a
Xeon (Sandy Bridge uarch) platform showed performance increases of
approximately 8-18%, depending on the particular RX path used in
conjunction with this TX path code.

Changes in V2:
* Expanded commit message with extra details of the change.
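[Editor's note: to make the prefetch-one-ahead pattern above concrete, the
sketch below mirrors it in isolation. It is an illustration only, not part
of the patch; the tx_entry_sketch structure and tx_replace_loop_sketch
function are simplified, hypothetical stand-ins for the driver's real
ixgbe_tx_entry ring and descriptor-fill logic. Only rte_prefetch0(),
rte_pktmbuf_free_seg() and the mbuf's pool field are taken from the real
code.]

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_prefetch.h>

/* Simplified stand-in for the driver's software ring entry. */
struct tx_entry_sketch {
	struct rte_mbuf *mbuf;   /* mbuf currently held by this slot */
	uint16_t next_id;        /* index of the next entry in the ring */
};

static inline void
tx_replace_loop_sketch(struct tx_entry_sketch *sw_ring, uint16_t tx_id,
		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
	struct tx_entry_sketch *txe = &sw_ring[tx_id];
	struct tx_entry_sketch *txn;
	uint16_t i;

	/* Warm the cache line holding the first entry's pool pointer. */
	rte_prefetch0(&txe->mbuf->pool);

	for (i = 0; i < nb_pkts; i++) {
		txn = &sw_ring[txe->next_id];
		/*
		 * Prefetch one entry ahead: pull in the pool pointer
		 * (second mbuf cache line) of the mbuf the next iteration
		 * will free, while this slot is still being updated.
		 * Prefetching a stale or NULL pointer is harmless, since
		 * prefetch instructions do not fault.
		 */
		rte_prefetch0(&txn->mbuf->pool);

		if (txe->mbuf != NULL)
			rte_pktmbuf_free_seg(txe->mbuf);

		/* Install the replacement mbuf; descriptor setup elided. */
		txe->mbuf = tx_pkts[i];
		txe = txn;
	}
}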
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 6f702b3..c0bb49f 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		ixgbe_xmit_cleanup(txq);
 	}
 
+	rte_prefetch0(&txe->mbuf->pool);
+
 	/* TX loop */
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		new_ctx = 0;
 		tx_pkt = *tx_pkts++;
 		pkt_len = tx_pkt->pkt_len;
 
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
 		/*
 		 * Determine how many (if any) context descriptors
 		 * are needed for offload functionality.
 		 */
 		ol_flags = tx_pkt->ol_flags;
-		vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
-		vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
 
 		/* If hardware offload required */
 		tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
 		if (tx_ol_req) {
+			vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
+			vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
+
 			/* If new context need be built or reuse the exist ctx. */
 			ctx = what_advctx_update(txq, tx_ol_req,
 				vlan_macip_lens.data);
@@ -720,7 +721,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				    &txr[tx_id];
 
 				txn = &sw_ring[txe->next_id];
-				RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+				rte_prefetch0(&txn->mbuf->pool);
 
 				if (txe->mbuf != NULL) {
 					rte_pktmbuf_free_seg(txe->mbuf);
@@ -749,6 +750,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		do {
 			txd = &txr[tx_id];
 			txn = &sw_ring[txe->next_id];
+			rte_prefetch0(&txn->mbuf->pool);
 
 			if (txe->mbuf != NULL)
 				rte_pktmbuf_free_seg(txe->mbuf);
-- 
1.9.3
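[Editor's note: the patch relies on the pool pointer living in the mbuf's
second cache line, which is why prefetching &mbuf->pool covers everything
the free path needs. The fragment below is a hedged illustration of how
that layout assumption could be checked at build time; the check itself,
and the assumed 64-byte cache line, are not part of the patch or of the
driver code.]

#include <stddef.h>
#include <rte_mbuf.h>

/*
 * Editor's illustration only: confirm that rte_mbuf.pool sits beyond the
 * first cache line (assumed to be 64 bytes), so rte_prefetch0(&m->pool)
 * pulls in the line that rte_pktmbuf_free_seg() will later touch.
 */
_Static_assert(offsetof(struct rte_mbuf, pool) >= 64,
		"rte_mbuf.pool expected in the second cache line");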