From: "Polehn, Mike A"
To: "dev@dpdk.org"
Date: Tue, 27 Oct 2015 20:56:44 +0000
Message-ID: <745DB4B8861F8E4B9849C970520ABBF14974C1FC@ORSMSX102.amr.corp.intel.com>
Subject: [dpdk-dev] [Patch 1/2] i40e simple tx: Larger list size (33 to 128) throughput optimization
Reduce the emphasis on a 32-packet list size to better handle a wider
range of packet list sizes.

Change the maximum number of new buffers processed per loop pass to the
NIC queue's free buffer count.

Replace the redundant single-call check with one call to a focused loop.

Move the NIC register update write from once per loop pass to once per
driver write call. This minimizes CPU stalls spent waiting on multiple
SMP synchronization points and on earlier NIC register writes, which
often take many cycles to complete. For example, with an output list
size of 64 and the default loop size of 32, when 33 packets are queued
on the descriptor table, the second NIC register write occurs just after
TX processing of a single packet, resulting in a long CPU stall.

Use standard-size variables where possible to reduce the overhead of
non-standard variable sizes.

Reorder the variable structure to put the most active variables in the
first cache line, make better use of the bytes within each cache line,
and reduce the number of cache lines touched during a call.

Signed-off-by: Mike A. Polehn

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index ec62f75..2032e06 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -64,6 +64,7 @@
 #define DEFAULT_TX_FREE_THRESH 32
 #define I40E_MAX_PKT_TYPE 256
 #define I40E_RX_INPUT_BUF_MAX 256
+#define I40E_RX_FREE_THRESH_MIN 2
 
 #define I40E_TX_MAX_BURST 32
 
@@ -942,6 +943,12 @@ check_rx_burst_bulk_alloc_preconditions(__rte_unused struct i40e_rx_queue *rxq)
 			     "rxq->rx_free_thresh=%d",
 			     rxq->nb_rx_desc, rxq->rx_free_thresh);
 		ret = -EINVAL;
+	} else if (rxq->rx_free_thresh < I40E_RX_FREE_THRESH_MIN) {
+		PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions: "
+			     "rxq->rx_free_thresh=%d, "
+			     "I40E_RX_FREE_THRESH_MIN=%d",
+			     rxq->rx_free_thresh, I40E_RX_FREE_THRESH_MIN);
+		ret = -EINVAL;
 	} else if (!(rxq->nb_rx_desc < (I40E_MAX_RING_DESC -
 				RTE_PMD_I40E_RX_MAX_BURST))) {
 		PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions: "
@@ -1058,9 +1065,8 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
 {
 	volatile union i40e_rx_desc *rxdp;
 	struct i40e_rx_entry *rxep;
-	struct rte_mbuf *mb;
-	unsigned alloc_idx, i;
-	uint64_t dma_addr;
+	struct rte_mbuf *pk, *npk;
+	unsigned alloc_idx, i, l;
 	int diag;
 
 	/* Allocate buffers in bulk */
@@ -1076,22 +1082,36 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	pk = rxep->mbuf;
+	rte_prefetch0(pk);
+	rxep++;
+	npk = rxep->mbuf;
+	rte_prefetch0(npk);
+	rxep++;
+	l = rxq->rx_free_thresh - 2;
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
-		if (likely(i < (rxq->rx_free_thresh - 1)))
+		struct rte_mbuf *mb = pk;
+		pk = npk;
+		if (likely(i < l)) {
 			/* Prefetch next mbuf */
-			rte_prefetch0(rxep[i + 1].mbuf);
-
-		mb = rxep[i].mbuf;
-		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
+			npk = rxep->mbuf;
+			rte_prefetch0(npk);
+			rxep++;
+		}
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
+		rte_mbuf_refcnt_set(mb, 1);
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
-		dma_addr = rte_cpu_to_le_64(\
-			RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+		mb->next = NULL;
+		{
+			uint64_t dma_addr = rte_cpu_to_le_64(
+				RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = dma_addr;
+		}
+		rxdp++;
 	}
 
 	rxq->rx_last_pos = alloc_idx + rxq->rx_free_thresh - 1;
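The reworked loop in `i40e_rx_alloc_bufs()` keeps two mbuf pointers in flight (`pk` and `npk`), so the entry two slots ahead is already being prefetched while the current mbuf is initialized. The following is a minimal standalone C sketch of that pattern, not driver code. `struct sketch_mbuf`, `struct sketch_rx_entry`, `struct sketch_rx_desc`, `sketch_rx_refill()` and the 128-byte headroom are hypothetical stand-ins. The GCC/Clang builtin `__builtin_prefetch()` replaces `rte_prefetch0()`. Because two entries are read before the loop starts, the batch must contain at least two buffers. That appears to be why the patch rejects an `rx_free_thresh` below `I40E_RX_FREE_THRESH_MIN`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_mbuf, i40e_rx_entry and the RX read
 * descriptor; only the fields touched by the refill loop are modelled. */
struct sketch_mbuf {
	struct sketch_mbuf *next;
	uint64_t buf_bus_addr;  /* assumed: bus address of the data buffer */
	uint16_t data_off;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint16_t port;
};

struct sketch_rx_entry {
	struct sketch_mbuf *mbuf;
};

struct sketch_rx_desc {
	uint64_t pkt_addr;
	uint64_t hdr_addr;
};

#define SKETCH_HEADROOM 128u        /* assumed headroom, not a DPDK value */
#define SKETCH_FREE_THRESH_MIN 2u   /* mirrors I40E_RX_FREE_THRESH_MIN */

/* Initialize n freshly allocated mbufs and post them to n descriptors.
 * pk holds the next mbuf to initialize and npk the one after it, so each
 * iteration prefetches an entry two slots ahead of the one it writes. */
void sketch_rx_refill(const struct sketch_rx_entry *rxep,
		      struct sketch_rx_desc *rxdp,
		      unsigned n, uint16_t port)
{
	struct sketch_mbuf *pk, *npk;
	unsigned i, l;

	/* Two entries are read before the loop, so n must be at least 2. */
	assert(n >= SKETCH_FREE_THRESH_MIN);

	pk = rxep->mbuf;
	__builtin_prefetch(pk);
	rxep++;
	npk = rxep->mbuf;
	__builtin_prefetch(npk);
	rxep++;
	l = n - 2;  /* entries still to prefetch after priming two */

	for (i = 0; i < n; i++) {
		struct sketch_mbuf *mb = pk;
		uint64_t addr;

		pk = npk;
		if (i < l) {
			npk = rxep->mbuf;
			__builtin_prefetch(npk);
			rxep++;
		}
		mb->data_off = SKETCH_HEADROOM;
		mb->refcnt = 1;
		mb->nb_segs = 1;
		mb->port = port;
		mb->next = NULL;
		/* As in the patch, both descriptor fields get the same address;
		 * the little-endian conversion is omitted in this sketch. */
		addr = mb->buf_bus_addr + SKETCH_HEADROOM;
		rxdp->hdr_addr = addr;
		rxdp->pkt_addr = addr;
		rxdp++;
	}
}
```

With `n` set to 3, for example, the sketch initializes entries 0 to 2 and prefetches entry 2 while writing entry 0. It leaves any later entries untouched. Dropping `rte_cpu_to_le_64()` only matters on big-endian hosts, where the real driver must byte-swap the address.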