From: Wenjun Wu
To: dev@dpdk.org, qiming.yang@intel.com, qi.z.zhang@intel.com
Cc: harry.van.haaren@intel.com, simei.su@intel.com, Wenjun Wu
Subject: [PATCH v2] net/ice: improve performance of RX timestamp offload
Date: Tue, 22 Feb 2022 13:50:16 +0800
Message-Id: <20220222055016.298763-1-wenjun1.wu@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220222051600.217761-1-wenjun1.wu@intel.com>
References: <20220222051600.217761-1-wenjun1.wu@intel.com>

Previously, each time a burst of packets is received, the SW reads the
HW register and assembles it with the timestamp from the descriptor to
get the complete 64-bit timestamp.

This patch optimizes the algorithm. The SW only needs to check the
monotonicity of the low 32-bit timestamp to avoid crossing borders.
Each time before the SW receives a burst of packets, it should check
the time difference between the current time and the last update time
to avoid the low 32-bit timestamp cycling twice.

Signed-off-by: Wenjun Wu
---
v2: add conditional compilation
---
 drivers/net/ice/ice_ethdev.h |   3 +
 drivers/net/ice/ice_rxtx.c   | 133 ++++++++++++++++++++++++++---------
 2 files changed, 103 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 3ed580d438..6778941d7d 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -554,6 +554,9 @@ struct ice_adapter {
 	struct rte_timecounter tx_tstamp_tc;
 	bool ptp_ena;
 	uint64_t time_hw;
+	uint32_t hw_time_high; /* high 32 bits of timestamp */
+	uint32_t hw_time_low; /* low 32 bits of timestamp */
+	uint64_t hw_time_update; /* SW time of HW record updating */
 	struct ice_fdir_prof_info fdir_prof_info[ICE_MAX_PTGS];
 	struct ice_rss_prof_info rss_prof_info[ICE_MAX_PTGS];
 	/* True if DCF state of the associated PF is on */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 4f218bcd0d..6bb15ee825 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1576,7 +1576,6 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
 #endif
 	rxdp = &rxq->rx_ring[rxq->rx_tail];
@@ -1588,8 +1587,17 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	if (!(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 		return 0;
 
-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
+#ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (sw_cur_time - ad->hw_time_update > 4) {
+			ad->hw_time_high = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
+			ad->hw_time_low = 0;
+			ad->hw_time_update = sw_cur_time;
+		}
+	}
+#endif
 
 	/**
 	 * Scan LOOK_AHEAD descriptors at a time to determine which
@@ -1625,14 +1633,25 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 			rxd_to_pkt_fields_ops[rxq->rxdid](rxq, mb, &rxdp[j]);
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 			if (ice_timestamp_dynflag > 0) {
-				ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-					rxq->hw_register_set,
-					rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high));
-				rxq->hw_register_set = 0;
+				rxq->time_high =
+				rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high);
 				*RTE_MBUF_DYNFIELD(mb,
-					ice_timestamp_dynfield_offset,
-					rte_mbuf_timestamp_t *) = ts_ns;
-				mb->ol_flags |= ice_timestamp_dynflag;
+					ice_timestamp_dynfield_offset,
+					uint32_t *) = rxq->time_high;
+				if (rxq->time_high > ad->hw_time_low)
+					*RTE_MBUF_DYNFIELD(mb,
+						(ice_timestamp_dynfield_offset + 4),
+						uint32_t *) = ad->hw_time_high;
+				else {
+					ad->hw_time_high += 1;
+					*RTE_MBUF_DYNFIELD(mb,
+						(ice_timestamp_dynfield_offset + 4),
+						uint32_t *) = ad->hw_time_high;
+					ad->hw_time_update =
+						rte_get_timer_cycles() /
+						(rte_get_timer_hz() / 1000);
+				}
+				pkt_flags |= ice_timestamp_dynflag;
 			}
 
 			if (ad->ptp_ena && ((mb->packet_type &
@@ -1657,6 +1676,11 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 			break;
 	}
 
+	if (nb_rx > 0 && rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+		ad->hw_time_low = *RTE_MBUF_DYNFIELD(rxq->rx_stage[nb_rx - 1],
+						     ice_timestamp_dynfield_offset,
+						     uint32_t *);
+
 	/* Clear software ring entries */
 	for (i = 0; i < nb_rx; i++)
 		rxq->sw_ring[rxq->rx_tail + i].mbuf = NULL;
@@ -1833,12 +1857,18 @@ ice_recv_scattered_pkts(void *rx_queue,
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
-#endif
 
-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (sw_cur_time - ad->hw_time_update > 4) {
+			ad->hw_time_high = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
+			ad->hw_time_low = 0;
+			ad->hw_time_update = sw_cur_time;
+		}
+	}
+#endif
 
 	while (nb_rx < nb_pkts) {
 		rxdp = &rx_ring[rx_id];
@@ -1951,14 +1981,25 @@ ice_recv_scattered_pkts(void *rx_queue,
 		pkt_flags = ice_rxd_error_to_pkt_flags(rx_stat_err0);
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		if (ice_timestamp_dynflag > 0) {
-			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-				rxq->hw_register_set,
-				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
-			rxq->hw_register_set = 0;
-			*RTE_MBUF_DYNFIELD(first_seg,
-				ice_timestamp_dynfield_offset,
-				rte_mbuf_timestamp_t *) = ts_ns;
-			first_seg->ol_flags |= ice_timestamp_dynflag;
+			rxq->time_high =
+			rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
+			*RTE_MBUF_DYNFIELD(rxm,
+				ice_timestamp_dynfield_offset,
+				uint32_t *) = rxq->time_high;
+			if (rxq->time_high > ad->hw_time_low)
+				*RTE_MBUF_DYNFIELD(rxm,
+					(ice_timestamp_dynfield_offset + 4),
+					uint32_t *) = ad->hw_time_high;
+			else {
+				ad->hw_time_high += 1;
+				*RTE_MBUF_DYNFIELD(rxm,
+					(ice_timestamp_dynfield_offset + 4),
+					uint32_t *) = ad->hw_time_high;
+				ad->hw_time_update =
+					rte_get_timer_cycles() /
+					(rte_get_timer_hz() / 1000);
+			}
+			pkt_flags |= ice_timestamp_dynflag;
 		}
 
 		if (ad->ptp_ena && ((first_seg->packet_type & RTE_PTYPE_L2_MASK)
@@ -1977,6 +2018,11 @@ ice_recv_scattered_pkts(void *rx_queue,
 		first_seg = NULL;
 	}
 
+	if (nb_rx > 0 && rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+		ad->hw_time_low = *RTE_MBUF_DYNFIELD(rx_pkts[nb_rx - 1],
+						     ice_timestamp_dynfield_offset,
+						     uint32_t *);
+
 	/* Record index of the next RX descriptor to probe. */
 	rxq->rx_tail = rx_id;
 	rxq->pkt_first_seg = first_seg;
@@ -2327,12 +2373,18 @@ ice_recv_pkts(void *rx_queue,
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
-#endif
 
-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (sw_cur_time - ad->hw_time_update > 4) {
+			ad->hw_time_high = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
+			ad->hw_time_low = 0;
+			ad->hw_time_update = sw_cur_time;
+		}
+	}
+#endif
 
 	while (nb_rx < nb_pkts) {
 		rxdp = &rx_ring[rx_id];
@@ -2386,14 +2438,24 @@ ice_recv_pkts(void *rx_queue,
 		pkt_flags = ice_rxd_error_to_pkt_flags(rx_stat_err0);
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		if (ice_timestamp_dynflag > 0) {
-			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-				rxq->hw_register_set,
-				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
-			rxq->hw_register_set = 0;
+			rxq->time_high = rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
 			*RTE_MBUF_DYNFIELD(rxm,
-				ice_timestamp_dynfield_offset,
-				rte_mbuf_timestamp_t *) = ts_ns;
-			rxm->ol_flags |= ice_timestamp_dynflag;
+				ice_timestamp_dynfield_offset,
+				uint32_t *) = rxq->time_high;
+			if (rxq->time_high > ad->hw_time_low)
+				*RTE_MBUF_DYNFIELD(rxm,
+					(ice_timestamp_dynfield_offset + 4),
+					uint32_t *) = ad->hw_time_high;
+			else {
+				ad->hw_time_high += 1;
+				*RTE_MBUF_DYNFIELD(rxm,
+					(ice_timestamp_dynfield_offset + 4),
+					uint32_t *) = ad->hw_time_high;
+				ad->hw_time_update =
+					rte_get_timer_cycles() /
+					(rte_get_timer_hz() / 1000);
+			}
+			pkt_flags |= ice_timestamp_dynflag;
 		}
 
 		if (ad->ptp_ena && ((rxm->packet_type & RTE_PTYPE_L2_MASK) ==
@@ -2408,6 +2470,11 @@ ice_recv_pkts(void *rx_queue,
 		/* copy old mbuf to rx_pkts */
 		rx_pkts[nb_rx++] = rxm;
 	}
+
+	if (nb_rx > 0 && rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+		ad->hw_time_low = *RTE_MBUF_DYNFIELD(rx_pkts[nb_rx - 1],
+						     ice_timestamp_dynfield_offset,
+						     uint32_t *);
 	rxq->rx_tail = rx_id;
 	/**
 	 * If the number of free RX descriptors is greater than the RX free
-- 
2.25.1