From: Kevin Traynor <ktraynor@redhat.com>
To: Wenjun Wu
Cc: Qi Zhang, dpdk stable
Subject: patch 'net/ice: improve performance of Rx timestamp offload' has been queued to stable release 21.11.2
Date: Tue, 10 May 2022 13:29:42 +0100
Message-Id: <20220510123010.159523-5-ktraynor@redhat.com>
In-Reply-To: <20220510123010.159523-1-ktraynor@redhat.com>
References: <20220510123010.159523-1-ktraynor@redhat.com>

Hi,

FYI, your patch has been queued to stable release 21.11.2

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 05/15/22. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.
Queued patches are on a temporary branch at:
https://github.com/kevintraynor/dpdk-stable

This queued commit can be viewed at:
https://github.com/kevintraynor/dpdk-stable/commit/8ae457cbf54dd2b29491c6dceaa941d292777d45

For reviewers, a short illustrative sketch of the new timestamp-extension
logic is appended at the end of this mail, after the rebase diff.

Thanks.

Kevin

---
From 8ae457cbf54dd2b29491c6dceaa941d292777d45 Mon Sep 17 00:00:00 2001
From: Wenjun Wu
Date: Mon, 28 Feb 2022 15:36:07 +0800
Subject: [PATCH] net/ice: improve performance of Rx timestamp offload

[ upstream commit 5543827fc6df39eabd51e2ca81f4462c291ea8d9 ]

Previously, each time a burst of packets was received, the SW read a HW
register and assembled it together with the timestamp from the descriptor
to get the complete 64-bit timestamp.

This patch optimizes the algorithm. The SW only needs to check the
monotonicity of the low 32-bit timestamp to detect wrap-around. Before
receiving each burst of packets, the SW checks the time elapsed since the
last update to ensure the low 32-bit timestamp has not cycled twice.

The patch showed a 50% ~ 70% single-core performance improvement on a
mainstream Xeon server, fixing the performance gap for some use cases.

Fixes: f9c561ffbccc ("net/ice: fix performance for Rx timestamp")

Signed-off-by: Wenjun Wu
Acked-by: Qi Zhang
---
 drivers/net/ice/ice_ethdev.h |   3 +
 drivers/net/ice/ice_rxtx.c   | 118 +++++++++++++++++++++++++----------
 2 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 1242177b42..c0d1baa1ec 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -530,4 +530,7 @@ struct ice_adapter {
 	bool ptp_ena;
 	uint64_t time_hw;
+	uint32_t hw_time_high; /* high 32 bits of timestamp */
+	uint32_t hw_time_low; /* low 32 bits of timestamp */
+	uint64_t hw_time_update; /* SW time of HW record updating */
 	struct ice_fdir_prof_info fdir_prof_info[ICE_MAX_PTGS];
 	struct ice_rss_prof_info rss_prof_info[ICE_MAX_PTGS];
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 041f4bc91f..2dd2637fbb 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1575,7 +1575,8 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	bool is_tsinit = false;
+	uint64_t ts_ns;
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
 #endif
@@ -1589,6 +1590,12 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 		return 0;

-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
+#ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (unlikely(sw_cur_time - ad->hw_time_update > 4))
+			is_tsinit = 1;
+	}
+#endif

 	/**
@@ -1626,12 +1633,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 			if (ice_timestamp_dynflag > 0) {
-				ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-					rxq->hw_register_set,
-					rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high));
-				rxq->hw_register_set = 0;
+				rxq->time_high =
+					rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high);
+				if (unlikely(is_tsinit)) {
+					ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1,
+									   rxq->time_high);
+					ad->hw_time_low = (uint32_t)ts_ns;
+					ad->hw_time_high = (uint32_t)(ts_ns >> 32);
+					is_tsinit = false;
+				} else {
+					if (rxq->time_high < ad->hw_time_low)
+						ad->hw_time_high += 1;
+					ts_ns = (uint64_t)ad->hw_time_high << 32 | rxq->time_high;
+					ad->hw_time_low = rxq->time_high;
+				}
+				ad->hw_time_update = rte_get_timer_cycles() /
+						     (rte_get_timer_hz() / 1000);
 				*RTE_MBUF_DYNFIELD(mb,
-					ice_timestamp_dynfield_offset,
-					rte_mbuf_timestamp_t *) = ts_ns;
-				mb->ol_flags |= ice_timestamp_dynflag;
+						   ice_timestamp_dynfield_offset,
+						   rte_mbuf_timestamp_t *) = ts_ns;
+				pkt_flags |= ice_timestamp_dynflag;
 			}

@@ -1832,13 +1851,18 @@ ice_recv_scattered_pkts(void *rx_queue,
 	uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	bool is_tsinit = false;
+	uint64_t ts_ns;
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
+
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (unlikely(sw_cur_time - ad->hw_time_update > 4))
+			is_tsinit = true;
+	}
 #endif

-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
-
 	while (nb_rx < nb_pkts) {
 		rxdp = &rx_ring[rx_id];
@@ -1952,12 +1976,23 @@ ice_recv_scattered_pkts(void *rx_queue,
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		if (ice_timestamp_dynflag > 0) {
-			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-				rxq->hw_register_set,
-				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
-			rxq->hw_register_set = 0;
-			*RTE_MBUF_DYNFIELD(first_seg,
-				ice_timestamp_dynfield_offset,
-				rte_mbuf_timestamp_t *) = ts_ns;
-			first_seg->ol_flags |= ice_timestamp_dynflag;
+			rxq->time_high =
+				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
+			if (unlikely(is_tsinit)) {
+				ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, rxq->time_high);
+				ad->hw_time_low = (uint32_t)ts_ns;
+				ad->hw_time_high = (uint32_t)(ts_ns >> 32);
+				is_tsinit = false;
+			} else {
+				if (rxq->time_high < ad->hw_time_low)
+					ad->hw_time_high += 1;
+				ts_ns = (uint64_t)ad->hw_time_high << 32 | rxq->time_high;
+				ad->hw_time_low = rxq->time_high;
+			}
+			ad->hw_time_update = rte_get_timer_cycles() /
+					     (rte_get_timer_hz() / 1000);
+			*RTE_MBUF_DYNFIELD(rxm,
+					   (ice_timestamp_dynfield_offset),
+					   rte_mbuf_timestamp_t *) = ts_ns;
+			pkt_flags |= ice_timestamp_dynflag;
 		}

@@ -2326,13 +2361,18 @@ ice_recv_pkts(void *rx_queue,
 	uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	bool is_tsinit = false;
+	uint64_t ts_ns;
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
+
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (unlikely(sw_cur_time - ad->hw_time_update > 4))
+			is_tsinit = 1;
+	}
 #endif

-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
-
 	while (nb_rx < nb_pkts) {
 		rxdp = &rx_ring[rx_id];
@@ -2387,12 +2427,23 @@ ice_recv_pkts(void *rx_queue,
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		if (ice_timestamp_dynflag > 0) {
-			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-				rxq->hw_register_set,
-				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
-			rxq->hw_register_set = 0;
+			rxq->time_high =
+				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
+			if (unlikely(is_tsinit)) {
+				ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, rxq->time_high);
+				ad->hw_time_low = (uint32_t)ts_ns;
+				ad->hw_time_high = (uint32_t)(ts_ns >> 32);
+				is_tsinit = false;
+			} else {
+				if (rxq->time_high < ad->hw_time_low)
+					ad->hw_time_high += 1;
+				ts_ns = (uint64_t)ad->hw_time_high << 32 | rxq->time_high;
+				ad->hw_time_low = rxq->time_high;
+			}
+			ad->hw_time_update = rte_get_timer_cycles() /
+					     (rte_get_timer_hz() / 1000);
 			*RTE_MBUF_DYNFIELD(rxm,
-				ice_timestamp_dynfield_offset,
-				rte_mbuf_timestamp_t *) = ts_ns;
-			rxm->ol_flags |= ice_timestamp_dynflag;
+					   (ice_timestamp_dynfield_offset),
+					   rte_mbuf_timestamp_t *) = ts_ns;
+			pkt_flags |= ice_timestamp_dynflag;
 		}

@@ -2409,4 +2460,5 @@ ice_recv_pkts(void *rx_queue,
 		rx_pkts[nb_rx++] = rxm;
 	}
+	rxq->rx_tail = rx_id;

 	/**
--
2.34.1

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- -	2022-05-10 13:24:21.730231744 +0100
+++ 0005-net-ice-improve-performance-of-Rx-timestamp-offload.patch	2022-05-10 13:24:21.555646297 +0100
@@ -1 +1 @@
-From 5543827fc6df39eabd51e2ca81f4462c291ea8d9 Mon Sep 17 00:00:00 2001
+From 8ae457cbf54dd2b29491c6dceaa941d292777d45 Mon Sep 17 00:00:00 2001
@@ -5,0 +6,2 @@
+[ upstream commit 5543827fc6df39eabd51e2ca81f4462c291ea8d9 ]
+
@@ -20 +21,0 @@
-Cc: stable@dpdk.org
@@ -30 +31 @@
-index 09cfb60b0f..3ab310628f 100644
+index 1242177b42..c0d1baa1ec 100644
@@ -33 +34 @@
-@@ -555,4 +555,7 @@ struct ice_adapter {
+@@ -530,4 +530,7 @@ struct ice_adapter {
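
For reviewers who want the backported logic in one place, below is a minimal,
self-contained C sketch of the timestamp scheme the commit message describes.
It is illustrative only and not part of the patch: struct ts_state mirrors the
three new struct ice_adapter fields, while sw_time_ms() and
read_full_time_from_hw() are hypothetical stand-ins for the driver's
rte_get_timer_cycles()-based millisecond clock and for
ice_tstamp_convert_32b_64b() respectively. Note the driver computes the
staleness check once per burst; the sketch does it per call for brevity.

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Millisecond SW clock; stand-in for the driver's
 * rte_get_timer_cycles() / (rte_get_timer_hz() / 1000). */
static uint64_t
sw_time_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)(ts.tv_nsec / 1000000);
}

/* Stand-in for the slow path (ice_tstamp_convert_32b_64b() in the driver),
 * which reads HW registers to build the full 64-bit time. Stubbed here. */
static uint64_t
read_full_time_from_hw(uint32_t ts_low)
{
	return ts_low; /* placeholder: a real driver reads HW here */
}

/* Mirrors the three fields the patch adds to struct ice_adapter. */
struct ts_state {
	uint32_t hw_time_high;   /* high 32 bits of the last timestamp */
	uint32_t hw_time_low;    /* low 32 bits of the last timestamp */
	uint64_t hw_time_update; /* SW time (ms) of the last update */
};

/* Extend a 32-bit descriptor timestamp to 64 bits. HW registers are read
 * only when the cached state is stale; otherwise the high word is carried
 * forward and incremented whenever the low word wraps. */
static uint64_t
extend_timestamp(struct ts_state *st, uint32_t ts_low)
{
	/* Staleness guard, as in the patch: if more than 4 ms have passed
	 * since the last update, resynchronize from HW so the low 32 bits
	 * cannot have wrapped more than once unnoticed (at nanosecond
	 * resolution they wrap roughly every 4.3 s). */
	bool is_tsinit = sw_time_ms() - st->hw_time_update > 4;
	uint64_t ts_ns;

	if (is_tsinit) {
		ts_ns = read_full_time_from_hw(ts_low);
		st->hw_time_low = (uint32_t)ts_ns;
		st->hw_time_high = (uint32_t)(ts_ns >> 32);
	} else {
		/* Timestamps are monotonic, so a smaller low word means the
		 * 32-bit counter wrapped exactly once: carry into the high
		 * word. */
		if (ts_low < st->hw_time_low)
			st->hw_time_high += 1;
		ts_ns = (uint64_t)st->hw_time_high << 32 | ts_low;
		st->hw_time_low = ts_low;
	}
	st->hw_time_update = sw_time_ms();
	return ts_ns;
}

The fast path never touches a HW register: per packet it costs one compare
and an occasional carry into the high word, which is where the quoted
50% ~ 70% single-core improvement comes from.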