From: Kevin Traynor <ktraynor@redhat.com>
To: Wenjun Wu <wenjun1.wu@intel.com>
Cc: Qi Zhang <qi.z.zhang@intel.com>, dpdk stable <stable@dpdk.org>
Subject: patch 'net/ice: improve performance of Rx timestamp offload' has been queued to stable release 21.11.2
Date: Tue, 10 May 2022 13:29:42 +0100
Message-ID: <20220510123010.159523-5-ktraynor@redhat.com>
In-Reply-To: <20220510123010.159523-1-ktraynor@redhat.com>

Hi,

FYI, your patch has been queued to stable release 21.11.2

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 05/15/22. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(i.e. not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
https://github.com/kevintraynor/dpdk-stable

This queued commit can be viewed at:
https://github.com/kevintraynor/dpdk-stable/commit/8ae457cbf54dd2b29491c6dceaa941d292777d45

Thanks.

Kevin

---
From 8ae457cbf54dd2b29491c6dceaa941d292777d45 Mon Sep 17 00:00:00 2001
From: Wenjun Wu <wenjun1.wu@intel.com>
Date: Mon, 28 Feb 2022 15:36:07 +0800
Subject: [PATCH] net/ice: improve performance of Rx timestamp offload

[ upstream commit 5543827fc6df39eabd51e2ca81f4462c291ea8d9 ]

Previously, each time a burst of packets was received, SW read the HW
register and combined it with the timestamp from the descriptor to
get the complete 64-bit timestamp.

This patch optimizes the algorithm. SW now only needs to check the
monotonicity of the low 32 bits of the timestamp to detect wraparound.
Each time before SW receives a burst of packets, it checks the
time difference between the current time and the last update time so
that the low 32 bits cannot wrap twice unnoticed.
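
The idea described above can be sketched in isolation as follows. This is
a minimal illustration and not the driver code: the struct ts_state type,
the ts_* helper names and the TS_STALE_MS constant are hypothetical,
chosen to loosely mirror the hw_time_high, hw_time_low and hw_time_update
fields in the diff below. The threshold of 4 is copied from the diff,
where it is compared against a millisecond count. The caller is assumed
to supply the current time in milliseconds and, on resync, a full 64-bit
value obtained from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state, loosely mirroring the hw_time_* fields in the diff. */
struct ts_state {
	uint32_t high;      /* upper 32 bits of the 64-bit timestamp */
	uint32_t low;       /* last 32-bit value taken from a descriptor */
	uint64_t update_ms; /* software time of the last update, in ms */
};

/* Assumption: same threshold the diff compares against. */
#define TS_STALE_MS 4

/* True when the cached state is too old to trust the wrap detection. */
static bool ts_is_stale(const struct ts_state *st, uint64_t now_ms)
{
	return now_ms - st->update_ms > TS_STALE_MS;
}

/* Slow path: reload both halves from a full 64-bit hardware value. */
static void ts_resync(struct ts_state *st, uint64_t full_ts, uint64_t now_ms)
{
	st->high = (uint32_t)(full_ts >> 32);
	st->low = (uint32_t)full_ts;
	st->update_ms = now_ms;
}

/*
 * Fast path: extend a 32-bit descriptor timestamp to 64 bits.
 * A value smaller than the previous one is taken to mean the low
 * word wrapped exactly once, so the high word is incremented.
 */
static uint64_t ts_extend(struct ts_state *st, uint32_t desc_ts,
			  uint64_t now_ms)
{
	if (desc_ts < st->low)
		st->high += 1;
	st->low = desc_ts;
	st->update_ms = now_ms;
	return ((uint64_t)st->high << 32) | desc_ts;
}
```

In this sketch a receive loop would call ts_is_stale() once per burst,
take the slow path with ts_resync() for the first packet if it returns
true, and use ts_extend() for every other packet. The fast path is only
correct while fewer than one full wrap of the low word can pass between
two packets, which is what the staleness check is there to guarantee.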

The patch showed a 50% to 70% single-core performance improvement on a
mainstream Xeon server, which closes the performance gap for some use cases.

Fixes: f9c561ffbccc ("net/ice: fix performance for Rx timestamp")

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/ice/ice_ethdev.h |   3 +
 drivers/net/ice/ice_rxtx.c   | 118 +++++++++++++++++++++++++----------
 2 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 1242177b42..c0d1baa1ec 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -530,4 +530,7 @@ struct ice_adapter {
 	bool ptp_ena;
 	uint64_t time_hw;
+	uint32_t hw_time_high; /* high 32 bits of timestamp */
+	uint32_t hw_time_low; /* low 32 bits of timestamp */
+	uint64_t hw_time_update; /* SW time of HW record updating */
 	struct ice_fdir_prof_info fdir_prof_info[ICE_MAX_PTGS];
 	struct ice_rss_prof_info rss_prof_info[ICE_MAX_PTGS];
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 041f4bc91f..2dd2637fbb 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1575,7 +1575,8 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	bool is_tsinit = false;
+	uint64_t ts_ns;
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
 #endif
@@ -1589,6 +1590,12 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 		return 0;
 
-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
+#ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (unlikely(sw_cur_time - ad->hw_time_update > 4))
+			is_tsinit = 1;
+	}
+#endif
 
 	/**
@@ -1626,12 +1633,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 			if (ice_timestamp_dynflag > 0) {
-				ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-					rxq->hw_register_set,
-					rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high));
-				rxq->hw_register_set = 0;
+				rxq->time_high =
+				rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high);
+				if (unlikely(is_tsinit)) {
+					ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1,
+									   rxq->time_high);
+					ad->hw_time_low = (uint32_t)ts_ns;
+					ad->hw_time_high = (uint32_t)(ts_ns >> 32);
+					is_tsinit = false;
+				} else {
+					if (rxq->time_high < ad->hw_time_low)
+						ad->hw_time_high += 1;
+					ts_ns = (uint64_t)ad->hw_time_high << 32 | rxq->time_high;
+					ad->hw_time_low = rxq->time_high;
+				}
+				ad->hw_time_update = rte_get_timer_cycles() /
+						     (rte_get_timer_hz() / 1000);
 				*RTE_MBUF_DYNFIELD(mb,
-					ice_timestamp_dynfield_offset,
-					rte_mbuf_timestamp_t *) = ts_ns;
-				mb->ol_flags |= ice_timestamp_dynflag;
+						   ice_timestamp_dynfield_offset,
+						   rte_mbuf_timestamp_t *) = ts_ns;
+				pkt_flags |= ice_timestamp_dynflag;
 			}
 
@@ -1832,13 +1851,18 @@ ice_recv_scattered_pkts(void *rx_queue,
 	uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	bool is_tsinit = false;
+	uint64_t ts_ns;
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
+
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (unlikely(sw_cur_time - ad->hw_time_update > 4))
+			is_tsinit = true;
+	}
 #endif
 
-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
-
 	while (nb_rx < nb_pkts) {
 		rxdp = &rx_ring[rx_id];
@@ -1952,12 +1976,23 @@ ice_recv_scattered_pkts(void *rx_queue,
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		if (ice_timestamp_dynflag > 0) {
-			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-				rxq->hw_register_set,
-				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
-			rxq->hw_register_set = 0;
-			*RTE_MBUF_DYNFIELD(first_seg,
-				ice_timestamp_dynfield_offset,
-				rte_mbuf_timestamp_t *) = ts_ns;
-			first_seg->ol_flags |= ice_timestamp_dynflag;
+			rxq->time_high =
+			   rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
+			if (unlikely(is_tsinit)) {
+				ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, rxq->time_high);
+				ad->hw_time_low = (uint32_t)ts_ns;
+				ad->hw_time_high = (uint32_t)(ts_ns >> 32);
+				is_tsinit = false;
+			} else {
+				if (rxq->time_high < ad->hw_time_low)
+					ad->hw_time_high += 1;
+				ts_ns = (uint64_t)ad->hw_time_high << 32 | rxq->time_high;
+				ad->hw_time_low = rxq->time_high;
+			}
+			ad->hw_time_update = rte_get_timer_cycles() /
+					     (rte_get_timer_hz() / 1000);
+			*RTE_MBUF_DYNFIELD(rxm,
+					   (ice_timestamp_dynfield_offset),
+					   rte_mbuf_timestamp_t *) = ts_ns;
+			pkt_flags |= ice_timestamp_dynflag;
 		}
 
@@ -2326,13 +2361,18 @@ ice_recv_pkts(void *rx_queue,
 	uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	bool is_tsinit = false;
+	uint64_t ts_ns;
 	struct ice_vsi *vsi = rxq->vsi;
 	struct ice_hw *hw = ICE_VSI_TO_HW(vsi);
-	uint64_t ts_ns;
 	struct ice_adapter *ad = rxq->vsi->adapter;
+
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+		uint64_t sw_cur_time = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+		if (unlikely(sw_cur_time - ad->hw_time_update > 4))
+			is_tsinit = 1;
+	}
 #endif
 
-	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
-		rxq->hw_register_set = 1;
-
 	while (nb_rx < nb_pkts) {
 		rxdp = &rx_ring[rx_id];
@@ -2387,12 +2427,23 @@ ice_recv_pkts(void *rx_queue,
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		if (ice_timestamp_dynflag > 0) {
-			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
-				rxq->hw_register_set,
-				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
-			rxq->hw_register_set = 0;
+			rxq->time_high =
+			   rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
+			if (unlikely(is_tsinit)) {
+				ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, rxq->time_high);
+				ad->hw_time_low = (uint32_t)ts_ns;
+				ad->hw_time_high = (uint32_t)(ts_ns >> 32);
+				is_tsinit = false;
+			} else {
+				if (rxq->time_high < ad->hw_time_low)
+					ad->hw_time_high += 1;
+				ts_ns = (uint64_t)ad->hw_time_high << 32 | rxq->time_high;
+				ad->hw_time_low = rxq->time_high;
+			}
+			ad->hw_time_update = rte_get_timer_cycles() /
+					     (rte_get_timer_hz() / 1000);
 			*RTE_MBUF_DYNFIELD(rxm,
-				ice_timestamp_dynfield_offset,
-				rte_mbuf_timestamp_t *) = ts_ns;
-			rxm->ol_flags |= ice_timestamp_dynflag;
+					   (ice_timestamp_dynfield_offset),
+					   rte_mbuf_timestamp_t *) = ts_ns;
+			pkt_flags |= ice_timestamp_dynflag;
 		}
 
@@ -2409,4 +2460,5 @@ ice_recv_pkts(void *rx_queue,
 		rx_pkts[nb_rx++] = rxm;
 	}
+
 	rxq->rx_tail = rx_id;
 	/**
-- 
2.34.1

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- -	2022-05-10 13:24:21.730231744 +0100
+++ 0005-net-ice-improve-performance-of-Rx-timestamp-offload.patch	2022-05-10 13:24:21.555646297 +0100
@@ -1 +1 @@
-From 5543827fc6df39eabd51e2ca81f4462c291ea8d9 Mon Sep 17 00:00:00 2001
+From 8ae457cbf54dd2b29491c6dceaa941d292777d45 Mon Sep 17 00:00:00 2001
@@ -5,0 +6,2 @@
+[ upstream commit 5543827fc6df39eabd51e2ca81f4462c291ea8d9 ]
+
@@ -20 +21,0 @@
-Cc: stable@dpdk.org
@@ -30 +31 @@
-index 09cfb60b0f..3ab310628f 100644
+index 1242177b42..c0d1baa1ec 100644
@@ -33 +34 @@
-@@ -555,4 +555,7 @@ struct ice_adapter {
+@@ -530,4 +530,7 @@ struct ice_adapter {

