From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Zhang, Qi Z"
To: "Su, Simei"
CC: "dev@dpdk.org", "Van Haaren, Harry", "Wu, Wenjun1"
Date: Thu, 28 Oct 2021 08:49:31 +0000
Message-ID: <1269c978d11f4844ace2602529d5f246@intel.com>
References: <20211028075822.330220-1-simei.su@intel.com>
In-Reply-To: <20211028075822.330220-1-simei.su@intel.com>
Subject: Re: [dpdk-dev] [PATCH] net/ice: fix performance issue for Rx timestamp
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Su, Simei
> Sent: Thursday, October 28, 2021 3:58 PM
> To: Zhang, Qi Z
> Cc: dev@dpdk.org; Van Haaren, Harry; Wu, Wenjun1; Su, Simei
> Subject: [PATCH] net/ice: fix performance issue for Rx timestamp
>
> In Rx data path, it reads hardware registers per packet, resulting in big
> performance drop. This patch improves performance from two aspects:
> (1) replace per packet hardware register read by per burst.
> (2) reduce hardware register read time from 3 to 2 when the low value of
>     time is not close to overflow.
>
> Meanwhile, this patch refines "ice_timesync_read_rx_timestamp" and
> "ice_timesync_read_tx_timestamp" API in which "ice_tstamp_convert_32b_64b"
> is also used.
>
> Fixes: 953e74e6b73a ("net/ice: enable Rx timestamp on flex descriptor")
> Fixes: 646dcbe6c701 ("net/ice: support IEEE 1588 PTP")
>
> Suggested-by: Harry van Haaren
> Signed-off-by: Simei Su
> ---
>  drivers/net/ice/ice_ethdev.c |  4 +--
>  drivers/net/ice/ice_ethdev.h |  1 +
>  drivers/net/ice/ice_rxtx.c   | 59 ++++++++++++++++++++++++++------------------
>  drivers/net/ice/ice_rxtx.h   | 34 +++++++++++++++----------
>  4 files changed, 59 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> index ef6ee1c..13a7a97 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -5560,7 +5560,7 @@ ice_timesync_read_rx_timestamp(struct rte_eth_dev *dev,
>  	rxq = dev->data->rx_queues[flags];
>
>  	ts_high = rxq->time_high;
> -	ts_ns = ice_tstamp_convert_32b_64b(hw, ts_high);
> +	ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, ts_high);
>  	ns = rte_timecounter_update(&ad->rx_tstamp_tc, ts_ns);
>  	*timestamp = rte_ns_to_timespec(ns);
>
> @@ -5587,7 +5587,7 @@ ice_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
>  		return -1;
>  	}
>
> -	ts_ns = ice_tstamp_convert_32b_64b(hw, (tstamp >> 8) & mask);
> +	ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, (tstamp >> 8) & mask);
>  	ns = rte_timecounter_update(&ad->tx_tstamp_tc, ts_ns);
>  	*timestamp = rte_ns_to_timespec(ns);
>
> diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
> index 599e002..0e42c4c 100644
> --- a/drivers/net/ice/ice_ethdev.h
> +++ b/drivers/net/ice/ice_ethdev.h
> @@ -509,6 +509,7 @@ struct ice_adapter {
>  	struct rte_timecounter rx_tstamp_tc;
>  	struct rte_timecounter tx_tstamp_tc;
>  	bool ptp_ena;
> +	uint64_t time_hw;
>  #ifdef RTE_ARCH_X86
>  	bool rx_use_avx2;
>  	bool rx_use_avx512;
> diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
> index c3cad2f..2d771ea 100644
> --- a/drivers/net/ice/ice_rxtx.c
> +++ b/drivers/net/ice/ice_rxtx.c
> @@ -1581,6 +1581,9 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
>  	if (!(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
>  		return 0;
>
> +	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
> +		rxq->hw_register_set = 1;
> +
>  	/**
>  	 * Scan LOOK_AHEAD descriptors at a time to determine which
>  	 * descriptors reference packets that are ready to be received.
> @@ -1614,15 +1617,15 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
>  			ice_rxd_to_vlan_tci(mb, &rxdp[j]);
>  			rxd_to_pkt_fields_ops[rxq->rxdid](rxq, mb, &rxdp[j]);
>  #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
> -			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
> -				ts_ns = ice_tstamp_convert_32b_64b(hw,
> +			if (ice_timestamp_dynflag > 0) {
> +				ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
> +					rxq->hw_register_set,
>  					rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high));
> -				if (ice_timestamp_dynflag > 0) {
> -					*RTE_MBUF_DYNFIELD(mb,
> -						ice_timestamp_dynfield_offset,
> -						rte_mbuf_timestamp_t *) = ts_ns;
> -					mb->ol_flags |= ice_timestamp_dynflag;
> -				}
> +				rxq->hw_register_set = 0;
> +				*RTE_MBUF_DYNFIELD(mb,
> +					ice_timestamp_dynfield_offset,
> +					rte_mbuf_timestamp_t *) = ts_ns;
> +				mb->ol_flags |= ice_timestamp_dynflag;
>  			}
>
>  			if (ad->ptp_ena && ((mb->packet_type &
> @@ -1822,6 +1825,10 @@ ice_recv_scattered_pkts(void *rx_queue,
>  	uint64_t ts_ns;
>  	struct ice_adapter *ad = rxq->vsi->adapter;
>  #endif
> +
> +	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
> +		rxq->hw_register_set = 1;
> +
>  	while (nb_rx < nb_pkts) {
>  		rxdp = &rx_ring[rx_id];
>  		rx_stat_err0 = rte_le_to_cpu_16(rxdp->wb.status_error0);
> @@ -1932,15 +1939,15 @@ ice_recv_scattered_pkts(void *rx_queue,
>  		rxd_to_pkt_fields_ops[rxq->rxdid](rxq, first_seg, &rxd);
>  		pkt_flags = ice_rxd_error_to_pkt_flags(rx_stat_err0);
>  #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
> -		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
> -			ts_ns = ice_tstamp_convert_32b_64b(hw,
> +		if (ice_timestamp_dynflag > 0) {
> +			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
> +				rxq->hw_register_set,
>  				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
> -			if (ice_timestamp_dynflag > 0) {
> -				*RTE_MBUF_DYNFIELD(first_seg,
> -					ice_timestamp_dynfield_offset,
> -					rte_mbuf_timestamp_t *) = ts_ns;
> -				first_seg->ol_flags |= ice_timestamp_dynflag;
> -			}
> +			rxq->hw_register_set = 0;
> +			*RTE_MBUF_DYNFIELD(first_seg,
> +				ice_timestamp_dynfield_offset,
> +				rte_mbuf_timestamp_t *) = ts_ns;
> +			first_seg->ol_flags |= ice_timestamp_dynflag;
>  		}
>
>  		if (ad->ptp_ena && ((first_seg->packet_type & RTE_PTYPE_L2_MASK)
> @@ -2312,6 +2319,10 @@ ice_recv_pkts(void *rx_queue,
>  	uint64_t ts_ns;
>  	struct ice_adapter *ad = rxq->vsi->adapter;
>  #endif
> +
> +	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
> +		rxq->hw_register_set = 1;
> +
>  	while (nb_rx < nb_pkts) {
>  		rxdp = &rx_ring[rx_id];
>  		rx_stat_err0 = rte_le_to_cpu_16(rxdp->wb.status_error0);
> @@ -2363,15 +2374,15 @@ ice_recv_pkts(void *rx_queue,
>  		rxd_to_pkt_fields_ops[rxq->rxdid](rxq, rxm, &rxd);
>  		pkt_flags = ice_rxd_error_to_pkt_flags(rx_stat_err0);
>  #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
> -		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
> -			ts_ns = ice_tstamp_convert_32b_64b(hw,
> +		if (ice_timestamp_dynflag > 0) {
> +			ts_ns = ice_tstamp_convert_32b_64b(hw, ad,
> +				rxq->hw_register_set,
>  				rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high));
> -			if (ice_timestamp_dynflag > 0) {
> -				*RTE_MBUF_DYNFIELD(rxm,
> -					ice_timestamp_dynfield_offset,
> -					rte_mbuf_timestamp_t *) = ts_ns;
> -				rxm->ol_flags |= ice_timestamp_dynflag;
> -			}
> +			rxq->hw_register_set = 0;
> +			*RTE_MBUF_DYNFIELD(rxm,
> +				ice_timestamp_dynfield_offset,
> +				rte_mbuf_timestamp_t *) = ts_ns;
> +			rxm->ol_flags |= ice_timestamp_dynflag;
>  		}
>
>  		if (ad->ptp_ena && ((rxm->packet_type & RTE_PTYPE_L2_MASK) ==
> diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
> index 146dc1f..ef58c0d 100644
> --- a/drivers/net/ice/ice_rxtx.h
> +++ b/drivers/net/ice/ice_rxtx.h
> @@ -93,6 +93,7 @@ struct ice_rx_queue {
>  	ice_rx_release_mbufs_t rx_rel_mbufs;
>  	uint64_t offloads;
>  	uint32_t time_high;
> +	uint32_t hw_register_set;
>  	const struct rte_memzone *mz;
>  };
>
> @@ -321,29 +322,36 @@ void ice_fdir_rx_parsing_enable(struct ice_adapter *ad, bool on)
>
>  /* Helper function to convert a 32b nanoseconds timestamp to 64b. */
>  static inline
> -uint64_t ice_tstamp_convert_32b_64b(struct ice_hw *hw, uint32_t in_timestamp)
> +uint64_t ice_tstamp_convert_32b_64b(struct ice_hw *hw, struct ice_adapter *ad,
> +				    uint32_t flag, uint32_t in_timestamp)
>  {
>  	const uint64_t mask = 0xFFFFFFFF;
>  	uint32_t hi, lo, lo2, delta;
> -	uint64_t time, ns;
> +	uint64_t ns;
>
> -	lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> -	hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
> -	lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> -
> -	if (lo2 < lo) {
> +	if (flag) {
>  		lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
>  		hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
> -	}
>
> -	time = ((uint64_t)hi << 32) | lo;
> +		if (lo > UINT32_MAX)

lo is of type uint32_t, so this check is always false.
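
To illustrate the point, here is a minimal standalone sketch of one way the "close to overflow" test could be written so that it can actually fire. It is not the driver code: fake_clock, read_time_lo(), read_time_hi(), read_hw_time() and the WRAP_GUARD_BAND value are all hypothetical stand-ins for the GLTSYN_TIME_L/H register reads, and the guard band size is an assumption that would need to be chosen from real read latencies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guard band in ns: how near to the 32-bit wrap counts as "close". */
#define WRAP_GUARD_BAND 10000u

/* Simulated 64-bit hardware clock and a counter of register reads (test stand-ins). */
static uint64_t fake_clock;
static unsigned int reg_reads;

/* Each simulated register read costs 1000 ns, so time advances between reads. */
static uint32_t read_time_lo(void)
{
	reg_reads++;
	fake_clock += 1000;
	return (uint32_t)fake_clock;
}

static uint32_t read_time_hi(void)
{
	reg_reads++;
	fake_clock += 1000;
	return (uint32_t)(fake_clock >> 32);
}

/* Read lo/hi; do the extra lo read only when lo is near the wrap point. */
static uint64_t read_hw_time(void)
{
	uint32_t lo, hi, lo2;

	lo = read_time_lo();
	hi = read_time_hi();

	/*
	 * "lo > UINT32_MAX" can never be true for a uint32_t. Comparing
	 * against a threshold below the maximum gives a test that can fire.
	 */
	if (lo > UINT32_MAX - WRAP_GUARD_BAND)
		lo2 = read_time_lo();
	else
		lo2 = lo;

	/* lo wrapped while hi was being read: take a fresh, consistent pair. */
	if (lo2 < lo) {
		lo = read_time_lo();
		hi = read_time_hi();
	}

	return ((uint64_t)hi << 32) | lo;
}

/* Small self-check covering the common path and the wrap path. */
void check_wrap_handling(void)
{
	uint64_t t;

	/* Far from the wrap: two register reads are enough. */
	fake_clock = 5000;
	reg_reads = 0;
	t = read_hw_time();
	assert(t == 6000 && reg_reads == 2);

	/* lo wraps between the lo and hi reads: the guard triggers a re-read. */
	fake_clock = (uint64_t)UINT32_MAX - 1500;
	reg_reads = 0;
	t = read_hw_time();
	assert(t == (UINT64_C(1) << 32) + 2499 && reg_reads == 5);
}
```

With a comparison like this, the common case stays at two register reads and only values inside the guard band pay for the third read, which is what item (2) of the commit message describes.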