From: "Iremonger, Bernard"
To: Viacheslav Ovsiienko, "dev@dpdk.org"
CC: "Yigit, Ferruh"
Date: Fri, 7 Jun 2019 16:07:42 +0000
Message-ID: <8CEF83825BEC744B83065625E567D7C260DAD4FD@IRSMSX108.ger.corp.intel.com>
References: <1558936043-6259-1-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1558936043-6259-1-git-send-email-viacheslavo@mellanox.com>
Subject: Re: [dpdk-dev] [RFC] app/testpmd: add profiling for Rx/Tx burst routines
List-Id: DPDK patches and discussions

Hi Viacheslav,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav Ovsiienko
> Sent: Monday, May 27, 2019 6:47 AM
> To: dev@dpdk.org
> Cc: Yigit, Ferruh
> Subject: [dpdk-dev] [RFC] app/testpmd: add profiling for Rx/Tx burst routines
>
> There is the testpmd configuration option called
> RTE_TEST_PMD_RECORD_CORE_CYCLES; if this one is turned on, the testpmd
> application measures the CPU clocks spent within the forwarding loop. This time is
> the sum of execution times of rte_eth_rx_burst(), rte_eth_tx_burst(),
> rte_delay_us(), rte_pktmbuf_free() and so on, depending on the fwd mode set.
>
> While debugging and optimizing the performance of datapath burst routines, it
> would be useful to see the pure execution times of these routines.
> It is proposed to add separate profiling options:
>
> CONFIG_RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> enables gathering profiling data for transmit datapath,
> ticks spent within rte_eth_tx_burst()
>
> CONFIG_RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> enables gathering profiling data for receive datapath,
> ticks spent within rte_eth_rx_burst()
>
> Signed-off-by: Viacheslav Ovsiienko
> ---
>  app/test-pmd/csumonly.c   | 25 ++++++++++++-------------
>  app/test-pmd/flowgen.c    | 25 +++++++++++++------------
>  app/test-pmd/icmpecho.c   | 26 +++++++++++++-------------
>  app/test-pmd/iofwd.c      | 24 ++++++++++++------------
>  app/test-pmd/macfwd.c     | 24 +++++++++++++-----------
>  app/test-pmd/macswap.c    | 26 ++++++++++++++------------
>  app/test-pmd/rxonly.c     | 17 ++++++-----------
>  app/test-pmd/softnicfwd.c | 24 ++++++++++++------------
>  app/test-pmd/testpmd.c    | 32 ++++++++++++++++++++++++++++++++
>  app/test-pmd/testpmd.h    | 40 ++++++++++++++++++++++++++++++++++++++++
>  app/test-pmd/txonly.c     | 23 +++++++++++------------
>  config/common_base        |  2 ++
>  12 files changed, 180 insertions(+), 108 deletions(-)
>
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index f4f2a7b..251e179 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -710,19 +710,19 @@ struct simple_gre_hdr {
>  	uint16_t nb_segments = 0;
>  	int ret;
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
> +	uint64_t start_tx_tsc;

Should the RTE_TEST_PMD_RECORD_CORE_CYCLES macro be checked here too?

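To make the question concrete, here is a minimal standalone sketch of the pattern, not the testpmd code itself. The names RECORD_TX_CYCLES, CYC_TX_START, CYC_TX_ADD, struct stream and stub_rdtsc() are hypothetical stand-ins so that it builds without DPDK. It assumes the start/add helpers expand to nothing when the option is off, in which case the timestamp variable only has to exist under the same guard as the helpers that read it. The do { } while (0) wrapper is a swapped-in idiom for illustration; the patch itself uses plain braces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch standing in for a CONFIG_RTE_TEST_PMD_* option. */
#define RECORD_TX_CYCLES 1

/* Fake cycle counter standing in for rte_rdtsc(): advances 10 ticks per read. */
static uint64_t fake_tsc;

static uint64_t stub_rdtsc(void)
{
	fake_tsc += 10;
	return fake_tsc;
}

/* Hypothetical, cut-down stream holding only the counter the sketch needs. */
struct stream {
	uint64_t core_tx_cycles;
};

#ifdef RECORD_TX_CYCLES
/* do { } while (0) makes each helper a single statement, safe in if/else. */
#define CYC_TX_START(a) do { (a) = stub_rdtsc(); } while (0)
#define CYC_TX_ADD(fs, s) \
	do { (fs)->core_tx_cycles += stub_rdtsc() - (s); } while (0)
#else
/* Disabled: the arguments vanish, so the variable need not be declared. */
#define CYC_TX_START(a) do { } while (0)
#define CYC_TX_ADD(fs, s) do { } while (0)
#endif

/* One "transmit" pass; the burst call itself is left out of the sketch. */
static void tx_once(struct stream *fs)
{
#ifdef RECORD_TX_CYCLES	/* same guard as the helpers that use the variable */
	uint64_t start_tx_tsc;
#endif
	assert(fs != NULL);
	CYC_TX_START(start_tx_tsc);
	/* rte_eth_tx_burst() would be called here */
	CYC_TX_ADD(fs, start_tx_tsc);
}

/* Runs two passes and returns the accumulated tick count. */
static uint64_t demo_tx_cycles(void)
{
	struct stream fs = { 0 };

	fake_tsc = 0;
	tx_once(&fs);
	tx_once(&fs);
	return fs.core_tx_cycles;
}
```

With the stub advancing 10 ticks per read, two passes accumulate 20 ticks. If a helper that reads start_tx_tsc were also enabled by RTE_TEST_PMD_RECORD_CORE_CYCLES, the declaration guard would need that macro as well, which is what the question above is asking.
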
>  #endif
> -
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  	/* receive a burst of packet */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  			nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	if (unlikely(nb_rx == 0))
>  		return;
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
> @@ -982,8 +982,10 @@ struct simple_gre_hdr {
>  		printf("Preparing packet burst to transmit failed: %s\n",
>  				rte_strerror(rte_errno));
>
> +	TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, tx_pkts_burst,
>  			nb_prep);
> +	TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>
>  	/*
>  	 * Retry if necessary
> @@ -992,8 +994,10 @@ struct simple_gre_hdr {
>  		retry = 0;
>  		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
>  			rte_delay_us(burst_tx_delay_time);
> +			TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  					&tx_pkts_burst[nb_tx], nb_rx - nb_tx);
> +			TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		}
>  	}
>  	fs->tx_packets += nb_tx;
> @@ -1010,12 +1014,7 @@ struct simple_gre_hdr {
>  			rte_pktmbuf_free(tx_pkts_burst[nb_tx]);
>  		} while (++nb_tx < nb_rx);
>  	}
> -
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  struct fwd_engine csum_fwd_engine = {
> diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
> index 3214e3c..b128e68 100644
> --- a/app/test-pmd/flowgen.c
> +++ b/app/test-pmd/flowgen.c
> @@ -130,20 +130,21 @@
>  	uint16_t i;
>  	uint32_t retry;
>  	uint64_t tx_offloads;
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> -#endif
>  	static int next_flow = 0;
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)

Should the RTE_TEST_PMD_RECORD_CORE_CYCLES macro be checked here too?

> +	uint64_t start_tx_tsc;
> +#endif
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  	/* Receive a burst of packets and discard them. */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  			nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	fs->rx_packets += nb_rx;
>
>  	for (i = 0; i < nb_rx; i++)
> @@ -212,7 +213,9 @@
>  		next_flow = (next_flow + 1) % cfg_n_flows;
>  	}
>
> +	TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
> +	TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  	/*
>  	 * Retry if necessary
>  	 */
> @@ -220,8 +223,10 @@
>  		retry = 0;
>  		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
>  			rte_delay_us(burst_tx_delay_time);
> +			TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  					&pkts_burst[nb_tx], nb_rx - nb_tx);
> +			TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		}
>  	}
>  	fs->tx_packets += nb_tx;
> @@ -239,11 +244,7 @@
>  			rte_pktmbuf_free(pkts_burst[nb_tx]);
>  		} while (++nb_tx < nb_pkt);
>  	}
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  struct fwd_engine flow_gen_engine = {
> diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
> index 55d266d..a539fe8 100644
> --- a/app/test-pmd/icmpecho.c
> +++ b/app/test-pmd/icmpecho.c
> @@ -293,21 +293,22 @@
>  	uint32_t cksum;
>  	uint8_t i;
>  	int l2_len;
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> -#endif
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
> +	uint64_t start_tx_tsc;
> +#endif
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  	/*
>  	 * First, receive a burst of packets.
>  	 */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  			nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	if (unlikely(nb_rx == 0))
>  		return;
>
> @@ -487,8 +488,10 @@
>
>  	/* Send back ICMP echo replies, if any. */
>  	if (nb_replies > 0) {
> +		TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
>  				nb_replies);
> +		TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		/*
>  		 * Retry if necessary
>  		 */
> @@ -497,10 +500,12 @@
>  			while (nb_tx < nb_replies &&
>  					retry++ < burst_tx_retry_num) {
>  				rte_delay_us(burst_tx_delay_time);
> +				TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  				nb_tx += rte_eth_tx_burst(fs->tx_port,
>  						fs->tx_queue,
>  						&pkts_burst[nb_tx],
>  						nb_replies - nb_tx);
> +				TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  			}
>  		}
>  		fs->tx_packets += nb_tx;
> @@ -514,12 +519,7 @@
>  			} while (++nb_tx < nb_replies);
>  		}
>  	}
> -
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  struct fwd_engine icmp_echo_engine = {
> diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
> index 9dce76e..dc66a88 100644
> --- a/app/test-pmd/iofwd.c
> +++ b/app/test-pmd/iofwd.c
> @@ -51,21 +51,21 @@
>  	uint16_t nb_tx;
>  	uint32_t retry;
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)

Should the RTE_TEST_PMD_RECORD_CORE_CYCLES macro be checked here too?

> +	uint64_t start_tx_tsc;
>  #endif
> -
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  	/*
>  	 * Receive a burst of packets and forward them.
>  	 */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
>  			pkts_burst, nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	if (unlikely(nb_rx == 0))
>  		return;
>  	fs->rx_packets += nb_rx;
> @@ -73,8 +73,10 @@
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
>  	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
>  #endif
> +	TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  			pkts_burst, nb_rx);
> +	TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  	/*
>  	 * Retry if necessary
>  	 */
> @@ -82,8 +84,10 @@
>  		retry = 0;
>  		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
>  			rte_delay_us(burst_tx_delay_time);
> +			TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  					&pkts_burst[nb_tx], nb_rx - nb_tx);
> +			TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		}
>  	}
>  	fs->tx_packets += nb_tx;
> @@ -96,11 +100,7 @@
>  			rte_pktmbuf_free(pkts_burst[nb_tx]);
>  		} while (++nb_tx < nb_rx);
>  	}
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  struct fwd_engine io_fwd_engine = {
> diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
> index 7cac757..2fd38ea 100644
> --- a/app/test-pmd/macfwd.c
> +++ b/app/test-pmd/macfwd.c
> @@ -56,21 +56,23 @@
>  	uint16_t i;
>  	uint64_t ol_flags = 0;
>  	uint64_t tx_offloads;
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> +
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)

Should the RTE_TEST_PMD_RECORD_CORE_CYCLES macro be checked here too?

> +	uint64_t start_tx_tsc;
>  #endif
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
>  #endif
>
>  	/*
>  	 * Receive a burst of packets and forward them.
>  	 */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  			nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	if (unlikely(nb_rx == 0))
>  		return;
>
> @@ -103,7 +105,9 @@
>  		mb->vlan_tci = txp->tx_vlan_id;
>  		mb->vlan_tci_outer = txp->tx_vlan_id_outer;
>  	}
> +	TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
> +	TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  	/*
>  	 * Retry if necessary
>  	 */
> @@ -111,8 +115,10 @@
>  		retry = 0;
>  		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
>  			rte_delay_us(burst_tx_delay_time);
> +			TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  					&pkts_burst[nb_tx], nb_rx - nb_tx);
> +			TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		}
>  	}
>
> @@ -126,11 +132,7 @@
>  			rte_pktmbuf_free(pkts_burst[nb_tx]);
>  		} while (++nb_tx < nb_rx);
>  	}
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  struct fwd_engine mac_fwd_engine = {
> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> index 71af916..b22acdb 100644
> --- a/app/test-pmd/macswap.c
> +++ b/app/test-pmd/macswap.c
> @@ -86,21 +86,22 @@
>  	uint16_t nb_rx;
>  	uint16_t nb_tx;
>  	uint32_t retry;
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> -#endif
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)

Should the RTE_TEST_PMD_RECORD_CORE_CYCLES macro be checked here too?

> +	uint64_t start_tx_tsc;
> +#endif
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  	/*
>  	 * Receive a burst of packets and forward them.
>  	 */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  			nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	if (unlikely(nb_rx == 0))
>  		return;
>
> @@ -112,7 +113,10 @@
>
>  	do_macswap(pkts_burst, nb_rx, txp);
>
> +	TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
> +	TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
> +
>  	/*
>  	 * Retry if necessary
>  	 */
> @@ -120,8 +124,10 @@
>  		retry = 0;
>  		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
>  			rte_delay_us(burst_tx_delay_time);
> +			TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  					&pkts_burst[nb_tx], nb_rx - nb_tx);
> +			TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		}
>  	}
>  	fs->tx_packets += nb_tx;
> @@ -134,11 +140,7 @@
>  			rte_pktmbuf_free(pkts_burst[nb_tx]);
>  		} while (++nb_tx < nb_rx);
>  	}
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  struct fwd_engine mac_swap_engine = {
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 5c65fc4..d1da357 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -50,19 +50,18 @@
>  	uint16_t nb_rx;
>  	uint16_t i;
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> -
> -	start_tsc = rte_rdtsc();
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  	/*
>  	 * Receive a burst of packets.
>  	 */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
>  			nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	if (unlikely(nb_rx == 0))
>  		return;
>
> @@ -73,11 +72,7 @@
>  	for (i = 0; i < nb_rx; i++)
>  		rte_pktmbuf_free(pkts_burst[i]);
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  struct fwd_engine rx_only_engine = {
> diff --git a/app/test-pmd/softnicfwd.c b/app/test-pmd/softnicfwd.c
> index 94e6669..9b2b0e6 100644
> --- a/app/test-pmd/softnicfwd.c
> +++ b/app/test-pmd/softnicfwd.c
> @@ -87,35 +87,39 @@ struct tm_hierarchy {
>  	uint16_t nb_tx;
>  	uint32_t retry;
>
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)

Should the RTE_TEST_PMD_RECORD_CORE_CYCLES macro be checked here too?

> +	uint64_t start_tx_tsc;
>  #endif
> -
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  	/* Packets Receive */
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
>  			pkts_burst, nb_pkt_per_burst);
> +	TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
>  	fs->rx_packets += nb_rx;
>
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
>  	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
>  #endif
>
> +	TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  			pkts_burst, nb_rx);
> +	TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>
>  	/* Retry if necessary */
>  	if (unlikely(nb_tx < nb_rx) && fs->retry_enabled) {
>  		retry = 0;
>  		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
>  			rte_delay_us(burst_tx_delay_time);
> +			TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  					&pkts_burst[nb_tx], nb_rx - nb_tx);
> +			TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		}
>  	}
>  	fs->tx_packets += nb_tx;
> @@ -130,11 +134,7 @@ struct tm_hierarchy {
>  			rte_pktmbuf_free(pkts_burst[nb_tx]);
>  		} while (++nb_tx < nb_rx);
>  	}
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  static void
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index f0061d9..de8478f 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1483,6 +1483,12 @@ struct extmem_param {
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t fwd_cycles = 0;
>  #endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> +	uint64_t rx_cycles = 0;
> +#endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> +	uint64_t tx_cycles = 0;
> +#endif
>  	uint64_t total_recv = 0;
>  	uint64_t total_xmit = 0;
>  	struct rte_port *port;
> @@ -1513,6 +1519,12 @@ struct extmem_param {
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  		fwd_cycles += fs->core_cycles;
>  #endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> +		rx_cycles += fs->core_rx_cycles;
> +#endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> +		tx_cycles += fs->core_tx_cycles;
> +#endif
>  	}
>  	for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++) {
>  		uint8_t j;
> @@ -1648,6 +1660,20 @@ struct extmem_param {
>  				(unsigned int)(fwd_cycles / total_recv),
>  				fwd_cycles, total_recv);
>  #endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> +	if (total_recv > 0)
> +		printf("\n rx CPU cycles/packet=%u (total cycles="
> +				"%"PRIu64" / total RX packets=%"PRIu64")\n",
> +				(unsigned int)(rx_cycles / total_recv),
> +				rx_cycles, total_recv);
> +#endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> +	if (total_xmit > 0)
> +		printf("\n tx CPU cycles/packet=%u (total cycles="
> +				"%"PRIu64" / total TX packets=%"PRIu64")\n",
> +				(unsigned int)(tx_cycles / total_xmit),
> +				tx_cycles, total_xmit);
> +#endif
>  }
>
>  void
> @@ -1678,6 +1704,12 @@ struct extmem_param {
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  		fs->core_cycles = 0;
>  #endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> +		fs->core_rx_cycles = 0;
> +#endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> +		fs->core_tx_cycles = 0;
> +#endif
>  	}
>  }
>
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 1d9b7a2..4e8af8a 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -130,12 +130,52 @@ struct fwd_stream {
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t core_cycles; /**< used for RX and TX processing */
>  #endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> +	uint64_t core_tx_cycles; /**< used for tx_burst processing */
> +#endif
> +#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> +	uint64_t core_rx_cycles; /**< used for rx_burst processing */
> +#endif
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
>  	struct pkt_burst_stats rx_burst_stats;
>  	struct pkt_burst_stats tx_burst_stats;
>  #endif
>  };
>
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
> +#define TEST_PMD_CORE_CYC_TX_START(a) {a = rte_rdtsc(); }
> +#else
> +#define TEST_PMD_CORE_CYC_TX_START(a)
> +#endif
> +
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
> +	defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
> +#define TEST_PMD_CORE_CYC_RX_START(a) {a = rte_rdtsc(); }
> +#else
> +#define TEST_PMD_CORE_CYC_RX_START(a)
> +#endif
> +
> +#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> +#define TEST_PMD_CORE_CYC_FWD_ADD(fs, s) \
> +{uint64_t end_tsc = rte_rdtsc(); fs->core_cycles += end_tsc - (s); }
> +#else
> +#define TEST_PMD_CORE_CYC_FWD_ADD(fs, s)
> +#endif
> +
> +#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> +#define TEST_PMD_CORE_CYC_TX_ADD(fs, s) \
> +{uint64_t end_tsc = rte_rdtsc(); fs->core_tx_cycles += end_tsc - (s); }
> +#else
> +#define TEST_PMD_CORE_CYC_TX_ADD(fs, s)
> +#endif
> +
> +#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> +#define TEST_PMD_CORE_CYC_RX_ADD(fs, s) \
> +{uint64_t end_tsc = rte_rdtsc(); fs->core_rx_cycles += end_tsc - (s); }
> +#else
> +#define TEST_PMD_CORE_CYC_RX_ADD(fs, s)
> +#endif
> +
>  /** Descriptor for a single flow. */
>  struct port_flow {
>  	struct port_flow *next; /**< Next flow in list. */
> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
> index fdfca14..fe3045a 100644
> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -241,16 +241,16 @@
>  	uint32_t retry;
>  	uint64_t ol_flags = 0;
>  	uint64_t tx_offloads;
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	uint64_t start_tsc;
> -	uint64_t end_tsc;
> -	uint64_t core_cycles;
> +#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
> +	uint64_t start_tx_tsc;
> +#endif
> +#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES)
> +	uint64_t start_rx_tsc;
>  #endif
>
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	start_tsc = rte_rdtsc();
> +	TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
>  #endif
> -
>  	mbp = current_fwd_lcore()->mbp;
>  	txp = &ports[fs->tx_port];
>  	tx_offloads = txp->dev_conf.txmode.offloads;
> @@ -302,7 +302,9 @@
>  	if (nb_pkt == 0)
>  		return;
>
> +	TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
> +	TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  	/*
>  	 * Retry if necessary
>  	 */
> @@ -310,8 +312,10 @@
>  		retry = 0;
>  		while (nb_tx < nb_pkt && retry++ < burst_tx_retry_num) {
>  			rte_delay_us(burst_tx_delay_time);
> +			TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
>  			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
>  					&pkts_burst[nb_tx], nb_pkt - nb_tx);
> +			TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
>  		}
>  	}
>  	fs->tx_packets += nb_tx;
> @@ -334,12 +338,7 @@
>  			rte_pktmbuf_free(pkts_burst[nb_tx]);
>  		} while (++nb_tx < nb_pkt);
>  	}
> -
> -#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> -	end_tsc = rte_rdtsc();
> -	core_cycles = (end_tsc - start_tsc);
> -	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> -#endif
> +	TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
>  }
>
>  static void
> diff --git a/config/common_base b/config/common_base
> index 6b96e0e..6e84af4 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -998,6 +998,8 @@ CONFIG_RTE_PROC_INFO=n
>  #
>  CONFIG_RTE_TEST_PMD=y
>  CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
> +CONFIG_RTE_TEST_PMD_RECORD_CORE_RX_CYCLES=n
> +CONFIG_RTE_TEST_PMD_RECORD_CORE_TX_CYCLES=n
>  CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n

Should the RECORD macros be documented in the run_app.rst file?

>  #
> --
> 1.8.3.1

Regards,

Bernard