From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2AE81A0583; Fri, 20 Mar 2020 17:03:31 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E607E2BB9; Fri, 20 Mar 2020 17:03:29 +0100 (CET) Received: from mail-lj1-f194.google.com (mail-lj1-f194.google.com [209.85.208.194]) by dpdk.org (Postfix) with ESMTP id 3DC47F90 for ; Fri, 20 Mar 2020 17:03:29 +0100 (CET) Received: by mail-lj1-f194.google.com with SMTP id w1so6979080ljh.5 for ; Fri, 20 Mar 2020 09:03:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf-com.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=mHXGKLyuPEZdFTKuuXuuie15LciZ/zBeJa7Ra+Xo+eU=; b=KT4Bqu5eI+Y+6h+rm0dGy0ywpOJZeGgxPPd7F2eClKL5IlV7dYRm5Wxio22PT6pEtF DBIsxts8xzJKkb4fTV6G7zL2STZXTV7zWVPcdQD1wCAxn+maQQ1GH0z+TVIMZrhOHKx4 scdZbW1jcqzePLaTzPlBP9e8nUx3Mnnv4yGaS/dSk4tMVYEB7GqKV5wJXB7eKb81Vg/N aHAaQtdiRp1fJNMn4iXu8UJZ6UA2kpMupy11b/q7IfGbVr0APIElekHSBqK4mbCaODzN 1cLHfKb3dHLcLBeV2KOh2bhkMfB6uTgAcM/NIUIqouTtwZfuFF2fBSbMCwAlTCKP0Gu5 Hjrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=mHXGKLyuPEZdFTKuuXuuie15LciZ/zBeJa7Ra+Xo+eU=; b=owXQoAcyuX+OoHdayYvUPhxYH2epHZIrJBE1bwtTC5CrLKlpK/pCRNvTwajsZkg5Tv es3R1ndnBYwU4cpiKymxyJfagaZ7xv7b6lbncZKG4YfSADWKyuDZBNyxgXB6Y5mhyZwz aVbjceDdJvSSoGB9zKodleJQA8wagGLS8w/kQmwcdEuBp4T/4Ft77ulIbN90sIRnmLfG L8RdW+VboiKyDA8mwLyqjlWoyOhqfjdVVhSIuQuiVVyhmS0gNvnwgpUatVP8JaiNV6sx 4W9GG0qzQ6Fa/tuEY/UojT8ixH4uos3lcrdfcePx5bdBalkQhiT4IrnbP2qrp8+mSs+8 RtYA== X-Gm-Message-State: ANhLgQ06rNOX6KkcbOd2JqkluDg07ggE7KdrmG/5N8fDsnY+k9JPTi4A kxlnhkfSHFjbR7N9VuvGdAwvmA== X-Google-Smtp-Source: ADFU+vt01AiQfffivfYNlPrYRRbNtcZ7Qf/9R5XkmtTcQy7Oe7wcTNdgi+pul1LfztEqcrmS2YJ28Q== X-Received: by 2002:a05:651c:445:: with SMTP id g5mr5721362ljg.125.1584720208389; Fri, 20 Mar 2020 09:03:28 -0700 (PDT) Received: from [10.0.0.72] (31-172-191-173.noc.fibertech.net.pl. [31.172.191.173]) by smtp.gmail.com with ESMTPSA id u14sm4236165ljj.54.2020.03.20.09.03.25 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 20 Mar 2020 09:03:26 -0700 (PDT) To: Viacheslav Ovsiienko , dev@dpdk.org Cc: ferruh.yigit@intel.com, thomas@monjalon.net, bernard.iremonger@intel.com References: <1561553317-16777-1-git-send-email-viacheslavo@mellanox.com> <1584625851-10291-1-git-send-email-viacheslavo@mellanox.com> <1584625851-10291-4-git-send-email-viacheslavo@mellanox.com> From: Andrzej Ostruszka Message-ID: Date: Fri, 20 Mar 2020 17:03:24 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 MIME-Version: 1.0 In-Reply-To: <1584625851-10291-4-git-send-email-viacheslavo@mellanox.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v2 3/3] app/testpmd: qualify profiling statistics on burst size X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Viacheslav, thanks for the patches. I'll comment here on the whole patch set. This patch set has many similar changes so I'll focus on the first (csumonly) and the comment will apply to all remaining - OK? On 3/19/20 2:50 PM, Viacheslav Ovsiienko wrote: > The execution time of rx/tx burst routine depends on the burst size. > It would be meaningful to research this dependency, the patch > provides an extra profiling data per rx/tx burst size. > > Signed-off-by: Viacheslav Ovsiienko > --- > app/test-pmd/csumonly.c | 11 +++++++---- > app/test-pmd/flowgen.c | 11 +++++++---- > app/test-pmd/icmpecho.c | 12 ++++++++---- > app/test-pmd/iofwd.c | 11 +++++++---- > app/test-pmd/macfwd.c | 11 +++++++---- > app/test-pmd/macswap.c | 11 +++++++---- > app/test-pmd/rxonly.c | 2 +- > app/test-pmd/softnicfwd.c | 11 +++++++---- > app/test-pmd/testpmd.c | 31 ++++++++++++++++++++++++------- > app/test-pmd/testpmd.h | 26 ++++++++++++++++++++++---- > app/test-pmd/txonly.c | 9 ++++++--- > 11 files changed, 103 insertions(+), 43 deletions(-) > > diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c > index 4104737..c966892 100644 > --- a/app/test-pmd/csumonly.c > +++ b/app/test-pmd/csumonly.c > @@ -797,7 +797,7 @@ struct simple_gre_hdr { > TEST_PMD_CORE_CYC_RX_START(start_rx_tsc); > nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst, > nb_pkt_per_burst); > - TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc); > + TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc, nb_rx); > if (unlikely(nb_rx == 0)) > return; > #ifdef RTE_TEST_PMD_RECORD_BURST_STATS > @@ -1067,7 +1067,7 @@ struct simple_gre_hdr { > TEST_PMD_CORE_CYC_TX_START(start_tx_tsc); > nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, tx_pkts_burst, > nb_prep); > - TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc); > + TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc, nb_tx); > > /* > * Retry if necessary > @@ -1075,11 +1075,14 @@ struct simple_gre_hdr { > if (unlikely(nb_tx < nb_rx) && fs->retry_enabled) { > retry = 0; > while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) { > + uint16_t nb_rt; > + > rte_delay_us(burst_tx_delay_time); > TEST_PMD_CORE_CYC_TX_START(start_tx_tsc); > - nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue, > + nb_rt = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, > &tx_pkts_burst[nb_tx], nb_rx - nb_tx); > - TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc); > + nb_tx += nb_rt; > + TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc, nb_rt); > } > } > fs->tx_packets += nb_tx; Below this line there is (not visible in patch) code (I'll name it as "total tx bump" in order to refer to it below): #ifdef RTE_TEST_PMD_RECORD_BURST_STATS fs->tx_burst_stats.pkt_burst_spread[nb_tx]++; #endif So when the BURST_STATS is enabled you will have: struct pkt_burst_stats { unsigned int pkt_burst_spread[MAX_PKT_BURST]; unsigned int pkt_retry_spread[MAX_PKT_BURST]; uint64_t pkt_ticks_spread[MAX_PKT_BURST]; }; * pkt_ticks_spread will contain ticks per every tx_burst attempt (spreaded by the number of packets sent), * pkt_retry_spread will contain number of tx_burst attempts (spreaded by the number of packets sent), * pkt_burst_spread will have logged +1 for each each bucket that is a sum of all packets sent in given rx/tx loop iteration What the last item actually gives the user that cannot be infered from the middle one? If I want to have total number of packets sent I can sum: slot*spread[slot] over slots. This should give the same result when done over burst_spread and retry_spread. I might be missing something here so could you elaborate - to me that "total tx bump" looks like an artificial stat? If there is actually nothing substantial there, then I'd suggest to remove pkt_retry_spread (that name actually is also wrong for the sunny day scenario when there are no retransmissions) and use pkt_burst_spread for that purpose. AFAIU the 'retry' member is not used at all for the RX path - correct? So my suggestion would be to: - keep only pkt_burst_spread and pkt_ticks_spread - use them properly per every rx/tx_burst attempt - skip the "total tx bump" code That way RX/TX stats would be symmetrical and reflect the actual rx/tx_burst attempts. [...] > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c > index b195880..1d4b55b 100644 > --- a/app/test-pmd/testpmd.c > +++ b/app/test-pmd/testpmd.c > @@ -1515,7 +1515,7 @@ struct extmem_param { > > #ifdef RTE_TEST_PMD_RECORD_BURST_STATS > static void > -pkt_burst_stats_display(const char *rx_tx, struct pkt_burst_stats *pbs) > +pkt_burst_stats_display(int nrx_tx, struct pkt_burst_stats *pbs) Maybe another name for 'nrx_tx'? Like 'is_tx' or 'tx_stats'? And maybe bool? > { > unsigned int total_burst; > unsigned int nb_burst; > @@ -1549,8 +1549,8 @@ struct extmem_param { > if (total_burst == 0) > return; > burst_percent[0] = (burst_stats[0] * 100) / total_burst; > - printf(" %s-bursts : %u [%d%% of %d pkts", rx_tx, total_burst, > - burst_percent[0], (int) pktnb_stats[0]); > + printf(" %s-bursts : %u [%d%% of %d pkts", nrx_tx ? "TX" : "RX", > + total_burst, burst_percent[0], (int) pktnb_stats[0]); > if (burst_stats[0] == total_burst) { > printf("]\n"); > return; > @@ -1568,6 +1568,23 @@ struct extmem_param { > } > printf(" + %d%% of %d pkts + %d%% of others]\n", > burst_percent[1], (int) pktnb_stats[1], burst_percent[2]); > +#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES > + if (!(fwdprof_flags & (nrx_tx ? RECORD_CORE_CYCLES_TX > + : RECORD_CORE_CYCLES_RX))) > + return; > + for (nb_pkt = 0; nb_pkt < MAX_PKT_BURST; nb_pkt++) { > + nb_burst = nrx_tx ? pbs->pkt_retry_spread[nb_pkt] > + : pbs->pkt_burst_spread[nb_pkt]; So you actually don't use pkt_burst_spread for TX. Then just follow my suggestion above and remove 'retry' and use pkt_burst_spread as you use 'retry' now. > + if (nb_burst == 0) > + continue; > + printf(" CPU cycles/%u packet burst=%u (total cycles=" > + "%"PRIu64" / total %s bursts=%u)\n", > + (unsigned int)nb_pkt, > + (unsigned int)(pbs->pkt_ticks_spread[nb_pkt] / nb_burst), > + pbs->pkt_ticks_spread[nb_pkt], > + nrx_tx ? "TX" : "RX", nb_burst); > + } > +#endif > } > #endif /* RTE_TEST_PMD_RECORD_BURST_STATS */ [...] > --- a/app/test-pmd/testpmd.h > +++ b/app/test-pmd/testpmd.h > @@ -89,6 +89,10 @@ enum { > */ > struct pkt_burst_stats { > unsigned int pkt_burst_spread[MAX_PKT_BURST]; > +#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES > + unsigned int pkt_retry_spread[MAX_PKT_BURST]; > + uint64_t pkt_ticks_spread[MAX_PKT_BURST]; > +#endif > }; > #endif > > @@ -340,21 +344,35 @@ struct queue_stats_mappings { > {if (fwdprof_flags & RECORD_CORE_CYCLES_FWD) \ > {uint64_t tsc = rte_rdtsc(); tsc -= (s); fs->core_cycles += tsc; } } General comment to the macros below - such macros are usually defined with the "traditional": do { ... } while (0) structure. That way they behave better when used in conditional statements: if (cond) MACRO(args); else ... With the definition below the "else" would produce compiler error. I know that they are not used that way, but I'd say it would be cleaner to do that way anyway :). > -#define TEST_PMD_CORE_CYC_TX_ADD(fs, s) \ > +#ifdef RTE_TEST_PMD_RECORD_BURST_STATS > +#define TEST_PMD_CORE_CYC_TX_ADD(fs, s, np) \ > +{if (fwdprof_flags & RECORD_CORE_CYCLES_TX) \ > +{uint64_t tsc = rte_rdtsc(); tsc -= (s); fs->core_tx_cycles += tsc; \ > +fs->tx_burst_stats.pkt_ticks_spread[np] += tsc; \ > +fs->tx_burst_stats.pkt_retry_spread[np]++; } } > + > +#define TEST_PMD_CORE_CYC_RX_ADD(fs, s, np) \ > +{if (fwdprof_flags & RECORD_CORE_CYCLES_RX) \ > +{uint64_t tsc = rte_rdtsc(); tsc -= (s); fs->core_rx_cycles += tsc; \ > +fs->rx_burst_stats.pkt_ticks_spread[np] += tsc; } } > + > +#else /* RTE_TEST_PMD_RECORD_BURST_STATS */ > +#define TEST_PMD_CORE_CYC_TX_ADD(fs, s, np) \ > {if (fwdprof_flags & RECORD_CORE_CYCLES_TX) \ > {uint64_t tsc = rte_rdtsc(); tsc -= (s); fs->core_tx_cycles += tsc; } } > > -#define TEST_PMD_CORE_CYC_RX_ADD(fs, s) \ > +#define TEST_PMD_CORE_CYC_RX_ADD(fs, s, np) \ > {if (fwdprof_flags & RECORD_CORE_CYCLES_RX) \ > {uint64_t tsc = rte_rdtsc(); tsc -= (s); fs->core_rx_cycles += tsc; } } > +#endif /* RTE_TEST_PMD_RECORD_BURST_STATS */ In addition to the above "do/while(0)" comment maybe you could merge these macros above like (not tested): #ifdef RTE_TEST_PMD_RECORD_BURST_STATS #define TEST_PMD_BURST_STATS_ADD(stats, np, tsc) \ stats.pkt_ticks_spread[np] += tsc; \ stats.pkt_burst_spread[np]++ #else #define TEST_PMD_BURST_STATS_ADD(stats, np, tsc) #endif #define TEST_PMD_CORE_CYC_TX_ADD(fs, s, np) \ do { if (fwdprof_flags & RECORD_CORE_CYCLES_TX) { \ uint64_t tsc = rte_rdtsc(); tsc -= (s); \ fs->core_tx_cycles += tsc; \ TEST_PMD_BURST_STATS_ADD(fs->tx_burst_stats, np, tsc); \ }} while(0) #define TEST_PMD_CORE_CYC_RX_ADD(fs, s, np) \ do { if (fwdprof_flags & RECORD_CORE_CYCLES_RX) { \ uint64_t tsc = rte_rdtsc(); tsc -= (s); \ fs->core_rx_cycles += tsc; \ TEST_PMD_BURST_STATS_ADD(fs->rx_burst_stats, np, tsc); \ }} while(0) That way handling of RX and TX would be very symmetrical - you'd just call TX_ADD/RX_ADD where needed and you'd remove both "total TX bump" and "total RX bump" code. Note however that accepting this suggestion and removal of "total RX bump" would also be a behaviour change - since currently when no packet is read nothing happens (apart from the tics updated) whereas with the proposed changes (and enabled burst stats) you'd be bumping rx_burst_stats.pkt_burst_spread[0]. I'd say that would be change for good since this would allow to have a glimpse how many iterations are idle. > #else > /* No profiling statistics is configured. */ > #define TEST_PMD_CORE_CYC_TX_START(a) > #define TEST_PMD_CORE_CYC_RX_START(a) > #define TEST_PMD_CORE_CYC_FWD_ADD(fs, s) > -#define TEST_PMD_CORE_CYC_TX_ADD(fs, s) > -#define TEST_PMD_CORE_CYC_RX_ADD(fs, s) > +#define TEST_PMD_CORE_CYC_TX_ADD(fs, s, np) > +#define TEST_PMD_CORE_CYC_RX_ADD(fs, s, np) > #endif /* RTE_TEST_PMD_RECORD_CORE_CYCLES */ [...] With regards Andrzej Ostruszka Reviewed-by: Andrzej Ostruszka