From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.101.70])
 by dpdk.org (Postfix) with ESMTP id 5E2E65B2C
 for ; Wed, 3 Apr 2019 09:02:48 +0200 (CEST)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C41961596;
 Wed, 3 Apr 2019 00:02:47 -0700 (PDT)
Received: from phil-VirtualBox.shanghai.arm.com (unknown [10.169.109.179])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 553B33F68F;
 Wed, 3 Apr 2019 00:02:46 -0700 (PDT)
From: Phil Yang
To: dev@dpdk.org, thomas@monjalon.net
Cc: david.hunt@intel.com, reshma.pattan@intel.com, gavin.hu@arm.com,
 honnappa.nagarahalli@arm.com, phil.yang@arm.com, nd@arm.com
Date: Wed, 3 Apr 2019 14:59:54 +0800
Message-Id: <1554274796-23258-2-git-send-email-phil.yang@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1554274796-23258-1-git-send-email-phil.yang@arm.com>
References: <1546508946-12552-1-git-send-email-phil.yang@arm.com>
 <1554274796-23258-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH v3 1/3] packet_ordering: add statistics for each worker thread
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 03 Apr 2019 07:02:48 -0000

The current implementation uses the '__sync' built-ins to synchronize
statistics within worker threads. The '__sync' built-in functions are
full barriers, which hurts performance, so add per-worker packet
statistics to remove the synchronisation between worker threads.

Since the maximum core number can reach 256, disable the per-core
stats print by default and add the --insight-worker option to enable it.

For example:
sudo examples/packet_ordering/arm64-armv8a-linuxapp-gcc/packet_ordering \
-l 112-115 --socket-mem=1024,1024 -n 4 -- -p 0x03 --insight-worker

RX thread stats:
 - Pkts rxd: 226539223
 - Pkts enqd to workers ring: 226539223

Worker thread stats on core [113]:
 - Pkts deqd from workers ring: 77557888
 - Pkts enqd to tx ring: 77557888
 - Pkts enq to tx failed: 0

Worker thread stats on core [114]:
 - Pkts deqd from workers ring: 148981335
 - Pkts enqd to tx ring: 148981335
 - Pkts enq to tx failed: 0

Worker thread stats:
 - Pkts deqd from workers ring: 226539223
 - Pkts enqd to tx ring: 226539223
 - Pkts enq to tx failed: 0

TX stats:
 - Pkts deqd from tx ring: 226539223
 - Ro Pkts transmitted: 226539168
 - Ro Pkts tx failed: 0
 - Pkts transmitted w/o reorder: 0
 - Pkts tx failed w/o reorder: 0

Suggested-by: Honnappa Nagarahalli
Signed-off-by: Phil Yang
Reviewed-by: Gavin Hu
---
 doc/guides/sample_app_ug/packet_ordering.rst |  4 ++-
 examples/packet_ordering/main.c              | 50 +++++++++++++++++++++++++---
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/doc/guides/sample_app_ug/packet_ordering.rst b/doc/guides/sample_app_ug/packet_ordering.rst
index 7cfcf3f..1c8ee5d 100644
--- a/doc/guides/sample_app_ug/packet_ordering.rst
+++ b/doc/guides/sample_app_ug/packet_ordering.rst
@@ -43,7 +43,7 @@ The application execution command line is:
 
 .. code-block:: console
 
-    ./test-pipeline [EAL options] -- -p PORTMASK [--disable-reorder]
+    ./packet_ordering [EAL options] -- -p PORTMASK [--disable-reorder] [--insight-worker]
 
 The -c EAL CPU_COREMASK option has to contain at least 3 CPU cores.
 The first CPU core in the core mask is the master core and would be assigned to
@@ -56,3 +56,5 @@ then the other pair from 2 to 3 and from 3 to 2, having [0,1] and [2,3] pairs.
 
 The disable-reorder long option does, as its name implies, disable the reordering of
 traffic, which should help evaluate reordering performance impact.
+
+The insight-worker long option enables output the packet statistics of each worker thread.
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 149bfdd..8145074 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -31,6 +31,7 @@
 
 unsigned int portmask;
 unsigned int disable_reorder;
+unsigned int insight_worker;
 volatile uint8_t quit_signal;
 
 static struct rte_mempool *mbuf_pool;
@@ -71,6 +72,14 @@ volatile struct app_stats {
 	} tx __rte_cache_aligned;
 } app_stats;
 
+/* per worker lcore stats */
+struct wkr_stats_per {
+	uint64_t deq_pkts;
+	uint64_t enq_pkts;
+	uint64_t enq_failed_pkts;
+} __rte_cache_aligned;
+
+static struct wkr_stats_per wkr_stats[RTE_MAX_LCORE] = {0};
 /**
  * Get the last enabled lcore ID
  *
@@ -152,6 +161,7 @@ parse_args(int argc, char **argv)
 	char *prgname = argv[0];
 	static struct option lgopts[] = {
 		{"disable-reorder", 0, 0, 0},
+		{"insight-worker", 0, 0, 0},
 		{NULL, 0, 0, 0}
 	};
 
@@ -175,6 +185,11 @@ parse_args(int argc, char **argv)
 				printf("reorder disabled\n");
 				disable_reorder = 1;
 			}
+			if (!strcmp(lgopts[option_index].name,
+						"insight-worker")) {
+				printf("print all worker statistics\n");
+				insight_worker = 1;
+			}
 			break;
 		default:
 			print_usage(prgname);
@@ -319,6 +334,11 @@ print_stats(void)
 {
 	uint16_t i;
 	struct rte_eth_stats eth_stats;
+	unsigned int lcore_id, last_lcore_id, master_lcore_id, end_w_lcore_id;
+
+	last_lcore_id = get_last_lcore_id();
+	master_lcore_id = rte_get_master_lcore();
+	end_w_lcore_id = get_previous_lcore_id(last_lcore_id);
 
 	printf("\nRX thread stats:\n");
 	printf(" - Pkts rxd: %"PRIu64"\n",
@@ -326,6 +346,26 @@ print_stats(void)
 	printf(" - Pkts enqd to workers ring: %"PRIu64"\n",
 			app_stats.rx.enqueue_pkts);
 
+	for (lcore_id = 0; lcore_id <= end_w_lcore_id; lcore_id++) {
+		if (insight_worker
+			&& rte_lcore_is_enabled(lcore_id)
+			&& lcore_id != master_lcore_id) {
+			printf("\nWorker thread stats on core [%u]:\n",
+					lcore_id);
+			printf(" - Pkts deqd from workers ring: %"PRIu64"\n",
+					wkr_stats[lcore_id].deq_pkts);
+			printf(" - Pkts enqd to tx ring: %"PRIu64"\n",
+					wkr_stats[lcore_id].enq_pkts);
+			printf(" - Pkts enq to tx failed: %"PRIu64"\n",
+					wkr_stats[lcore_id].enq_failed_pkts);
+		}
+
+		app_stats.wkr.dequeue_pkts += wkr_stats[lcore_id].deq_pkts;
+		app_stats.wkr.enqueue_pkts += wkr_stats[lcore_id].enq_pkts;
+		app_stats.wkr.enqueue_failed_pkts +=
+			wkr_stats[lcore_id].enq_failed_pkts;
+	}
+
 	printf("\nWorker thread stats:\n");
 	printf(" - Pkts deqd from workers ring: %"PRIu64"\n",
 			app_stats.wkr.dequeue_pkts);
@@ -432,13 +472,14 @@ worker_thread(void *args_ptr)
 	struct rte_mbuf *burst_buffer[MAX_PKTS_BURST] = { NULL };
 	struct rte_ring *ring_in, *ring_out;
 	const unsigned xor_val = (nb_ports > 1);
+	unsigned int core_id = rte_lcore_id();
 
 	args = (struct worker_thread_args *) args_ptr;
 	ring_in = args->ring_in;
 	ring_out = args->ring_out;
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__,
-							rte_lcore_id());
+							core_id);
 
 	while (!quit_signal) {
 
@@ -448,7 +489,7 @@ worker_thread(void *args_ptr)
 		if (unlikely(burst_size == 0))
 			continue;
 
-		__sync_fetch_and_add(&app_stats.wkr.dequeue_pkts, burst_size);
+		wkr_stats[core_id].deq_pkts += burst_size;
 
 		/* just do some operation on mbuf */
 		for (i = 0; i < burst_size;)
@@ -457,11 +498,10 @@ worker_thread(void *args_ptr)
 		/* enqueue the modified mbufs to workers_to_tx ring */
 		ret = rte_ring_enqueue_burst(ring_out, (void *)burst_buffer,
 				burst_size, NULL);
-		__sync_fetch_and_add(&app_stats.wkr.enqueue_pkts, ret);
+		wkr_stats[core_id].enq_pkts += ret;
 		if (unlikely(ret < burst_size)) {
 			/* Return the mbufs to their respective pool, dropping packets */
-			__sync_fetch_and_add(&app_stats.wkr.enqueue_failed_pkts,
-					(int)burst_size - ret);
+			wkr_stats[core_id].enq_failed_pkts += burst_size - ret;
 			pktmbuf_free_bulk(&burst_buffer[ret], burst_size - ret);
 		}
 	}
-- 
2.7.4
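
The idea in this patch, replacing a shared atomically-updated counter with
one cache-aligned slot per worker that is summed only when stats are
printed, can be illustrated outside DPDK. The following is a minimal
sketch, not the patch's code: it assumes plain pthreads instead of lcores,
a 64-byte cache line, and the hypothetical names MAX_WORKERS, worker_arg
and run_workers(). The totals are read only after pthread_join(), which is
what makes the unsynchronised per-worker adds safe in this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_WORKERS 8	/* hypothetical cap, stands in for RTE_MAX_LCORE */
#define CACHE_LINE  64	/* assumed cache line size */

/* One slot per worker, aligned so that two slots never share a cache line. */
struct worker_stats {
	uint64_t deq_pkts;
	uint64_t enq_pkts;
	uint64_t enq_failed_pkts;
} __attribute__((aligned(CACHE_LINE)));

/* Hypothetical per-thread arguments for this sketch. */
struct worker_arg {
	unsigned int id;	/* index into stats[] */
	unsigned int bursts;	/* number of simulated bursts */
	unsigned int burst_size;	/* packets per simulated burst */
};

struct worker_stats stats[MAX_WORKERS];

static void *worker(void *p)
{
	const struct worker_arg *arg = p;
	unsigned int i;

	for (i = 0; i < arg->bursts; i++) {
		/* Plain adds: only this thread ever writes stats[arg->id]. */
		stats[arg->id].deq_pkts += arg->burst_size;
		stats[arg->id].enq_pkts += arg->burst_size;
	}
	return NULL;
}

/* Start n workers, wait for them, and return the summed dequeue count. */
uint64_t run_workers(unsigned int n, unsigned int bursts,
		unsigned int burst_size)
{
	pthread_t tid[MAX_WORKERS];
	struct worker_arg args[MAX_WORKERS];
	uint64_t total = 0;
	unsigned int i;

	assert(n <= MAX_WORKERS);

	for (i = 0; i < n; i++) {
		int rc;

		stats[i] = (struct worker_stats){0};
		args[i] = (struct worker_arg){ i, bursts, burst_size };
		rc = pthread_create(&tid[i], NULL, worker, &args[i]);
		assert(rc == 0);
		(void)rc;
	}

	/* Aggregate only after each worker has finished. */
	for (i = 0; i < n; i++) {
		pthread_join(tid[i], NULL);
		total += stats[i].deq_pkts;
	}
	return total;
}
```

Build with -pthread and call, for instance, run_workers(4, 1000, 32) from
main(). The trade-off is memory (one cache line per possible worker) in
exchange for removing the shared read-modify-write from the fast path.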