DPDK patches and discussions
* [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support
@ 2020-11-26 11:15 Wisam Jaddo
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 1/4] app/flow-perf: refactor flows handler Wisam Jaddo
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Wisam Jaddo @ 2020-11-26 11:15 UTC (permalink / raw)
  To: thomas, arybchenko, suanmingm, akozyrev; +Cc: dev

This series adds multi-threaded support to the flow-perf application,
so it can measure rule insertion and deletion rates across multiple
threads.

It also reports the latency and throughput rates of all threads.


Wisam Jaddo (4):
  app/flow-perf: refactor flows handler
  app/flow-perf: add multiple cores insertion and deletion
  app/flow-perf: change clock measurement functions
  app/flow-perf: remove redundant items memset and vars

 app/test-flow-perf/actions_gen.c | 205 +++++++------
 app/test-flow-perf/actions_gen.h |   2 +-
 app/test-flow-perf/config.h      |   1 +
 app/test-flow-perf/flow_gen.c    |   5 +-
 app/test-flow-perf/flow_gen.h    |   1 +
 app/test-flow-perf/items_gen.c   | 206 +++++--------
 app/test-flow-perf/items_gen.h   |   2 +-
 app/test-flow-perf/main.c        | 486 +++++++++++++++++++++----------
 doc/guides/tools/flow-perf.rst   |  14 +-
 9 files changed, 524 insertions(+), 398 deletions(-)

-- 
2.21.0



* [dpdk-dev] [PATCH 1/4] app/flow-perf: refactor flows handler
  2020-11-26 11:15 [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support Wisam Jaddo
@ 2020-11-26 11:15 ` Wisam Jaddo
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 2/4] app/flow-perf: add multiple cores insertion and deletion Wisam Jaddo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Wisam Jaddo @ 2020-11-26 11:15 UTC (permalink / raw)
  To: thomas, arybchenko, suanmingm, akozyrev; +Cc: dev

Let the flows_handler() function control the flow performance
processes. This is made possible by the introduction of the
insert_flows() function.

Also let flows_handler() print the DPDK layer memory consumption of
an rte_flow rule regardless of whether the deletion feature is
enabled. The previous solution printed the memory change only after
flows_handler() had returned, so when deletion was enabled the
reported value did not represent the rte_flow rule size.

The new design is also easier to read and understand.
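
As a rough illustration of the memory accounting described above, the
per-rule size can be derived from two allocation snapshots taken around
the insertion step. This is only a sketch: rule_size_in_bytes() and its
parameter names are hypothetical and are not part of the patch, which
takes its snapshots with dump_socket_mem().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): estimate the size of one
 * rule from allocation snapshots taken before and after insertion.
 * Returns 0 when no baseline snapshot is available.
 */
static int64_t
rule_size_in_bytes(int64_t mem_before, int64_t mem_after,
		uint32_t rules_count)
{
	assert(rules_count > 0);
	if (mem_before == 0)
		return 0;
	/* Deletion runs after the second snapshot, so it cannot skew this. */
	return (mem_after - mem_before) / (int64_t)rules_count;
}
```

For example, snapshots of 1000 and 5000 bytes around the insertion of
40 rules would give 100 bytes per rule.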

Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
---
 app/test-flow-perf/main.c | 300 ++++++++++++++++++++------------------
 1 file changed, 158 insertions(+), 142 deletions(-)

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index e2fc5b7f65..5ec9a15c61 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -38,7 +38,7 @@
 #include "config.h"
 #include "flow_gen.h"
 
-#define MAX_ITERATIONS             100
+#define MAX_BATCHES_COUNT          100
 #define DEFAULT_RULES_COUNT    4000000
 #define DEFAULT_RULES_BATCH     100000
 #define DEFAULT_GROUP                0
@@ -826,188 +826,210 @@ print_flow_error(struct rte_flow_error error)
 }
 
 static inline void
-destroy_flows(int port_id, struct rte_flow **flow_list)
+print_rules_batches(double *cpu_time_per_batch)
+{
+	uint8_t idx;
+	double delta;
+	double rate;
+
+	for (idx = 0; idx < MAX_BATCHES_COUNT; idx++) {
+		if (!cpu_time_per_batch[idx])
+			break;
+		delta = (double)(rules_batch / cpu_time_per_batch[idx]);
+		rate = delta / 1000; /* Save rate in K unit. */
+		printf(":: Rules batch #%d: %d rules "
+			"in %f sec[ Rate = %f K Rule/Sec ]\n",
+			idx, rules_batch,
+			cpu_time_per_batch[idx], rate);
+	}
+}
+
+static inline void
+destroy_flows(int port_id, struct rte_flow **flows_list)
 {
 	struct rte_flow_error error;
-	clock_t start_iter, end_iter;
+	clock_t start_batch, end_batch;
 	double cpu_time_used = 0;
-	double flows_rate;
-	double cpu_time_per_iter[MAX_ITERATIONS];
+	double deletion_rate;
+	double cpu_time_per_batch[MAX_BATCHES_COUNT] = { 0 };
 	double delta;
 	uint32_t i;
-	int iter_id;
-
-	for (i = 0; i < MAX_ITERATIONS; i++)
-		cpu_time_per_iter[i] = -1;
-
-	if (rules_batch > rules_count)
-		rules_batch = rules_count;
+	int rules_batch_idx;
 
 	/* Deletion Rate */
-	printf("Flows Deletion on port = %d\n", port_id);
-	start_iter = clock();
+	printf("\nRules Deletion on port = %d\n", port_id);
+
+	start_batch = clock();
 	for (i = 0; i < rules_count; i++) {
-		if (flow_list[i] == 0)
+		if (flows_list[i] == 0)
 			break;
 
 		memset(&error, 0x33, sizeof(error));
-		if (rte_flow_destroy(port_id, flow_list[i], &error)) {
+		if (rte_flow_destroy(port_id, flows_list[i], &error)) {
 			print_flow_error(error);
 			rte_exit(EXIT_FAILURE, "Error in deleting flow");
 		}
 
-		if (i && !((i + 1) % rules_batch)) {
-			/* Save the deletion rate of each iter */
-			end_iter = clock();
-			delta = (double) (end_iter - start_iter);
-			iter_id = ((i + 1) / rules_batch) - 1;
-			cpu_time_per_iter[iter_id] =
-				delta / CLOCKS_PER_SEC;
-			cpu_time_used += cpu_time_per_iter[iter_id];
-			start_iter = clock();
+		/*
+		 * Save the deletion rate for rules batch.
+		 * Check if the deletion reached the rules
+		 * batch counter, then save the deletion rate
+		 * for this batch.
+		 */
+		if (!((i + 1) % rules_batch)) {
+			end_batch = clock();
+			delta = (double) (end_batch - start_batch);
+			rules_batch_idx = ((i + 1) / rules_batch) - 1;
+			cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC;
+			cpu_time_used += cpu_time_per_batch[rules_batch_idx];
+			start_batch = clock();
 		}
 	}
 
-	/* Deletion rate per iteration */
+	/* Print deletion rates for all batches */
 	if (dump_iterations)
-		for (i = 0; i < MAX_ITERATIONS; i++) {
-			if (cpu_time_per_iter[i] == -1)
-				continue;
-			delta = (double)(rules_batch /
-				cpu_time_per_iter[i]);
-			flows_rate = delta / 1000;
-			printf(":: Iteration #%d: %d flows "
-				"in %f sec[ Rate = %f K/Sec ]\n",
-				i, rules_batch,
-				cpu_time_per_iter[i], flows_rate);
-		}
+		print_rules_batches(cpu_time_per_batch);
 
-	/* Deletion rate for all flows */
-	flows_rate = ((double) (rules_count / cpu_time_used) / 1000);
-	printf("\n:: Total flow deletion rate -> %f K/Sec\n",
-		flows_rate);
-	printf(":: The time for deleting %d in flows %f seconds\n",
+	/* Deletion rate for all rules */
+	deletion_rate = ((double) (rules_count / cpu_time_used) / 1000);
+	printf(":: Total rules deletion rate -> %f K Rule/Sec\n",
+		deletion_rate);
+	printf(":: The time for deleting %d in rules %f seconds\n",
 		rules_count, cpu_time_used);
 }
 
-static inline void
-flows_handler(void)
+static struct rte_flow **
+insert_flows(int port_id)
 {
-	struct rte_flow **flow_list;
+	struct rte_flow **flows_list;
 	struct rte_flow_error error;
-	clock_t start_iter, end_iter;
+	clock_t start_batch, end_batch;
 	double cpu_time_used;
-	double flows_rate;
-	double cpu_time_per_iter[MAX_ITERATIONS];
+	double insertion_rate;
+	double cpu_time_per_batch[MAX_BATCHES_COUNT] = { 0 };
 	double delta;
-	uint16_t nr_ports;
-	uint32_t i;
-	int port_id;
-	int iter_id;
 	uint32_t flow_index;
+	uint32_t counter;
 	uint64_t global_items[MAX_ITEMS_NUM] = { 0 };
 	uint64_t global_actions[MAX_ACTIONS_NUM] = { 0 };
+	int rules_batch_idx;
 
 	global_items[0] = FLOW_ITEM_MASK(RTE_FLOW_ITEM_TYPE_ETH);
 	global_actions[0] = FLOW_ITEM_MASK(RTE_FLOW_ACTION_TYPE_JUMP);
 
-	nr_ports = rte_eth_dev_count_avail();
+	flows_list = rte_zmalloc("flows_list",
+		(sizeof(struct rte_flow *) * rules_count) + 1, 0);
+	if (flows_list == NULL)
+		rte_exit(EXIT_FAILURE, "No Memory available!");
+
+	cpu_time_used = 0;
+	flow_index = 0;
+	if (flow_group > 0) {
+		/*
+		 * Create global rule to jump into flow_group,
+		 * this way the app will avoid the default rules.
+		 *
+		 * Global rule:
+		 * group 0 eth / end actions jump group <flow_group>
+		 */
+		flow = generate_flow(port_id, 0, flow_attrs,
+			global_items, global_actions,
+			flow_group, 0, 0, 0, 0, &error);
+
+		if (flow == NULL) {
+			print_flow_error(error);
+			rte_exit(EXIT_FAILURE, "error in creating flow");
+		}
+		flows_list[flow_index++] = flow;
+	}
+
+	/* Insertion Rate */
+	printf("Rules insertion on port = %d\n", port_id);
+	start_batch = clock();
+	for (counter = 0; counter < rules_count; counter++) {
+		flow = generate_flow(port_id, flow_group,
+			flow_attrs, flow_items, flow_actions,
+			JUMP_ACTION_TABLE, counter,
+			hairpin_queues_num,
+			encap_data, decap_data,
+			&error);
+
+		if (force_quit)
+			counter = rules_count;
+
+		if (!flow) {
+			print_flow_error(error);
+			rte_exit(EXIT_FAILURE, "error in creating flow");
+		}
 
-	for (i = 0; i < MAX_ITERATIONS; i++)
-		cpu_time_per_iter[i] = -1;
+		flows_list[flow_index++] = flow;
+
+		/*
+		 * Save the insertion rate for rules batch.
+		 * Check if the insertion reached the rules
+		 * batch counter, then save the insertion rate
+		 * for this batch.
+		 */
+		if (!((counter + 1) % rules_batch)) {
+			end_batch = clock();
+			delta = (double) (end_batch - start_batch);
+			rules_batch_idx = ((counter + 1) / rules_batch) - 1;
+			cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC;
+			cpu_time_used += cpu_time_per_batch[rules_batch_idx];
+			start_batch = clock();
+		}
+	}
+
+	/* Print insertion rates for all batches */
+	if (dump_iterations)
+		print_rules_batches(cpu_time_per_batch);
+
+	/* Insertion rate for all rules */
+	insertion_rate = ((double) (rules_count / cpu_time_used) / 1000);
+	printf(":: Total flow insertion rate -> %f K Rule/Sec\n",
+			insertion_rate);
+	printf(":: The time for creating %d in flows %f seconds\n",
+			rules_count, cpu_time_used);
+
+	return flows_list;
+}
+
+static inline void
+flows_handler(void)
+{
+	struct rte_flow **flows_list;
+	uint16_t nr_ports;
+	int64_t alloc, last_alloc;
+	int flow_size_in_bytes;
+	int port_id;
+
+	nr_ports = rte_eth_dev_count_avail();
 
 	if (rules_batch > rules_count)
 		rules_batch = rules_count;
 
-	printf(":: Flows Count per port: %d\n", rules_count);
-
-	flow_list = rte_zmalloc("flow_list",
-		(sizeof(struct rte_flow *) * rules_count) + 1, 0);
-	if (flow_list == NULL)
-		rte_exit(EXIT_FAILURE, "No Memory available!");
+	printf(":: Rules Count per port: %d\n\n", rules_count);
 
 	for (port_id = 0; port_id < nr_ports; port_id++) {
 		/* If port outside portmask */
 		if (!((ports_mask >> port_id) & 0x1))
 			continue;
-		cpu_time_used = 0;
-		flow_index = 0;
-		if (flow_group > 0) {
-			/*
-			 * Create global rule to jump into flow_group,
-			 * this way the app will avoid the default rules.
-			 *
-			 * Global rule:
-			 * group 0 eth / end actions jump group <flow_group>
-			 *
-			 */
-			flow = generate_flow(port_id, 0, flow_attrs,
-				global_items, global_actions,
-				flow_group, 0, 0, 0, 0, &error);
 
-			if (flow == NULL) {
-				print_flow_error(error);
-				rte_exit(EXIT_FAILURE, "error in creating flow");
-			}
-			flow_list[flow_index++] = flow;
-		}
+		/* Insertion part. */
+		last_alloc = (int64_t)dump_socket_mem(stdout);
+		flows_list = insert_flows(port_id);
+		alloc = (int64_t)dump_socket_mem(stdout);
 
-		/* Insertion Rate */
-		printf("Flows insertion on port = %d\n", port_id);
-		start_iter = clock();
-		for (i = 0; i < rules_count; i++) {
-			flow = generate_flow(port_id, flow_group,
-				flow_attrs, flow_items, flow_actions,
-				JUMP_ACTION_TABLE, i,
-				hairpin_queues_num,
-				encap_data, decap_data,
-				&error);
-
-			if (force_quit)
-				i = rules_count;
-
-			if (!flow) {
-				print_flow_error(error);
-				rte_exit(EXIT_FAILURE, "error in creating flow");
-			}
+		/* Deletion part. */
+		if (delete_flag)
+			destroy_flows(port_id, flows_list);
 
-			flow_list[flow_index++] = flow;
-
-			if (i && !((i + 1) % rules_batch)) {
-				/* Save the insertion rate of each iter */
-				end_iter = clock();
-				delta = (double) (end_iter - start_iter);
-				iter_id = ((i + 1) / rules_batch) - 1;
-				cpu_time_per_iter[iter_id] =
-					delta / CLOCKS_PER_SEC;
-				cpu_time_used += cpu_time_per_iter[iter_id];
-				start_iter = clock();
-			}
+		/* Report rte_flow size in huge pages. */
+		if (last_alloc) {
+			flow_size_in_bytes = (alloc - last_alloc) / rules_count;
+			printf("\n:: rte_flow size in DPDK layer: %d Bytes",
+				flow_size_in_bytes);
 		}
-
-		/* Iteration rate per iteration */
-		if (dump_iterations)
-			for (i = 0; i < MAX_ITERATIONS; i++) {
-				if (cpu_time_per_iter[i] == -1)
-					continue;
-				delta = (double)(rules_batch /
-					cpu_time_per_iter[i]);
-				flows_rate = delta / 1000;
-				printf(":: Iteration #%d: %d flows "
-					"in %f sec[ Rate = %f K/Sec ]\n",
-					i, rules_batch,
-					cpu_time_per_iter[i], flows_rate);
-			}
-
-		/* Insertion rate for all flows */
-		flows_rate = ((double) (rules_count / cpu_time_used) / 1000);
-		printf("\n:: Total flow insertion rate -> %f K/Sec\n",
-						flows_rate);
-		printf(":: The time for creating %d in flows %f seconds\n",
-						rules_count, cpu_time_used);
-
-		if (delete_flag)
-			destroy_flows(port_id, flow_list);
 	}
 }
 
@@ -1421,7 +1443,6 @@ main(int argc, char **argv)
 	int ret;
 	uint16_t port;
 	struct rte_flow_error error;
-	int64_t alloc, last_alloc;
 
 	ret = rte_eal_init(argc, argv);
 	if (ret < 0)
@@ -1449,13 +1470,7 @@ main(int argc, char **argv)
 	if (nb_lcores <= 1)
 		rte_exit(EXIT_FAILURE, "This app needs at least two cores\n");
 
-	last_alloc = (int64_t)dump_socket_mem(stdout);
 	flows_handler();
-	alloc = (int64_t)dump_socket_mem(stdout);
-
-	if (last_alloc)
-		fprintf(stdout, ":: Memory allocation change(M): %.6lf\n",
-		(alloc - last_alloc) / 1.0e6);
 
 	if (enable_fwd) {
 		init_lcore_info();
@@ -1468,5 +1483,6 @@ main(int argc, char **argv)
 			printf("Failed to stop device on port %u\n", port);
 		rte_eth_dev_close(port);
 	}
+	printf("\nBye ...\n");
 	return 0;
 }
-- 
2.21.0



* [dpdk-dev] [PATCH 2/4] app/flow-perf: add multiple cores insertion and deletion
  2020-11-26 11:15 [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support Wisam Jaddo
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 1/4] app/flow-perf: refactor flows handler Wisam Jaddo
@ 2020-11-26 11:15 ` Wisam Jaddo
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 3/4] app/flow-perf: change clock measurement functions Wisam Jaddo
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Wisam Jaddo @ 2020-11-26 11:15 UTC (permalink / raw)
  To: thomas, arybchenko, suanmingm, akozyrev; +Cc: dev

One way to increase the insertion/deletion rate is to use
multi-threaded insertion/deletion. The flow-perf application therefore
needs support for testing and measuring those rates.

Now we use multiple cores, distribute all flows among them,
and insert/delete in parallel.

The app now receives the number of cores to use from a command-line
option, distributes the rte_flow rules evenly between the cores, and
starts inserting/deleting. Each worker reports its own results, and in
the end the MAIN worker reports the total results for all cores.
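
A minimal sketch of one way to split the rules evenly, assuming each
core gets a contiguous index range. The names core_rule_range() and
struct rule_range are hypothetical, and giving the remainder to the
first cores is an assumption made here; the patch may simply use
rules_count / cores_count for every core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end) of rule indexes. */
struct rule_range {
	uint32_t start;
	uint32_t end;
};

/* Compute the slice of rules handled by core_idx out of nb_cores. */
static struct rule_range
core_rule_range(uint32_t rules_count, uint32_t nb_cores, uint32_t core_idx)
{
	struct rule_range r;
	uint32_t share, extra;

	assert(nb_cores > 0 && core_idx < nb_cores);
	share = rules_count / nb_cores;
	extra = rules_count % nb_cores;
	/* The first `extra` cores take one additional rule each. */
	r.start = core_idx * share + (core_idx < extra ? core_idx : extra);
	r.end = r.start + share + (core_idx < extra ? 1 : 0);
	return r;
}
```

With 10 rules on 3 cores this yields the ranges [0, 4), [4, 7) and
[7, 10), so every rule is assigned exactly once.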

The total results are calculated as RULES_COUNT divided by the
maximum time used across all cores.
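
The calculation above can be sketched as follows. total_rate_k() and
its parameters are hypothetical names used only for illustration; the
result is expressed in K rules per second, the unit the application
already prints.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in the patch): the slowest core bounds the
 * wall-clock time, so the total rate is rules_count / max(time).
 */
static double
total_rate_k(double rules_count, const double *time_per_core,
		size_t nb_cores)
{
	double max_time = 0.0;
	size_t i;

	assert(nb_cores > 0);
	for (i = 0; i < nb_cores; i++)
		if (time_per_core[i] > max_time)
			max_time = time_per_core[i];
	if (max_time <= 0.0)
		return 0.0; /* Nothing measured, avoid dividing by zero. */
	return rules_count / max_time / 1000.0;
}
```

For instance, 4000000 rules with per-core times of 1.0, 2.0 and 0.5
seconds give 2000 K rules per second, since the 2.0 second core
determines the total.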

This also touches the memory accounting: when several cores insert at
the same time, the previous memory measurement is no longer valid. We
now save the memory before and after each allocation for all cores,
and in the end pick the minimum pre-insertion memory and the maximum
post-insertion memory across all cores.

The difference between those values represents the total memory
consumed by all rte_flow rules from all cores. From it we report the
size of a single rte_flow rule in bytes for each port.
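
A sketch of that min/max selection, under the assumption that each
core records one pre and one post snapshot. struct mem_snapshot and
total_rule_size() are hypothetical names and do not appear in the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-core record of memory before and after insertion. */
struct mem_snapshot {
	int64_t pre;
	int64_t post;
};

/* Size of one rule, from the widest pre/post window across all cores. */
static int64_t
total_rule_size(const struct mem_snapshot *snap, size_t nb_cores,
		uint32_t rules_count)
{
	int64_t min_pre, max_post;
	size_t i;

	assert(nb_cores > 0 && rules_count > 0);
	min_pre = snap[0].pre;
	max_post = snap[0].post;
	for (i = 1; i < nb_cores; i++) {
		if (snap[i].pre < min_pre)
			min_pre = snap[i].pre;
		if (snap[i].post > max_post)
			max_post = snap[i].post;
	}
	return (max_post - min_pre) / (int64_t)rules_count;
}
```

Taking the earliest "before" and the latest "after" covers allocations
made by every core, even though the cores start and finish at
different moments.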

How to use this feature:
--cores=N

Where 1 <= N <= RTE_MAX_LCORE

Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
---
 app/test-flow-perf/actions_gen.c | 175 ++++++++++----------
 app/test-flow-perf/actions_gen.h |   2 +-
 app/test-flow-perf/config.h      |   1 +
 app/test-flow-perf/flow_gen.c    |   5 +-
 app/test-flow-perf/flow_gen.h    |   1 +
 app/test-flow-perf/items_gen.c   | 103 ++++++------
 app/test-flow-perf/items_gen.h   |   2 +-
 app/test-flow-perf/main.c        | 266 +++++++++++++++++++++++++------
 doc/guides/tools/flow-perf.rst   |  14 +-
 9 files changed, 372 insertions(+), 197 deletions(-)

diff --git a/app/test-flow-perf/actions_gen.c b/app/test-flow-perf/actions_gen.c
index ac525f6fdb..1364407056 100644
--- a/app/test-flow-perf/actions_gen.c
+++ b/app/test-flow-perf/actions_gen.c
@@ -29,6 +29,7 @@ struct additional_para {
 	uint32_t counter;
 	uint64_t encap_data;
 	uint64_t decap_data;
+	uint8_t core_idx;
 };
 
 /* Storage for struct rte_flow_action_raw_encap including external data. */
@@ -58,16 +59,16 @@ add_mark(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	struct additional_para para)
 {
-	static struct rte_flow_action_mark mark_action;
+	static struct rte_flow_action_mark mark_actions[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t counter = para.counter;
 
 	do {
 		/* Random values from 1 to 256 */
-		mark_action.id = (counter % 255) + 1;
+		mark_actions[para.core_idx].id = (counter % 255) + 1;
 	} while (0);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_MARK;
-	actions[actions_counter].conf = &mark_action;
+	actions[actions_counter].conf = &mark_actions[para.core_idx];
 }
 
 static void
@@ -75,14 +76,14 @@ add_queue(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	struct additional_para para)
 {
-	static struct rte_flow_action_queue queue_action;
+	static struct rte_flow_action_queue queue_actions[RTE_MAX_LCORE] __rte_cache_aligned;
 
 	do {
-		queue_action.index = para.queue;
+		queue_actions[para.core_idx].index = para.queue;
 	} while (0);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_QUEUE;
-	actions[actions_counter].conf = &queue_action;
+	actions[actions_counter].conf = &queue_actions[para.core_idx];
 }
 
 static void
@@ -105,39 +106,36 @@ add_rss(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	struct additional_para para)
 {
-	static struct rte_flow_action_rss *rss_action;
-	static struct action_rss_data *rss_data;
+	static struct action_rss_data *rss_data[RTE_MAX_LCORE] __rte_cache_aligned;
 
 	uint16_t queue;
 
-	if (rss_data == NULL)
-		rss_data = rte_malloc("rss_data",
+	if (rss_data[para.core_idx] == NULL)
+		rss_data[para.core_idx] = rte_malloc("rss_data",
 			sizeof(struct action_rss_data), 0);
 
-	if (rss_data == NULL)
+	if (rss_data[para.core_idx] == NULL)
 		rte_exit(EXIT_FAILURE, "No Memory available!");
 
-	*rss_data = (struct action_rss_data){
+	*rss_data[para.core_idx] = (struct action_rss_data){
 		.conf = (struct rte_flow_action_rss){
 			.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
 			.level = 0,
 			.types = GET_RSS_HF(),
-			.key_len = sizeof(rss_data->key),
+			.key_len = sizeof(rss_data[para.core_idx]->key),
 			.queue_num = para.queues_number,
-			.key = rss_data->key,
-			.queue = rss_data->queue,
+			.key = rss_data[para.core_idx]->key,
+			.queue = rss_data[para.core_idx]->queue,
 		},
 		.key = { 1 },
 		.queue = { 0 },
 	};
 
 	for (queue = 0; queue < para.queues_number; queue++)
-		rss_data->queue[queue] = para.queues[queue];
-
-	rss_action = &rss_data->conf;
+		rss_data[para.core_idx]->queue[queue] = para.queues[queue];
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_RSS;
-	actions[actions_counter].conf = rss_action;
+	actions[actions_counter].conf = &rss_data[para.core_idx]->conf;
 }
 
 static void
@@ -212,7 +210,7 @@ add_set_src_mac(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_mac set_mac;
+	static struct rte_flow_action_set_mac set_macs[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t mac = para.counter;
 	uint16_t i;
 
@@ -222,12 +220,12 @@ add_set_src_mac(struct rte_flow_action *actions,
 
 	/* Mac address to be set is random each time */
 	for (i = 0; i < RTE_ETHER_ADDR_LEN; i++) {
-		set_mac.mac_addr[i] = mac & 0xff;
+		set_macs[para.core_idx].mac_addr[i] = mac & 0xff;
 		mac = mac >> 8;
 	}
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_MAC_SRC;
-	actions[actions_counter].conf = &set_mac;
+	actions[actions_counter].conf = &set_macs[para.core_idx];
 }
 
 static void
@@ -235,7 +233,7 @@ add_set_dst_mac(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_mac set_mac;
+	static struct rte_flow_action_set_mac set_macs[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t mac = para.counter;
 	uint16_t i;
 
@@ -245,12 +243,12 @@ add_set_dst_mac(struct rte_flow_action *actions,
 
 	/* Mac address to be set is random each time */
 	for (i = 0; i < RTE_ETHER_ADDR_LEN; i++) {
-		set_mac.mac_addr[i] = mac & 0xff;
+		set_macs[para.core_idx].mac_addr[i] = mac & 0xff;
 		mac = mac >> 8;
 	}
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_MAC_DST;
-	actions[actions_counter].conf = &set_mac;
+	actions[actions_counter].conf = &set_macs[para.core_idx];
 }
 
 static void
@@ -258,7 +256,7 @@ add_set_src_ipv4(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_ipv4 set_ipv4;
+	static struct rte_flow_action_set_ipv4 set_ipv4[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t ip = para.counter;
 
 	/* Fixed value */
@@ -266,10 +264,10 @@ add_set_src_ipv4(struct rte_flow_action *actions,
 		ip = 1;
 
 	/* IPv4 value to be set is random each time */
-	set_ipv4.ipv4_addr = RTE_BE32(ip + 1);
+	set_ipv4[para.core_idx].ipv4_addr = RTE_BE32(ip + 1);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC;
-	actions[actions_counter].conf = &set_ipv4;
+	actions[actions_counter].conf = &set_ipv4[para.core_idx];
 }
 
 static void
@@ -277,7 +275,7 @@ add_set_dst_ipv4(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_ipv4 set_ipv4;
+	static struct rte_flow_action_set_ipv4 set_ipv4[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t ip = para.counter;
 
 	/* Fixed value */
@@ -285,10 +283,10 @@ add_set_dst_ipv4(struct rte_flow_action *actions,
 		ip = 1;
 
 	/* IPv4 value to be set is random each time */
-	set_ipv4.ipv4_addr = RTE_BE32(ip + 1);
+	set_ipv4[para.core_idx].ipv4_addr = RTE_BE32(ip + 1);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV4_DST;
-	actions[actions_counter].conf = &set_ipv4;
+	actions[actions_counter].conf = &set_ipv4[para.core_idx];
 }
 
 static void
@@ -296,7 +294,7 @@ add_set_src_ipv6(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_ipv6 set_ipv6;
+	static struct rte_flow_action_set_ipv6 set_ipv6[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t ipv6 = para.counter;
 	uint8_t i;
 
@@ -306,12 +304,12 @@ add_set_src_ipv6(struct rte_flow_action *actions,
 
 	/* IPv6 value to set is random each time */
 	for (i = 0; i < 16; i++) {
-		set_ipv6.ipv6_addr[i] = ipv6 & 0xff;
+		set_ipv6[para.core_idx].ipv6_addr[i] = ipv6 & 0xff;
 		ipv6 = ipv6 >> 8;
 	}
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC;
-	actions[actions_counter].conf = &set_ipv6;
+	actions[actions_counter].conf = &set_ipv6[para.core_idx];
 }
 
 static void
@@ -319,7 +317,7 @@ add_set_dst_ipv6(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_ipv6 set_ipv6;
+	static struct rte_flow_action_set_ipv6 set_ipv6[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t ipv6 = para.counter;
 	uint8_t i;
 
@@ -329,12 +327,12 @@ add_set_dst_ipv6(struct rte_flow_action *actions,
 
 	/* IPv6 value to set is random each time */
 	for (i = 0; i < 16; i++) {
-		set_ipv6.ipv6_addr[i] = ipv6 & 0xff;
+		set_ipv6[para.core_idx].ipv6_addr[i] = ipv6 & 0xff;
 		ipv6 = ipv6 >> 8;
 	}
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV6_DST;
-	actions[actions_counter].conf = &set_ipv6;
+	actions[actions_counter].conf = &set_ipv6[para.core_idx];
 }
 
 static void
@@ -342,7 +340,7 @@ add_set_src_tp(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_tp set_tp;
+	static struct rte_flow_action_set_tp set_tp[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t tp = para.counter;
 
 	/* Fixed value */
@@ -352,10 +350,10 @@ add_set_src_tp(struct rte_flow_action *actions,
 	/* TP src port is random each time */
 	tp = tp % 0xffff;
 
-	set_tp.port = RTE_BE16(tp & 0xffff);
+	set_tp[para.core_idx].port = RTE_BE16(tp & 0xffff);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TP_SRC;
-	actions[actions_counter].conf = &set_tp;
+	actions[actions_counter].conf = &set_tp[para.core_idx];
 }
 
 static void
@@ -363,7 +361,7 @@ add_set_dst_tp(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_tp set_tp;
+	static struct rte_flow_action_set_tp set_tp[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t tp = para.counter;
 
 	/* Fixed value */
@@ -374,10 +372,10 @@ add_set_dst_tp(struct rte_flow_action *actions,
 	if (tp > 0xffff)
 		tp = tp >> 16;
 
-	set_tp.port = RTE_BE16(tp & 0xffff);
+	set_tp[para.core_idx].port = RTE_BE16(tp & 0xffff);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TP_DST;
-	actions[actions_counter].conf = &set_tp;
+	actions[actions_counter].conf = &set_tp[para.core_idx];
 }
 
 static void
@@ -385,17 +383,17 @@ add_inc_tcp_ack(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static rte_be32_t value;
+	static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t ack_value = para.counter;
 
 	/* Fixed value */
 	if (FIXED_VALUES)
 		ack_value = 1;
 
-	value = RTE_BE32(ack_value);
+	value[para.core_idx] = RTE_BE32(ack_value);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_INC_TCP_ACK;
-	actions[actions_counter].conf = &value;
+	actions[actions_counter].conf = &value[para.core_idx];
 }
 
 static void
@@ -403,17 +401,17 @@ add_dec_tcp_ack(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static rte_be32_t value;
+	static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t ack_value = para.counter;
 
 	/* Fixed value */
 	if (FIXED_VALUES)
 		ack_value = 1;
 
-	value = RTE_BE32(ack_value);
+	value[para.core_idx] = RTE_BE32(ack_value);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK;
-	actions[actions_counter].conf = &value;
+	actions[actions_counter].conf = &value[para.core_idx];
 }
 
 static void
@@ -421,17 +419,17 @@ add_inc_tcp_seq(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static rte_be32_t value;
+	static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t seq_value = para.counter;
 
 	/* Fixed value */
 	if (FIXED_VALUES)
 		seq_value = 1;
 
-	value = RTE_BE32(seq_value);
+	value[para.core_idx] = RTE_BE32(seq_value);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ;
-	actions[actions_counter].conf = &value;
+	actions[actions_counter].conf = &value[para.core_idx];
 }
 
 static void
@@ -439,17 +437,17 @@ add_dec_tcp_seq(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static rte_be32_t value;
+	static rte_be32_t value[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t seq_value = para.counter;
 
 	/* Fixed value */
 	if (FIXED_VALUES)
 		seq_value = 1;
 
-	value	= RTE_BE32(seq_value);
+	value[para.core_idx] = RTE_BE32(seq_value);
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ;
-	actions[actions_counter].conf = &value;
+	actions[actions_counter].conf = &value[para.core_idx];
 }
 
 static void
@@ -457,7 +455,7 @@ add_set_ttl(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_ttl set_ttl;
+	static struct rte_flow_action_set_ttl set_ttl[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t ttl_value = para.counter;
 
 	/* Fixed value */
@@ -467,10 +465,10 @@ add_set_ttl(struct rte_flow_action *actions,
 	/* Set ttl to random value each time */
 	ttl_value = ttl_value % 0xff;
 
-	set_ttl.ttl_value = ttl_value;
+	set_ttl[para.core_idx].ttl_value = ttl_value;
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TTL;
-	actions[actions_counter].conf = &set_ttl;
+	actions[actions_counter].conf = &set_ttl[para.core_idx];
 }
 
 static void
@@ -486,7 +484,7 @@ add_set_ipv4_dscp(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_dscp set_dscp;
+	static struct rte_flow_action_set_dscp set_dscp[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t dscp_value = para.counter;
 
 	/* Fixed value */
@@ -496,10 +494,10 @@ add_set_ipv4_dscp(struct rte_flow_action *actions,
 	/* Set dscp to random value each time */
 	dscp_value = dscp_value % 0xff;
 
-	set_dscp.dscp = dscp_value;
+	set_dscp[para.core_idx].dscp = dscp_value;
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP;
-	actions[actions_counter].conf = &set_dscp;
+	actions[actions_counter].conf = &set_dscp[para.core_idx];
 }
 
 static void
@@ -507,7 +505,7 @@ add_set_ipv6_dscp(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_dscp set_dscp;
+	static struct rte_flow_action_set_dscp set_dscp[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint32_t dscp_value = para.counter;
 
 	/* Fixed value */
@@ -517,10 +515,10 @@ add_set_ipv6_dscp(struct rte_flow_action *actions,
 	/* Set dscp to random value each time */
 	dscp_value = dscp_value % 0xff;
 
-	set_dscp.dscp = dscp_value;
+	set_dscp[para.core_idx].dscp = dscp_value;
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP;
-	actions[actions_counter].conf = &set_dscp;
+	actions[actions_counter].conf = &set_dscp[para.core_idx];
 }
 
 static void
@@ -774,36 +772,36 @@ add_raw_encap(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	struct additional_para para)
 {
-	static struct action_raw_encap_data *action_encap_data;
+	static struct action_raw_encap_data *action_encap_data[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint64_t encap_data = para.encap_data;
 	uint8_t *header;
 	uint8_t i;
 
 	/* Avoid double allocation. */
-	if (action_encap_data == NULL)
-		action_encap_data = rte_malloc("encap_data",
+	if (action_encap_data[para.core_idx] == NULL)
+		action_encap_data[para.core_idx] = rte_malloc("encap_data",
 			sizeof(struct action_raw_encap_data), 0);
 
 	/* Check if allocation failed. */
-	if (action_encap_data == NULL)
+	if (action_encap_data[para.core_idx] == NULL)
 		rte_exit(EXIT_FAILURE, "No Memory available!");
 
-	*action_encap_data = (struct action_raw_encap_data) {
+	*action_encap_data[para.core_idx] = (struct action_raw_encap_data) {
 		.conf = (struct rte_flow_action_raw_encap) {
-			.data = action_encap_data->data,
+			.data = action_encap_data[para.core_idx]->data,
 		},
 			.data = {},
 	};
-	header = action_encap_data->data;
+	header = action_encap_data[para.core_idx]->data;
 
 	for (i = 0; i < RTE_DIM(headers); i++)
 		headers[i].funct(&header, encap_data, para);
 
-	action_encap_data->conf.size = header -
-		action_encap_data->data;
+	action_encap_data[para.core_idx]->conf.size = header -
+		action_encap_data[para.core_idx]->data;
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP;
-	actions[actions_counter].conf = &action_encap_data->conf;
+	actions[actions_counter].conf = &action_encap_data[para.core_idx]->conf;
 }
 
 static void
@@ -811,36 +809,36 @@ add_raw_decap(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	struct additional_para para)
 {
-	static struct action_raw_decap_data *action_decap_data;
+	static struct action_raw_decap_data *action_decap_data[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint64_t decap_data = para.decap_data;
 	uint8_t *header;
 	uint8_t i;
 
 	/* Avoid double allocation. */
-	if (action_decap_data == NULL)
-		action_decap_data = rte_malloc("decap_data",
+	if (action_decap_data[para.core_idx] == NULL)
+		action_decap_data[para.core_idx] = rte_malloc("decap_data",
 			sizeof(struct action_raw_decap_data), 0);
 
 	/* Check if allocation failed. */
-	if (action_decap_data == NULL)
+	if (action_decap_data[para.core_idx] == NULL)
 		rte_exit(EXIT_FAILURE, "No Memory available!");
 
-	*action_decap_data = (struct action_raw_decap_data) {
+	*action_decap_data[para.core_idx] = (struct action_raw_decap_data) {
 		.conf = (struct rte_flow_action_raw_decap) {
-			.data = action_decap_data->data,
+			.data = action_decap_data[para.core_idx]->data,
 		},
 			.data = {},
 	};
-	header = action_decap_data->data;
+	header = action_decap_data[para.core_idx]->data;
 
 	for (i = 0; i < RTE_DIM(headers); i++)
 		headers[i].funct(&header, decap_data, para);
 
-	action_decap_data->conf.size = header -
-		action_decap_data->data;
+	action_decap_data[para.core_idx]->conf.size = header -
+		action_decap_data[para.core_idx]->data;
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_RAW_DECAP;
-	actions[actions_counter].conf = &action_decap_data->conf;
+	actions[actions_counter].conf = &action_decap_data[para.core_idx]->conf;
 }
 
 static void
@@ -848,7 +846,7 @@ add_vxlan_encap(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_vxlan_encap vxlan_encap;
+	static struct rte_flow_action_vxlan_encap vxlan_encap[RTE_MAX_LCORE] __rte_cache_aligned;
 	static struct rte_flow_item items[5];
 	static struct rte_flow_item_eth item_eth;
 	static struct rte_flow_item_ipv4 item_ipv4;
@@ -885,10 +883,10 @@ add_vxlan_encap(struct rte_flow_action *actions,
 
 	items[4].type = RTE_FLOW_ITEM_TYPE_END;
 
-	vxlan_encap.definition = items;
+	vxlan_encap[para.core_idx].definition = items;
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP;
-	actions[actions_counter].conf = &vxlan_encap;
+	actions[actions_counter].conf = &vxlan_encap[para.core_idx];
 }
 
 static void
@@ -902,7 +900,7 @@ add_vxlan_decap(struct rte_flow_action *actions,
 void
 fill_actions(struct rte_flow_action *actions, uint64_t *flow_actions,
 	uint32_t counter, uint16_t next_table, uint16_t hairpinq,
-	uint64_t encap_data, uint64_t decap_data)
+	uint64_t encap_data, uint64_t decap_data, uint8_t core_idx)
 {
 	struct additional_para additional_para_data;
 	uint8_t actions_counter = 0;
@@ -924,6 +922,7 @@ fill_actions(struct rte_flow_action *actions, uint64_t *flow_actions,
 		.counter = counter,
 		.encap_data = encap_data,
 		.decap_data = decap_data,
+		.core_idx = core_idx,
 	};
 
 	if (hairpinq != 0) {
diff --git a/app/test-flow-perf/actions_gen.h b/app/test-flow-perf/actions_gen.h
index 85e3176b09..77353cfe09 100644
--- a/app/test-flow-perf/actions_gen.h
+++ b/app/test-flow-perf/actions_gen.h
@@ -19,6 +19,6 @@
 
 void fill_actions(struct rte_flow_action *actions, uint64_t *flow_actions,
 	uint32_t counter, uint16_t next_table, uint16_t hairpinq,
-	uint64_t encap_data, uint64_t decap_data);
+	uint64_t encap_data, uint64_t decap_data, uint8_t core_idx);
 
 #endif /* FLOW_PERF_ACTION_GEN */
diff --git a/app/test-flow-perf/config.h b/app/test-flow-perf/config.h
index 8f42bc589c..94e83c9abc 100644
--- a/app/test-flow-perf/config.h
+++ b/app/test-flow-perf/config.h
@@ -15,6 +15,7 @@
 #define MBUF_CACHE_SIZE 512
 #define NR_RXD  256
 #define NR_TXD  256
+#define MAX_PORTS 64
 
 /* This is used for encap/decap & header modify actions.
  * When it's 1: it means all actions have fixed values.
diff --git a/app/test-flow-perf/flow_gen.c b/app/test-flow-perf/flow_gen.c
index a979b3856d..df4af16de8 100644
--- a/app/test-flow-perf/flow_gen.c
+++ b/app/test-flow-perf/flow_gen.c
@@ -45,6 +45,7 @@ generate_flow(uint16_t port_id,
 	uint16_t hairpinq,
 	uint64_t encap_data,
 	uint64_t decap_data,
+	uint8_t core_idx,
 	struct rte_flow_error *error)
 {
 	struct rte_flow_attr attr;
@@ -60,9 +61,9 @@ generate_flow(uint16_t port_id,
 
 	fill_actions(actions, flow_actions,
 		outer_ip_src, next_table, hairpinq,
-		encap_data, decap_data);
+		encap_data, decap_data, core_idx);
 
-	fill_items(items, flow_items, outer_ip_src);
+	fill_items(items, flow_items, outer_ip_src, core_idx);
 
 	flow = rte_flow_create(port_id, &attr, items, actions, error);
 	return flow;
diff --git a/app/test-flow-perf/flow_gen.h b/app/test-flow-perf/flow_gen.h
index 3d13737d65..f1d0999af1 100644
--- a/app/test-flow-perf/flow_gen.h
+++ b/app/test-flow-perf/flow_gen.h
@@ -34,6 +34,7 @@ generate_flow(uint16_t port_id,
 	uint16_t hairpinq,
 	uint64_t encap_data,
 	uint64_t decap_data,
+	uint8_t core_idx,
 	struct rte_flow_error *error);
 
 #endif /* FLOW_PERF_FLOW_GEN */
diff --git a/app/test-flow-perf/items_gen.c b/app/test-flow-perf/items_gen.c
index 2b1ab41467..0950023608 100644
--- a/app/test-flow-perf/items_gen.c
+++ b/app/test-flow-perf/items_gen.c
@@ -15,6 +15,7 @@
 /* Storage for additional parameters for items */
 struct additional_para {
 	rte_be32_t src_ip;
+	uint8_t core_idx;
 };
 
 static void
@@ -58,18 +59,19 @@ static void
 add_ipv4(struct rte_flow_item *items,
 	uint8_t items_counter, struct additional_para para)
 {
-	static struct rte_flow_item_ipv4 ipv4_spec;
-	static struct rte_flow_item_ipv4 ipv4_mask;
+	static struct rte_flow_item_ipv4 ipv4_specs[RTE_MAX_LCORE] __rte_cache_aligned;
+	static struct rte_flow_item_ipv4 ipv4_masks[RTE_MAX_LCORE] __rte_cache_aligned;
+	uint8_t ti = para.core_idx;
 
-	memset(&ipv4_spec, 0, sizeof(struct rte_flow_item_ipv4));
-	memset(&ipv4_mask, 0, sizeof(struct rte_flow_item_ipv4));
+	memset(&ipv4_specs[ti], 0, sizeof(struct rte_flow_item_ipv4));
+	memset(&ipv4_masks[ti], 0, sizeof(struct rte_flow_item_ipv4));
 
-	ipv4_spec.hdr.src_addr = RTE_BE32(para.src_ip);
-	ipv4_mask.hdr.src_addr = RTE_BE32(0xffffffff);
+	ipv4_specs[ti].hdr.src_addr = RTE_BE32(para.src_ip);
+	ipv4_masks[ti].hdr.src_addr = RTE_BE32(0xffffffff);
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_IPV4;
-	items[items_counter].spec = &ipv4_spec;
-	items[items_counter].mask = &ipv4_mask;
+	items[items_counter].spec = &ipv4_specs[ti];
+	items[items_counter].mask = &ipv4_masks[ti];
 }
 
 
@@ -77,23 +79,24 @@ static void
 add_ipv6(struct rte_flow_item *items,
 	uint8_t items_counter, struct additional_para para)
 {
-	static struct rte_flow_item_ipv6 ipv6_spec;
-	static struct rte_flow_item_ipv6 ipv6_mask;
+	static struct rte_flow_item_ipv6 ipv6_specs[RTE_MAX_LCORE] __rte_cache_aligned;
+	static struct rte_flow_item_ipv6 ipv6_masks[RTE_MAX_LCORE] __rte_cache_aligned;
+	uint8_t ti = para.core_idx;
 
-	memset(&ipv6_spec, 0, sizeof(struct rte_flow_item_ipv6));
-	memset(&ipv6_mask, 0, sizeof(struct rte_flow_item_ipv6));
+	memset(&ipv6_specs[ti], 0, sizeof(struct rte_flow_item_ipv6));
+	memset(&ipv6_masks[ti], 0, sizeof(struct rte_flow_item_ipv6));
 
 	/** Set ipv6 src **/
-	memset(&ipv6_spec.hdr.src_addr, para.src_ip,
-		sizeof(ipv6_spec.hdr.src_addr) / 2);
+	memset(&ipv6_specs[ti].hdr.src_addr, para.src_ip,
+		sizeof(ipv6_specs->hdr.src_addr) / 2);
 
 	/** Full mask **/
-	memset(&ipv6_mask.hdr.src_addr, 0xff,
-		sizeof(ipv6_spec.hdr.src_addr));
+	memset(&ipv6_masks[ti].hdr.src_addr, 0xff,
+		sizeof(ipv6_specs->hdr.src_addr));
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_IPV6;
-	items[items_counter].spec = &ipv6_spec;
-	items[items_counter].mask = &ipv6_mask;
+	items[items_counter].spec = &ipv6_specs[ti];
+	items[items_counter].mask = &ipv6_masks[ti];
 }
 
 static void
@@ -131,31 +134,31 @@ add_udp(struct rte_flow_item *items,
 static void
 add_vxlan(struct rte_flow_item *items,
 	uint8_t items_counter,
-	__rte_unused struct additional_para para)
+	struct additional_para para)
 {
-	static struct rte_flow_item_vxlan vxlan_spec;
-	static struct rte_flow_item_vxlan vxlan_mask;
-
+	static struct rte_flow_item_vxlan vxlan_specs[RTE_MAX_LCORE] __rte_cache_aligned;
+	static struct rte_flow_item_vxlan vxlan_masks[RTE_MAX_LCORE] __rte_cache_aligned;
+	uint8_t ti = para.core_idx;
 	uint32_t vni_value;
 	uint8_t i;
 
 	vni_value = VNI_VALUE;
 
-	memset(&vxlan_spec, 0, sizeof(struct rte_flow_item_vxlan));
-	memset(&vxlan_mask, 0, sizeof(struct rte_flow_item_vxlan));
+	memset(&vxlan_specs[ti], 0, sizeof(struct rte_flow_item_vxlan));
+	memset(&vxlan_masks[ti], 0, sizeof(struct rte_flow_item_vxlan));
 
 	/* Set standard vxlan vni */
 	for (i = 0; i < 3; i++) {
-		vxlan_spec.vni[2 - i] = vni_value >> (i * 8);
-		vxlan_mask.vni[2 - i] = 0xff;
+		vxlan_specs[ti].vni[2 - i] = vni_value >> (i * 8);
+		vxlan_masks[ti].vni[2 - i] = 0xff;
 	}
 
 	/* Standard vxlan flags */
-	vxlan_spec.flags = 0x8;
+	vxlan_specs[ti].flags = 0x8;
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_VXLAN;
-	items[items_counter].spec = &vxlan_spec;
-	items[items_counter].mask = &vxlan_mask;
+	items[items_counter].spec = &vxlan_specs[ti];
+	items[items_counter].mask = &vxlan_masks[ti];
 }
 
 static void
@@ -163,29 +166,29 @@ add_vxlan_gpe(struct rte_flow_item *items,
 	uint8_t items_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_item_vxlan_gpe vxlan_gpe_spec;
-	static struct rte_flow_item_vxlan_gpe vxlan_gpe_mask;
-
+	static struct rte_flow_item_vxlan_gpe vxlan_gpe_specs[RTE_MAX_LCORE] __rte_cache_aligned;
+	static struct rte_flow_item_vxlan_gpe vxlan_gpe_masks[RTE_MAX_LCORE] __rte_cache_aligned;
+	uint8_t ti = para.core_idx;
 	uint32_t vni_value;
 	uint8_t i;
 
 	vni_value = VNI_VALUE;
 
-	memset(&vxlan_gpe_spec, 0, sizeof(struct rte_flow_item_vxlan_gpe));
-	memset(&vxlan_gpe_mask, 0, sizeof(struct rte_flow_item_vxlan_gpe));
+	memset(&vxlan_gpe_specs[ti], 0, sizeof(struct rte_flow_item_vxlan_gpe));
+	memset(&vxlan_gpe_masks[ti], 0, sizeof(struct rte_flow_item_vxlan_gpe));
 
 	/* Set vxlan-gpe vni */
 	for (i = 0; i < 3; i++) {
-		vxlan_gpe_spec.vni[2 - i] = vni_value >> (i * 8);
-		vxlan_gpe_mask.vni[2 - i] = 0xff;
+		vxlan_gpe_specs[ti].vni[2 - i] = vni_value >> (i * 8);
+		vxlan_gpe_masks[ti].vni[2 - i] = 0xff;
 	}
 
 	/* vxlan-gpe flags */
-	vxlan_gpe_spec.flags = 0x0c;
+	vxlan_gpe_specs[ti].flags = 0x0c;
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_VXLAN_GPE;
-	items[items_counter].spec = &vxlan_gpe_spec;
-	items[items_counter].mask = &vxlan_gpe_mask;
+	items[items_counter].spec = &vxlan_gpe_specs[ti];
+	items[items_counter].mask = &vxlan_gpe_masks[ti];
 }
 
 static void
@@ -216,25 +219,25 @@ add_geneve(struct rte_flow_item *items,
 	uint8_t items_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_item_geneve geneve_spec;
-	static struct rte_flow_item_geneve geneve_mask;
-
+	static struct rte_flow_item_geneve geneve_specs[RTE_MAX_LCORE] __rte_cache_aligned;
+	static struct rte_flow_item_geneve geneve_masks[RTE_MAX_LCORE] __rte_cache_aligned;
+	uint8_t ti = para.core_idx;
 	uint32_t vni_value;
 	uint8_t i;
 
 	vni_value = VNI_VALUE;
 
-	memset(&geneve_spec, 0, sizeof(struct rte_flow_item_geneve));
-	memset(&geneve_mask, 0, sizeof(struct rte_flow_item_geneve));
+	memset(&geneve_specs[ti], 0, sizeof(struct rte_flow_item_geneve));
+	memset(&geneve_masks[ti], 0, sizeof(struct rte_flow_item_geneve));
 
 	for (i = 0; i < 3; i++) {
-		geneve_spec.vni[2 - i] = vni_value >> (i * 8);
-		geneve_mask.vni[2 - i] = 0xff;
+		geneve_specs[ti].vni[2 - i] = vni_value >> (i * 8);
+		geneve_masks[ti].vni[2 - i] = 0xff;
 	}
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_GENEVE;
-	items[items_counter].spec = &geneve_spec;
-	items[items_counter].mask = &geneve_mask;
+	items[items_counter].spec = &geneve_specs[ti];
+	items[items_counter].mask = &geneve_masks[ti];
 }
 
 static void
@@ -344,12 +347,14 @@ add_icmpv6(struct rte_flow_item *items,
 
 void
 fill_items(struct rte_flow_item *items,
-	uint64_t *flow_items, uint32_t outer_ip_src)
+	uint64_t *flow_items, uint32_t outer_ip_src,
+	uint8_t core_idx)
 {
 	uint8_t items_counter = 0;
 	uint8_t i, j;
 	struct additional_para additional_para_data = {
 		.src_ip = outer_ip_src,
+		.core_idx = core_idx,
 	};
 
 	/* Support outer items up to tunnel layer only. */
diff --git a/app/test-flow-perf/items_gen.h b/app/test-flow-perf/items_gen.h
index d68958e4d3..f4b0e9a981 100644
--- a/app/test-flow-perf/items_gen.h
+++ b/app/test-flow-perf/items_gen.h
@@ -13,6 +13,6 @@
 #include "config.h"
 
 void fill_items(struct rte_flow_item *items, uint64_t *flow_items,
-	uint32_t outer_ip_src);
+	uint32_t outer_ip_src, uint8_t core_idx);
 
 #endif /* FLOW_PERF_ITEMS_GEN */
diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index 5ec9a15c61..663b2e9bae 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -72,7 +72,6 @@ static uint32_t nb_lcores;
 #define LCORE_MODE_PKT    1
 #define LCORE_MODE_STATS  2
 #define MAX_STREAMS      64
-#define MAX_LCORES       64
 
 struct stream {
 	int tx_port;
@@ -92,7 +91,20 @@ struct lcore_info {
 	struct rte_mbuf *pkts[MAX_PKT_BURST];
 } __rte_cache_aligned;
 
-static struct lcore_info lcore_infos[MAX_LCORES];
+static struct lcore_info lcore_infos[RTE_MAX_LCORE];
+
+struct multi_cores_pool {
+	uint32_t cores_count;
+	uint32_t rules_count;
+	double cpu_time_used_insertion[MAX_PORTS][RTE_MAX_LCORE];
+	double cpu_time_used_deletion[MAX_PORTS][RTE_MAX_LCORE];
+	int64_t last_alloc[RTE_MAX_LCORE];
+	int64_t current_alloc[RTE_MAX_LCORE];
+} __rte_cache_aligned;
+
+static struct multi_cores_pool mc_pool = {
+	.cores_count = 1,
+};
 
 static void
 usage(char *progname)
@@ -118,6 +130,8 @@ usage(char *progname)
 	printf("  --transfer: set transfer attribute in flows\n");
 	printf("  --group=N: set group for all flows,"
 		" default is %d\n", DEFAULT_GROUP);
+	printf("  --cores=N: to set the number of needed "
+		"cores to insert rte_flow rules, default is 1\n");
 
 	printf("To set flow items:\n");
 	printf("  --ether: add ether layer in flow items\n");
@@ -537,6 +551,7 @@ args_parse(int argc, char **argv)
 		{ "dump-socket-mem",            0, 0, 0 },
 		{ "enable-fwd",                 0, 0, 0 },
 		{ "portmask",                   1, 0, 0 },
+		{ "cores",                      1, 0, 0 },
 		/* Attributes */
 		{ "ingress",                    0, 0, 0 },
 		{ "egress",                     0, 0, 0 },
@@ -750,6 +765,21 @@ args_parse(int argc, char **argv)
 					rte_exit(EXIT_FAILURE, "Invalid fwd port mask\n");
 				ports_mask = pm;
 			}
+			if (strcmp(lgopts[opt_idx].name, "cores") == 0) {
+				n = atoi(optarg);
+				if ((int) rte_lcore_count() <= n) {
+					printf("\nError: you need %d cores to run on multi-cores\n"
+						"Existing cores are: %d\n", n, rte_lcore_count());
+					rte_exit(EXIT_FAILURE, " ");
+				}
+				if (n <= RTE_MAX_LCORE && n > 0)
+					mc_pool.cores_count = n;
+				else {
+					printf("Error: cores count must be > 0 "
+						" and < %d\n", RTE_MAX_LCORE);
+					rte_exit(EXIT_FAILURE, " ");
+				}
+			}
 			break;
 		default:
 			fprintf(stderr, "Invalid option: %s\n", argv[optind]);
@@ -845,7 +875,7 @@ print_rules_batches(double *cpu_time_per_batch)
 }
 
 static inline void
-destroy_flows(int port_id, struct rte_flow **flows_list)
+destroy_flows(int port_id, uint8_t core_id, struct rte_flow **flows_list)
 {
 	struct rte_flow_error error;
 	clock_t start_batch, end_batch;
@@ -855,12 +885,12 @@ destroy_flows(int port_id, struct rte_flow **flows_list)
 	double delta;
 	uint32_t i;
 	int rules_batch_idx;
+	int rules_count_per_core;
 
-	/* Deletion Rate */
-	printf("\nRules Deletion on port = %d\n", port_id);
+	rules_count_per_core = rules_count / mc_pool.cores_count;
 
 	start_batch = clock();
-	for (i = 0; i < rules_count; i++) {
+	for (i = 0; i < (uint32_t) rules_count_per_core; i++) {
 		if (flows_list[i] == 0)
 			break;
 
@@ -891,15 +921,17 @@ destroy_flows(int port_id, struct rte_flow **flows_list)
 		print_rules_batches(cpu_time_per_batch);
 
 	/* Deletion rate for all rules */
-	deletion_rate = ((double) (rules_count / cpu_time_used) / 1000);
-	printf(":: Total rules deletion rate -> %f K Rule/Sec\n",
-		deletion_rate);
-	printf(":: The time for deleting %d in rules %f seconds\n",
-		rules_count, cpu_time_used);
+	deletion_rate = ((double) (rules_count_per_core / cpu_time_used) / 1000);
+	printf(":: Port %d :: Core %d :: Rules deletion rate -> %f K Rule/Sec\n",
+		port_id, core_id, deletion_rate);
+	printf(":: Port %d :: Core %d :: The time for deleting %d rules is %f seconds\n",
+		port_id, core_id, rules_count_per_core, cpu_time_used);
+
+	mc_pool.cpu_time_used_deletion[port_id][core_id] = cpu_time_used;
 }
 
 static struct rte_flow **
-insert_flows(int port_id)
+insert_flows(int port_id, uint8_t core_id)
 {
 	struct rte_flow **flows_list;
 	struct rte_flow_error error;
@@ -909,32 +941,42 @@ insert_flows(int port_id)
 	double cpu_time_per_batch[MAX_BATCHES_COUNT] = { 0 };
 	double delta;
 	uint32_t flow_index;
-	uint32_t counter;
+	uint32_t counter, start_counter = 0, end_counter;
 	uint64_t global_items[MAX_ITEMS_NUM] = { 0 };
 	uint64_t global_actions[MAX_ACTIONS_NUM] = { 0 };
 	int rules_batch_idx;
+	int rules_count_per_core;
+
+	rules_count_per_core = rules_count / mc_pool.cores_count;
+
+	/* Set boundaries of rules for each core. */
+	if (core_id)
+		start_counter = core_id * rules_count_per_core;
+	end_counter = (core_id + 1) * rules_count_per_core;
 
 	global_items[0] = FLOW_ITEM_MASK(RTE_FLOW_ITEM_TYPE_ETH);
 	global_actions[0] = FLOW_ITEM_MASK(RTE_FLOW_ACTION_TYPE_JUMP);
 
 	flows_list = rte_zmalloc("flows_list",
-		(sizeof(struct rte_flow *) * rules_count) + 1, 0);
+		(sizeof(struct rte_flow *) * rules_count_per_core) + 1, 0);
 	if (flows_list == NULL)
 		rte_exit(EXIT_FAILURE, "No Memory available!");
 
 	cpu_time_used = 0;
 	flow_index = 0;
-	if (flow_group > 0) {
+	if (flow_group > 0 && core_id == 0) {
 		/*
 		 * Create global rule to jump into flow_group,
 		 * this way the app will avoid the default rules.
 		 *
+		 * This rule will be created only once.
+		 *
 		 * Global rule:
 		 * group 0 eth / end actions jump group <flow_group>
 		 */
 		flow = generate_flow(port_id, 0, flow_attrs,
 			global_items, global_actions,
-			flow_group, 0, 0, 0, 0, &error);
+			flow_group, 0, 0, 0, 0, core_id, &error);
 
 		if (flow == NULL) {
 			print_flow_error(error);
@@ -943,19 +985,17 @@ insert_flows(int port_id)
 		flows_list[flow_index++] = flow;
 	}
 
-	/* Insertion Rate */
-	printf("Rules insertion on port = %d\n", port_id);
 	start_batch = clock();
-	for (counter = 0; counter < rules_count; counter++) {
+	for (counter = start_counter; counter < end_counter; counter++) {
 		flow = generate_flow(port_id, flow_group,
 			flow_attrs, flow_items, flow_actions,
 			JUMP_ACTION_TABLE, counter,
 			hairpin_queues_num,
 			encap_data, decap_data,
-			&error);
+			core_id, &error);
 
 		if (force_quit)
-			counter = rules_count;
+			counter = end_counter;
 
 		if (!flow) {
 			print_flow_error(error);
@@ -984,23 +1024,25 @@ insert_flows(int port_id)
 	if (dump_iterations)
 		print_rules_batches(cpu_time_per_batch);
 
-	/* Insertion rate for all rules */
-	insertion_rate = ((double) (rules_count / cpu_time_used) / 1000);
-	printf(":: Total flow insertion rate -> %f K Rule/Sec\n",
-			insertion_rate);
-	printf(":: The time for creating %d in flows %f seconds\n",
-			rules_count, cpu_time_used);
+	printf(":: Port %d :: Core %d boundaries :: start @[%d] - end @[%d]\n",
+		port_id, core_id, start_counter, end_counter - 1);
+
+	/* Insertion rate for all rules in one core */
+	insertion_rate = ((double) (rules_count_per_core / cpu_time_used) / 1000);
+	printf(":: Port %d :: Core %d :: Rules insertion rate -> %f K Rule/Sec\n",
+		port_id, core_id, insertion_rate);
+	printf(":: Port %d :: Core %d :: The time for creating %d rules is %f seconds\n",
+		port_id, core_id, rules_count_per_core, cpu_time_used);
 
+	mc_pool.cpu_time_used_insertion[port_id][core_id] = cpu_time_used;
 	return flows_list;
 }
 
-static inline void
-flows_handler(void)
+static void
+flows_handler(uint8_t core_id)
 {
 	struct rte_flow **flows_list;
 	uint16_t nr_ports;
-	int64_t alloc, last_alloc;
-	int flow_size_in_bytes;
 	int port_id;
 
 	nr_ports = rte_eth_dev_count_avail();
@@ -1016,21 +1058,148 @@ flows_handler(void)
 			continue;
 
 		/* Insertion part. */
-		last_alloc = (int64_t)dump_socket_mem(stdout);
-		flows_list = insert_flows(port_id);
-		alloc = (int64_t)dump_socket_mem(stdout);
+		mc_pool.last_alloc[core_id] = (int64_t)dump_socket_mem(stdout);
+		flows_list = insert_flows(port_id, core_id);
+		if (flows_list == NULL)
+			rte_exit(EXIT_FAILURE, "Error: Insertion Failed!\n");
+		mc_pool.current_alloc[core_id] = (int64_t)dump_socket_mem(stdout);
 
 		/* Deletion part. */
 		if (delete_flag)
-			destroy_flows(port_id, flows_list);
+			destroy_flows(port_id, core_id, flows_list);
+	}
+}
+
+static int
+run_rte_flow_handler_cores(void *data __rte_unused)
+{
+	uint16_t port;
+	/* Latency: total count of rte rules divided
+	 * over max time used by thread between all
+	 * threads time.
+	 *
+	 * Throughput: total count of rte rules divided
+	 * over the average of the time consumed by all
+	 * threads time.
+	 */
+	double insertion_latency_time;
+	double insertion_throughput_time;
+	double deletion_latency_time;
+	double deletion_throughput_time;
+	double insertion_latency, insertion_throughput;
+	double deletion_latency, deletion_throughput;
+	int64_t last_alloc, current_alloc;
+	int flow_size_in_bytes;
+	int lcore_counter = 0;
+	int lcore_id = rte_lcore_id();
+	int i;
+
+	RTE_LCORE_FOREACH(i) {
+		/*  If core not needed return. */
+		if (lcore_id == i) {
+			printf(":: lcore %d mapped with index %d\n", lcore_id, lcore_counter);
+			if (lcore_counter >= (int) mc_pool.cores_count)
+				return 0;
+			break;
+		}
+		lcore_counter++;
+	}
+	lcore_id = lcore_counter;
+
+	if (lcore_id >= (int) mc_pool.cores_count)
+		return 0;
+
+	mc_pool.rules_count = rules_count;
 
-		/* Report rte_flow size in huge pages. */
-		if (last_alloc) {
-			flow_size_in_bytes = (alloc - last_alloc) / rules_count;
-			printf("\n:: rte_flow size in DPDK layer: %d Bytes",
-				flow_size_in_bytes);
+	flows_handler(lcore_id);
+
+	/* Only main core to print total results. */
+	if (lcore_id != 0)
+		return 0;
+
+	/* Make sure all cores finished insertion/deletion process. */
+	rte_eal_mp_wait_lcore();
+
+	/* Save first insertion/deletion rates from first thread.
+	 * Start comparing with all threads, if any thread used
+	 * time more than current saved, replace it.
+	 *
+	 * Thus in the end we will have the max time used for
+	 * insertion/deletion by one thread.
+	 *
+	 * As for memory consumption, save the min of all threads
+	 * of last alloc, and save the max for all threads for
+	 * current alloc.
+	 */
+	RTE_ETH_FOREACH_DEV(port) {
+		last_alloc = mc_pool.last_alloc[0];
+		current_alloc = mc_pool.current_alloc[0];
+
+		insertion_latency_time = mc_pool.cpu_time_used_insertion[port][0];
+		deletion_latency_time = mc_pool.cpu_time_used_deletion[port][0];
+		insertion_throughput_time = mc_pool.cpu_time_used_insertion[port][0];
+		deletion_throughput_time = mc_pool.cpu_time_used_deletion[port][0];
+		i = mc_pool.cores_count;
+		while (i-- > 1) {
+			insertion_throughput_time += mc_pool.cpu_time_used_insertion[port][i];
+			deletion_throughput_time += mc_pool.cpu_time_used_deletion[port][i];
+			if (insertion_latency_time < mc_pool.cpu_time_used_insertion[port][i])
+				insertion_latency_time = mc_pool.cpu_time_used_insertion[port][i];
+			if (deletion_latency_time < mc_pool.cpu_time_used_deletion[port][i])
+				deletion_latency_time = mc_pool.cpu_time_used_deletion[port][i];
+			if (last_alloc > mc_pool.last_alloc[i])
+				last_alloc = mc_pool.last_alloc[i];
+			if (current_alloc < mc_pool.current_alloc[i])
+				current_alloc = mc_pool.current_alloc[i];
 		}
+
+		flow_size_in_bytes = (current_alloc - last_alloc) / mc_pool.rules_count;
+
+		insertion_latency = ((double) (mc_pool.rules_count / insertion_latency_time) / 1000);
+		deletion_latency = ((double) (mc_pool.rules_count / deletion_latency_time) / 1000);
+
+		insertion_throughput_time /= mc_pool.cores_count;
+		deletion_throughput_time /= mc_pool.cores_count;
+		insertion_throughput = ((double) (mc_pool.rules_count / insertion_throughput_time) / 1000);
+		deletion_throughput = ((double) (mc_pool.rules_count / deletion_throughput_time) / 1000);
+
+		/* Latency stats */
+		printf("\n:: [Latency | Insertion] All Cores :: Port %d :: ", port);
+		printf("Total flows insertion rate -> %f K Rules/Sec\n",
+			insertion_latency);
+		printf(":: [Latency | Insertion] All Cores :: Port %d :: ", port);
+		printf("The time for creating %d rules is %f seconds\n",
+			mc_pool.rules_count, insertion_latency_time);
+
+		/* Throughput stats */
+		printf(":: [Throughput | Insertion] All Cores :: Port %d :: ", port);
+		printf("Total flows insertion rate -> %f K Rules/Sec\n",
+			insertion_throughput);
+		printf(":: [Throughput | Insertion] All Cores :: Port %d :: ", port);
+		printf("The average time for creating %d rules is %f seconds\n",
+			mc_pool.rules_count, insertion_throughput_time);
+
+		if (delete_flag) {
+			/* Latency stats */
+			printf(":: [Latency | Deletion] All Cores :: Port %d :: Total flows "
+				"deletion rate -> %f K Rules/Sec\n",
+				port, deletion_latency);
+			printf(":: [Latency | Deletion] All Cores :: Port %d :: ", port);
+			printf("The time for deleting %d rules is %f seconds\n",
+			mc_pool.rules_count, deletion_latency_time);
+
+			/* Throughput stats */
+			printf(":: [Throughput | Deletion] All Cores :: Port %d :: Total flows "
+				"deletion rate -> %f K Rules/Sec\n", port, deletion_throughput);
+			printf(":: [Throughput | Deletion] All Cores :: Port %d :: ", port);
+			printf("The average time for deleting %d rules is %f seconds\n",
+			mc_pool.rules_count, deletion_throughput_time);
+		}
+		printf("\n:: Port %d :: rte_flow size in DPDK layer: %d Bytes\n",
+			port, flow_size_in_bytes);
 	}
+
+	return 0;
 }
 
 static void
@@ -1107,12 +1276,12 @@ packet_per_second_stats(void)
 	int i;
 
 	old = rte_zmalloc("old",
-		sizeof(struct lcore_info) * MAX_LCORES, 0);
+		sizeof(struct lcore_info) * RTE_MAX_LCORE, 0);
 	if (old == NULL)
 		rte_exit(EXIT_FAILURE, "No Memory available!");
 
 	memcpy(old, lcore_infos,
-		sizeof(struct lcore_info) * MAX_LCORES);
+		sizeof(struct lcore_info) * RTE_MAX_LCORE);
 
 	while (!force_quit) {
 		uint64_t total_tx_pkts = 0;
@@ -1135,7 +1304,7 @@ packet_per_second_stats(void)
 		printf("%6s %16s %16s %16s\n", "------", "----------------",
 			"----------------", "----------------");
 		nr_lines = 3;
-		for (i = 0; i < MAX_LCORES; i++) {
+		for (i = 0; i < RTE_MAX_LCORE; i++) {
 			li  = &lcore_infos[i];
 			oli = &old[i];
 			if (li->mode != LCORE_MODE_PKT)
@@ -1166,7 +1335,7 @@ packet_per_second_stats(void)
 		}
 
 		memcpy(old, lcore_infos,
-			sizeof(struct lcore_info) * MAX_LCORES);
+			sizeof(struct lcore_info) * RTE_MAX_LCORE);
 	}
 }
 
@@ -1227,7 +1396,7 @@ init_lcore_info(void)
 	 * This means that this stream is not used, or not set
 	 * yet.
 	 */
-	for (i = 0; i < MAX_LCORES; i++)
+	for (i = 0; i < RTE_MAX_LCORE; i++)
 		for (j = 0; j < MAX_STREAMS; j++) {
 			lcore_infos[i].streams[j].tx_port = -1;
 			lcore_infos[i].streams[j].rx_port = -1;
@@ -1289,7 +1458,7 @@ init_lcore_info(void)
 
 	/* Print all streams */
 	printf(":: Stream -> core id[N]: (rx_port, rx_queue)->(tx_port, tx_queue)\n");
-	for (i = 0; i < MAX_LCORES; i++)
+	for (i = 0; i < RTE_MAX_LCORE; i++)
 		for (j = 0; j < MAX_STREAMS; j++) {
 			/* No streams for this core */
 			if (lcore_infos[i].streams[j].tx_port == -1)
@@ -1470,7 +1639,10 @@ main(int argc, char **argv)
 	if (nb_lcores <= 1)
 		rte_exit(EXIT_FAILURE, "This app needs at least two cores\n");
 
-	flows_handler();
+
+	printf(":: Flows Count per port: %d\n\n", rules_count);
+
+	rte_eal_mp_remote_launch(run_rte_flow_handler_cores, NULL, CALL_MAIN);
 
 	if (enable_fwd) {
 		init_lcore_info();
diff --git a/doc/guides/tools/flow-perf.rst b/doc/guides/tools/flow-perf.rst
index 634009ccee..40d157e8cb 100644
--- a/doc/guides/tools/flow-perf.rst
+++ b/doc/guides/tools/flow-perf.rst
@@ -25,15 +25,8 @@ computes an average time across all windows.
 The application also provides the ability to measure rte flow deletion rate,
 in addition to memory consumption before and after the flow rules' creation.
 
-The app supports single and multi core performance measurements.
-
-
-Known Limitations
------------------
-
-The current version has limitations which can be removed in future:
-
-* Single core insertion only.
+The app supports single and multiple core performance measurements, and
+supports multiple core insertion/deletion as well.
 
 
 Compiling the Application
@@ -103,6 +96,9 @@ The command line options are:
 *	``--portmask=N``
 	hexadecimal bitmask of ports to be used.
 
+*	``--cores=N``
+	Set the number of needed cores to insert/delete rte_flow rules.
+	Default cores count is 1.
 
 Attributes:
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH 3/4] app/flow-perf: change clock measurement functions
  2020-11-26 11:15 [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support Wisam Jaddo
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 1/4] app/flow-perf: refactor flows handler Wisam Jaddo
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 2/4] app/flow-perf: add multiple cores insertion and deletion Wisam Jaddo
@ 2020-11-26 11:15 ` Wisam Jaddo
  2021-01-07 14:49   ` Thomas Monjalon
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 4/4] app/flow-perf: remove redundant items memset and vars Wisam Jaddo
  2021-01-07 14:59 ` [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support Thomas Monjalon
  4 siblings, 1 reply; 7+ messages in thread
From: Wisam Jaddo @ 2020-11-26 11:15 UTC (permalink / raw)
  To: thomas, arybchenko, suanmingm, akozyrev; +Cc: dev

The clock() function is a poor fit for timing work that is spread
across multiple cores/threads: it measures the CPU time consumed by
the whole process rather than wall-clock time, so when several
cores/threads run simultaneously the process accumulates CPU time
much faster than real time elapses.

This commit therefore switches the measurement to rte_rdtsc() and
divides the cycle delta by the TSC frequency from rte_get_tsc_hz().
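
As a rough illustration of the conversion used per batch, here is a
minimal sketch that builds without DPDK. ticks_now() and ticks_hz()
are hypothetical stand-ins for rte_rdtsc() and rte_get_tsc_hz(); they
use a POSIX monotonic clock counted in nanoseconds, which is an
assumption made only so the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for rte_rdtsc(): monotonic ticks, in ns. */
static uint64_t ticks_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for rte_get_tsc_hz(): ticks per second. */
static uint64_t ticks_hz(void)
{
	return 1000000000ULL;
}

/* Convert a tick delta to seconds: (end - start) / frequency. */
static double ticks_to_seconds(uint64_t start, uint64_t end, uint64_t hz)
{
	assert(hz != 0);
	return (double)(end - start) / (double)hz;
}
```

A batch would be timed by reading ticks_now() before and after the
loop body and passing both values to ticks_to_seconds(). Unlike
clock(), the result does not grow with the number of threads running
in the process.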

Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
---
 app/test-flow-perf/main.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index 663b2e9bae..3a0e4c1951 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -889,7 +889,7 @@ destroy_flows(int port_id, uint8_t core_id, struct rte_flow **flows_list)
 
 	rules_count_per_core = rules_count / mc_pool.cores_count;
 
-	start_batch = clock();
+	start_batch = rte_rdtsc();
 	for (i = 0; i < (uint32_t) rules_count_per_core; i++) {
 		if (flows_list[i] == 0)
 			break;
@@ -907,12 +907,12 @@ destroy_flows(int port_id, uint8_t core_id, struct rte_flow **flows_list)
 		 * for this batch.
 		 */
 		if (!((i + 1) % rules_batch)) {
-			end_batch = clock();
+			end_batch = rte_rdtsc();
 			delta = (double) (end_batch - start_batch);
 			rules_batch_idx = ((i + 1) / rules_batch) - 1;
-			cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC;
+			cpu_time_per_batch[rules_batch_idx] = delta / rte_get_tsc_hz();
 			cpu_time_used += cpu_time_per_batch[rules_batch_idx];
-			start_batch = clock();
+			start_batch = rte_rdtsc();
 		}
 	}
 
@@ -985,7 +985,7 @@ insert_flows(int port_id, uint8_t core_id)
 		flows_list[flow_index++] = flow;
 	}
 
-	start_batch = clock();
+	start_batch = rte_rdtsc();
 	for (counter = start_counter; counter < end_counter; counter++) {
 		flow = generate_flow(port_id, flow_group,
 			flow_attrs, flow_items, flow_actions,
@@ -1011,12 +1011,12 @@ insert_flows(int port_id, uint8_t core_id)
 		 * for this batch.
 		 */
 		if (!((counter + 1) % rules_batch)) {
-			end_batch = clock();
+			end_batch = rte_rdtsc();
 			delta = (double) (end_batch - start_batch);
 			rules_batch_idx = ((counter + 1) / rules_batch) - 1;
-			cpu_time_per_batch[rules_batch_idx] = delta / CLOCKS_PER_SEC;
+			cpu_time_per_batch[rules_batch_idx] = delta / rte_get_tsc_hz();
 			cpu_time_used += cpu_time_per_batch[rules_batch_idx];
-			start_batch = clock();
+			start_batch = rte_rdtsc();
 		}
 	}
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH 4/4] app/flow-perf: remove redundant items memset and vars
  2020-11-26 11:15 [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support Wisam Jaddo
                   ` (2 preceding siblings ...)
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 3/4] app/flow-perf: change clock measurement functions Wisam Jaddo
@ 2020-11-26 11:15 ` Wisam Jaddo
  2021-01-07 14:59 ` [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support Thomas Monjalon
  4 siblings, 0 replies; 7+ messages in thread
From: Wisam Jaddo @ 2020-11-26 11:15 UTC (permalink / raw)
  To: thomas, arybchenko, suanmingm, akozyrev; +Cc: dev, wisamm, stable

Since the items are static, their default values are zero,
so the memset to zero is redundant code.

Also remove all the unneeded variables, which can be replaced
by setting the values directly in the structure initializers.
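
The pattern can be sketched outside DPDK as follows. All names here
(struct item_vlan, BE16, VLAN_ID, vlan_spec) are hypothetical stand-ins
chosen for illustration, not the definitions used by the application.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins, not DPDK definitions. BE16() always swaps
 * bytes, which matches RTE_BE16() only on a little-endian host.
 */
#define BE16(v) ((uint16_t)((((v) & 0xffu) << 8) | (((v) >> 8) & 0xffu)))
#define VLAN_ID 1

struct item_vlan {
	uint16_t tci;
	uint16_t inner_type;
};

static const struct item_vlan *
vlan_spec(void)
{
	/*
	 * Static storage: initialized once, before the program runs.
	 * Members not named in the initializer (inner_type) are zero,
	 * so no memset is needed. The initializer must be a constant
	 * expression, so a macro works here where a local variable
	 * would not.
	 */
	static struct item_vlan spec = {
		.tci = BE16(VLAN_ID),
	};

	return &spec;
}
```

The constant-expression requirement is presumably why the local
variables could be dropped in favour of the macros themselves.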

Fixes: bf3688f1e816 ("app/flow-perf: add insertion rate calculation")
Cc: wisamm@mellanox.com
Cc: stable@dpdk.org

Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
---
 app/test-flow-perf/actions_gen.c |  30 +++-----
 app/test-flow-perf/items_gen.c   | 123 ++++++++-----------------------
 2 files changed, 44 insertions(+), 109 deletions(-)

diff --git a/app/test-flow-perf/actions_gen.c b/app/test-flow-perf/actions_gen.c
index 1364407056..c3545ba32f 100644
--- a/app/test-flow-perf/actions_gen.c
+++ b/app/test-flow-perf/actions_gen.c
@@ -143,12 +143,10 @@ add_set_meta(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_meta meta_action;
-
-	do {
-		meta_action.data = RTE_BE32(META_DATA);
-		meta_action.mask = RTE_BE32(0xffffffff);
-	} while (0);
+	static struct rte_flow_action_set_meta meta_action = {
+		.data = RTE_BE32(META_DATA),
+		.mask = RTE_BE32(0xffffffff),
+	};
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_META;
 	actions[actions_counter].conf = &meta_action;
@@ -159,13 +157,11 @@ add_set_tag(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_set_tag tag_action;
-
-	do {
-		tag_action.data = RTE_BE32(META_DATA);
-		tag_action.mask = RTE_BE32(0xffffffff);
-		tag_action.index = TAG_INDEX;
-	} while (0);
+	static struct rte_flow_action_set_tag tag_action = {
+		.data = RTE_BE32(META_DATA),
+		.mask = RTE_BE32(0xffffffff),
+		.index = TAG_INDEX,
+	};
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_SET_TAG;
 	actions[actions_counter].conf = &tag_action;
@@ -176,11 +172,9 @@ add_port_id(struct rte_flow_action *actions,
 	uint8_t actions_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_action_port_id port_id;
-
-	do {
-		port_id.id = PORT_ID_DST;
-	} while (0);
+	static struct rte_flow_action_port_id port_id = {
+		.id = PORT_ID_DST,
+	};
 
 	actions[actions_counter].type = RTE_FLOW_ACTION_TYPE_PORT_ID;
 	actions[actions_counter].conf = &port_id;
diff --git a/app/test-flow-perf/items_gen.c b/app/test-flow-perf/items_gen.c
index 0950023608..ccebc08b39 100644
--- a/app/test-flow-perf/items_gen.c
+++ b/app/test-flow-perf/items_gen.c
@@ -26,9 +26,6 @@ add_ether(struct rte_flow_item *items,
 	static struct rte_flow_item_eth eth_spec;
 	static struct rte_flow_item_eth eth_mask;
 
-	memset(&eth_spec, 0, sizeof(struct rte_flow_item_eth));
-	memset(&eth_mask, 0, sizeof(struct rte_flow_item_eth));
-
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_ETH;
 	items[items_counter].spec = &eth_spec;
 	items[items_counter].mask = &eth_mask;
@@ -39,16 +36,12 @@ add_vlan(struct rte_flow_item *items,
 	uint8_t items_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_item_vlan vlan_spec;
-	static struct rte_flow_item_vlan vlan_mask;
-
-	uint16_t vlan_value = VLAN_VALUE;
-
-	memset(&vlan_spec, 0, sizeof(struct rte_flow_item_vlan));
-	memset(&vlan_mask, 0, sizeof(struct rte_flow_item_vlan));
-
-	vlan_spec.tci = RTE_BE16(vlan_value);
-	vlan_mask.tci = RTE_BE16(0xffff);
+	static struct rte_flow_item_vlan vlan_spec = {
+		.tci = RTE_BE16(VLAN_VALUE),
+	};
+	static struct rte_flow_item_vlan vlan_mask = {
+		.tci = RTE_BE16(0xffff),
+	};
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_VLAN;
 	items[items_counter].spec = &vlan_spec;
@@ -63,9 +56,6 @@ add_ipv4(struct rte_flow_item *items,
 	static struct rte_flow_item_ipv4 ipv4_masks[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint8_t ti = para.core_idx;
 
-	memset(&ipv4_specs[ti], 0, sizeof(struct rte_flow_item_ipv4));
-	memset(&ipv4_masks[ti], 0, sizeof(struct rte_flow_item_ipv4));
-
 	ipv4_specs[ti].hdr.src_addr = RTE_BE32(para.src_ip);
 	ipv4_masks[ti].hdr.src_addr = RTE_BE32(0xffffffff);
 
@@ -83,9 +73,6 @@ add_ipv6(struct rte_flow_item *items,
 	static struct rte_flow_item_ipv6 ipv6_masks[RTE_MAX_LCORE] __rte_cache_aligned;
 	uint8_t ti = para.core_idx;
 
-	memset(&ipv6_specs[ti], 0, sizeof(struct rte_flow_item_ipv6));
-	memset(&ipv6_masks[ti], 0, sizeof(struct rte_flow_item_ipv6));
-
 	/** Set ipv6 src **/
 	memset(&ipv6_specs[ti].hdr.src_addr, para.src_ip,
 		sizeof(ipv6_specs->hdr.src_addr) / 2);
@@ -107,9 +94,6 @@ add_tcp(struct rte_flow_item *items,
 	static struct rte_flow_item_tcp tcp_spec;
 	static struct rte_flow_item_tcp tcp_mask;
 
-	memset(&tcp_spec, 0, sizeof(struct rte_flow_item_tcp));
-	memset(&tcp_mask, 0, sizeof(struct rte_flow_item_tcp));
-
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_TCP;
 	items[items_counter].spec = &tcp_spec;
 	items[items_counter].mask = &tcp_mask;
@@ -123,9 +107,6 @@ add_udp(struct rte_flow_item *items,
 	static struct rte_flow_item_udp udp_spec;
 	static struct rte_flow_item_udp udp_mask;
 
-	memset(&udp_spec, 0, sizeof(struct rte_flow_item_udp));
-	memset(&udp_mask, 0, sizeof(struct rte_flow_item_udp));
-
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_UDP;
 	items[items_counter].spec = &udp_spec;
 	items[items_counter].mask = &udp_mask;
@@ -144,9 +125,6 @@ add_vxlan(struct rte_flow_item *items,
 
 	vni_value = VNI_VALUE;
 
-	memset(&vxlan_specs[ti], 0, sizeof(struct rte_flow_item_vxlan));
-	memset(&vxlan_masks[ti], 0, sizeof(struct rte_flow_item_vxlan));
-
 	/* Set standard vxlan vni */
 	for (i = 0; i < 3; i++) {
 		vxlan_specs[ti].vni[2 - i] = vni_value >> (i * 8);
@@ -174,9 +152,6 @@ add_vxlan_gpe(struct rte_flow_item *items,
 
 	vni_value = VNI_VALUE;
 
-	memset(&vxlan_gpe_specs[ti], 0, sizeof(struct rte_flow_item_vxlan_gpe));
-	memset(&vxlan_gpe_masks[ti], 0, sizeof(struct rte_flow_item_vxlan_gpe));
-
 	/* Set vxlan-gpe vni */
 	for (i = 0; i < 3; i++) {
 		vxlan_gpe_specs[ti].vni[2 - i] = vni_value >> (i * 8);
@@ -196,18 +171,12 @@ add_gre(struct rte_flow_item *items,
 	uint8_t items_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_item_gre gre_spec;
-	static struct rte_flow_item_gre gre_mask;
-
-	uint16_t proto;
-
-	proto = RTE_ETHER_TYPE_TEB;
-
-	memset(&gre_spec, 0, sizeof(struct rte_flow_item_gre));
-	memset(&gre_mask, 0, sizeof(struct rte_flow_item_gre));
-
-	gre_spec.protocol = RTE_BE16(proto);
-	gre_mask.protocol = RTE_BE16(0xffff);
+	static struct rte_flow_item_gre gre_spec = {
+		.protocol = RTE_BE16(RTE_ETHER_TYPE_TEB),
+	};
+	static struct rte_flow_item_gre gre_mask = {
+		.protocol = RTE_BE16(0xffff),
+	};
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_GRE;
 	items[items_counter].spec = &gre_spec;
@@ -227,9 +196,6 @@ add_geneve(struct rte_flow_item *items,
 
 	vni_value = VNI_VALUE;
 
-	memset(&geneve_specs[ti], 0, sizeof(struct rte_flow_item_geneve));
-	memset(&geneve_masks[ti], 0, sizeof(struct rte_flow_item_geneve));
-
 	for (i = 0; i < 3; i++) {
 		geneve_specs[ti].vni[2 - i] = vni_value >> (i * 8);
 		geneve_masks[ti].vni[2 - i] = 0xff;
@@ -245,18 +211,12 @@ add_gtp(struct rte_flow_item *items,
 	uint8_t items_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_item_gtp gtp_spec;
-	static struct rte_flow_item_gtp gtp_mask;
-
-	uint32_t teid_value;
-
-	teid_value = TEID_VALUE;
-
-	memset(&gtp_spec, 0, sizeof(struct rte_flow_item_gtp));
-	memset(&gtp_mask, 0, sizeof(struct rte_flow_item_gtp));
-
-	gtp_spec.teid = RTE_BE32(teid_value);
-	gtp_mask.teid = RTE_BE32(0xffffffff);
+	static struct rte_flow_item_gtp gtp_spec = {
+		.teid = RTE_BE32(TEID_VALUE),
+	};
+	static struct rte_flow_item_gtp gtp_mask = {
+		.teid = RTE_BE32(0xffffffff),
+	};
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_GTP;
 	items[items_counter].spec = &gtp_spec;
@@ -268,18 +228,12 @@ add_meta_data(struct rte_flow_item *items,
 	uint8_t items_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_item_meta meta_spec;
-	static struct rte_flow_item_meta meta_mask;
-
-	uint32_t data;
-
-	data = META_DATA;
-
-	memset(&meta_spec, 0, sizeof(struct rte_flow_item_meta));
-	memset(&meta_mask, 0, sizeof(struct rte_flow_item_meta));
-
-	meta_spec.data = RTE_BE32(data);
-	meta_mask.data = RTE_BE32(0xffffffff);
+	static struct rte_flow_item_meta meta_spec = {
+		.data = RTE_BE32(META_DATA),
+	};
+	static struct rte_flow_item_meta meta_mask = {
+		.data = RTE_BE32(0xffffffff),
+	};
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_META;
 	items[items_counter].spec = &meta_spec;
@@ -292,21 +246,14 @@ add_meta_tag(struct rte_flow_item *items,
 	uint8_t items_counter,
 	__rte_unused struct additional_para para)
 {
-	static struct rte_flow_item_tag tag_spec;
-	static struct rte_flow_item_tag tag_mask;
-	uint32_t data;
-	uint8_t index;
-
-	data = META_DATA;
-	index = TAG_INDEX;
-
-	memset(&tag_spec, 0, sizeof(struct rte_flow_item_tag));
-	memset(&tag_mask, 0, sizeof(struct rte_flow_item_tag));
-
-	tag_spec.data = RTE_BE32(data);
-	tag_mask.data = RTE_BE32(0xffffffff);
-	tag_spec.index = index;
-	tag_mask.index = 0xff;
+	static struct rte_flow_item_tag tag_spec = {
+		.data = RTE_BE32(META_DATA),
+		.index = TAG_INDEX,
+	};
+	static struct rte_flow_item_tag tag_mask = {
+		.data = RTE_BE32(0xffffffff),
+		.index = 0xff,
+	};
 
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_TAG;
 	items[items_counter].spec = &tag_spec;
@@ -321,9 +268,6 @@ add_icmpv4(struct rte_flow_item *items,
 	static struct rte_flow_item_icmp icmpv4_spec;
 	static struct rte_flow_item_icmp icmpv4_mask;
 
-	memset(&icmpv4_spec, 0, sizeof(struct rte_flow_item_icmp));
-	memset(&icmpv4_mask, 0, sizeof(struct rte_flow_item_icmp));
-
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_ICMP;
 	items[items_counter].spec = &icmpv4_spec;
 	items[items_counter].mask = &icmpv4_mask;
@@ -337,9 +281,6 @@ add_icmpv6(struct rte_flow_item *items,
 	static struct rte_flow_item_icmp6 icmpv6_spec;
 	static struct rte_flow_item_icmp6 icmpv6_mask;
 
-	memset(&icmpv6_spec, 0, sizeof(struct rte_flow_item_icmp6));
-	memset(&icmpv6_mask, 0, sizeof(struct rte_flow_item_icmp6));
-
 	items[items_counter].type = RTE_FLOW_ITEM_TYPE_ICMP6;
 	items[items_counter].spec = &icmpv6_spec;
 	items[items_counter].mask = &icmpv6_mask;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] [PATCH 3/4] app/flow-perf: change clock measurement functions
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 3/4] app/flow-perf: change clock measurement functions Wisam Jaddo
@ 2021-01-07 14:49   ` Thomas Monjalon
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Monjalon @ 2021-01-07 14:49 UTC (permalink / raw)
  To: Wisam Jaddo; +Cc: arybchenko, suanmingm, akozyrev, dev, david.marchand

26/11/2020 12:15, Wisam Jaddo:
> Using clock() is not good practice when running on multiple
> cores/threads, since it measures the CPU time used by the whole
> process rather than the wall-clock time. When several cores/threads
> run simultaneously, the process burns through CPU time much faster
> than wall-clock time advances.
> 
> As a result, this commit changes the measurement to use rte_rdtsc(),
> and divides the resulting cycle count by the TSC frequency returned
> by rte_get_tsc_hz().
> 
> Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
> Reviewed-by: Alexander Kozyrev <akozyrev@nvidia.com>
> Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
> ---
> -	start_batch = clock();
> +	start_batch = rte_rdtsc();

Could you please try the generic wrapper rte_get_timer_cycles()?
It should be the same (inline wrapper) when HPET is disabled.
rdtsc refers to an x86 instruction, so I prefer a more generic API.

Can be a separate patch.
While at it, I believe more apps could be converted.
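
A rough sketch of what timing through a generic wrapper looks like,
written so that it compiles without DPDK. Here timer_cycles() and
timer_hz() are hypothetical stand-ins built on POSIX clock_gettime();
in a DPDK application they would be replaced by rte_get_timer_cycles()
and rte_get_timer_hz(). time_call() and bump() are also made up for
the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Stand-in for rte_get_timer_hz(): ticks per second of the source. */
static uint64_t
timer_hz(void)
{
	return UINT64_C(1000000000); /* nanosecond ticks */
}

/* Stand-in for rte_get_timer_cycles(): monotonic wall-clock ticks. */
static uint64_t
timer_cycles(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * timer_hz() + (uint64_t)ts.tv_nsec;
}

/* Time one call of fn(arg) in seconds, without naming the clock source. */
static double
time_call(void (*fn)(void *), void *arg)
{
	uint64_t start = timer_cycles();

	fn(arg);
	return (double)(timer_cycles() - start) / (double)timer_hz();
}

/* Hypothetical workload used only to exercise time_call(). */
static void
bump(void *arg)
{
	(*(int *)arg)++;
}
```

The caller only sees a tick count and a tick frequency, so the code
does not depend on which counter provides them.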



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support
  2020-11-26 11:15 [dpdk-dev] [PATCH 0/4] app/flow-perf: add multi threaded support Wisam Jaddo
                   ` (3 preceding siblings ...)
  2020-11-26 11:15 ` [dpdk-dev] [PATCH 4/4] app/flow-perf: remove redundant items memset and vars Wisam Jaddo
@ 2021-01-07 14:59 ` Thomas Monjalon
  4 siblings, 0 replies; 7+ messages in thread
From: Thomas Monjalon @ 2021-01-07 14:59 UTC (permalink / raw)
  To: Wisam Jaddo; +Cc: arybchenko, suanmingm, akozyrev, dev

26/11/2020 12:15, Wisam Jaddo:
> After this series the application will start supporting testing
> multiple threaded insertion and deletion rates.
> 
> Also it will provide the latency & throughput rates of all threads.
> 
> 
> Wisam Jaddo (4):
>   app/flow-perf: refactor flows handler
>   app/flow-perf: add multiple cores insertion and deletion
>   app/flow-perf: change clock measurement functions
>   app/flow-perf: remove redundant items memset and vars

Pending for 6 weeks without any comment.
Applied, thanks.



^ permalink raw reply	[flat|nested] 7+ messages in thread
