DPDK patches and discussions
Subject: RE: [v10] app/dma-perf: introduce dma-perf application
From: Zhang, Yuying
Date: 2023-06-28 23:38 UTC
To: dev, Jiang, Cheng1

Hi Cheng,

LGTM.

> -----Original Message-----
> Date: Wed, 28 Jun 2023 01:20:34 +0000
> From: Cheng Jiang <cheng1.jiang@intel.com>
> To: thomas@monjalon.net, bruce.richardson@intel.com,
> 	mb@smartsharesystems.com, chenbo.xia@intel.com,
> 	amitprakashs@marvell.com, anoobj@marvell.com,
> huangdengdui@huawei.com,
> 	kevin.laatz@intel.com, fengchengwen@huawei.com,
> jerinj@marvell.com
> Cc: dev@dpdk.org, jiayu.hu@intel.com, xuan.ding@intel.com,
> 	wenwux.ma@intel.com, yuanx.wang@intel.com,
> xingguang.he@intel.com,
> 	weix.ling@intel.com, Cheng Jiang <cheng1.jiang@intel.com>
> Subject: [PATCH v10] app/dma-perf: introduce dma-perf application
> Message-ID: <20230628012034.49016-1-cheng1.jiang@intel.com>
> Content-Type: text/plain; charset=UTF-8
> 
> There are many high-performance DMA devices supported in DPDK now, and
> these DMA devices can also be integrated into other modules of DPDK as
> accelerators, such as Vhost. Before integrating DMA into applications,
> developers need to know the performance of these DMA devices in various
> scenarios and the performance of CPUs in the same scenario, such as
> different buffer lengths. Only in this way can we know the target
> performance of the application accelerated by using them. This patch
> introduces a high-performance testing tool, which supports comparing the
> performance of CPU and DMA in different scenarios automatically with a
> pre-set config file. Only the memory copy performance test is supported for now.
> 
> Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Chenbo Xia <chenbo.xia@intel.com>

Acked-by: Yuying Zhang <yuying.zhang@intel.com>

> ---
> v10:
>   rebased code from 23.07-rc2;
> v9:
>   improved error handling;
>   improved lcore_params structure;
>   improved mbuf api calling;
>   improved exit process;
>   fixed some typos;
>   added scenario summary data display;
>   removed unnecessary include;
> v8:
>   fixed string copy issue in parse_lcore();
>   improved some data display format;
>   added doc in doc/guides/tools;
>   updated release notes;
> v7:
>   fixed some strcpy issues;
>   removed cache setup in calling rte_pktmbuf_pool_create();
>   fixed some typos;
>   added some memory free and null set operations;
>   improved result calculation;
> v6:
>   improved code based on Anoob's comments;
>   fixed some code structure issues;
> v5:
>   fixed some LONG_LINE warnings;
> v4:
>   fixed inaccuracy of the memory footprint display;
> v3:
>   fixed some typos;
> v2:
>   added lcore/dmadev designation;
>   added error case process;
>   removed worker_threads parameter from config.ini;
>   improved the logs;
>   improved config file;
> 
>  app/meson.build                        |   1 +
>  app/test-dma-perf/benchmark.c          | 508 ++++++++++++++++++++
>  app/test-dma-perf/config.ini           |  61 +++
>  app/test-dma-perf/main.c               | 616 +++++++++++++++++++++++++
>  app/test-dma-perf/main.h               |  64 +++
>  app/test-dma-perf/meson.build          |  17 +
>  doc/guides/rel_notes/release_23_07.rst |   6 +
>  doc/guides/tools/dmaperf.rst           | 103 +++++
>  doc/guides/tools/index.rst             |   1 +
>  9 files changed, 1377 insertions(+)
>  create mode 100644 app/test-dma-perf/benchmark.c
>  create mode 100644 app/test-dma-perf/config.ini
>  create mode 100644 app/test-dma-perf/main.c
>  create mode 100644 app/test-dma-perf/main.h
>  create mode 100644 app/test-dma-perf/meson.build
>  create mode 100644 doc/guides/tools/dmaperf.rst
> 
> diff --git a/app/meson.build b/app/meson.build
> index 74d2420f67..4fc1a83eba 100644
> --- a/app/meson.build
> +++ b/app/meson.build
> @@ -19,6 +19,7 @@ apps = [
>          'test-cmdline',
>          'test-compress-perf',
>          'test-crypto-perf',
> +        'test-dma-perf',
>          'test-eventdev',
>          'test-fib',
>          'test-flow-perf',
> diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c
> new file mode 100644
> index 0000000000..0601e0d171
> --- /dev/null
> +++ b/app/test-dma-perf/benchmark.c
> @@ -0,0 +1,508 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#include <inttypes.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +
> +#include <rte_time.h>
> +#include <rte_mbuf.h>
> +#include <rte_dmadev.h>
> +#include <rte_malloc.h>
> +#include <rte_lcore.h>
> +
> +#include "main.h"
> +
> +#define MAX_DMA_CPL_NB 255
> +
> +#define TEST_WAIT_U_SECOND 10000
> +#define POLL_MAX 1000
> +
> +#define CSV_LINE_DMA_FMT "Scenario %u,%u,%s,%u,%u,%u,%u,%.2lf,%" PRIu64 ",%.3lf,%.3lf\n"
> +#define CSV_LINE_CPU_FMT "Scenario %u,%u,NA,NA,NA,%u,%u,%.2lf,%" PRIu64 ",%.3lf,%.3lf\n"
> +
> +#define CSV_TOTAL_LINE_FMT "Scenario %u Summary, , , , , ,%u,%.2lf,%u,%.3lf,%.3lf\n"
> +
> +struct worker_info {
> +	bool ready_flag;
> +	bool start_flag;
> +	bool stop_flag;
> +	uint32_t total_cpl;
> +	uint32_t test_cpl;
> +};
> +
> +struct lcore_params {
> +	uint8_t scenario_id;
> +	unsigned int lcore_id;
> +	char *dma_name;
> +	uint16_t worker_id;
> +	uint16_t dev_id;
> +	uint32_t nr_buf;
> +	uint16_t kick_batch;
> +	uint32_t buf_size;
> +	uint16_t test_secs;
> +	struct rte_mbuf **srcs;
> +	struct rte_mbuf **dsts;
> +	volatile struct worker_info worker_info;
> +};
> +
> +static struct rte_mempool *src_pool;
> +static struct rte_mempool *dst_pool;
> +
> +static struct lcore_params *lcores[MAX_WORKER_NB];
> +
> +#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__)
> +
> +static inline int
> +__rte_format_printf(3, 4)
> +print_err(const char *func, int lineno, const char *format, ...)
> +{
> +	va_list ap;
> +	int ret;
> +
> +	ret = fprintf(stderr, "In %s:%d - ", func, lineno);
> +	va_start(ap, format);
> +	ret += vfprintf(stderr, format, ap);
> +	va_end(ap);
> +
> +	return ret;
> +}
> +
> +static inline void
> +calc_result(uint32_t buf_size, uint32_t nr_buf, uint16_t nb_workers, uint16_t test_secs,
> +				uint32_t total_cnt, float *memory, uint32_t *ave_cycle,
> +				float *bandwidth, float *mops)
> +{
> +	float ops;
> +
> +	*memory = (float)(buf_size * (nr_buf / nb_workers) * 2) / (1024 * 1024);
> +	*ave_cycle = test_secs * rte_get_timer_hz() / total_cnt;
> +	ops = (float)total_cnt / test_secs;
> +	*mops = ops / (1000 * 1000);
> +	*bandwidth = (ops * buf_size * 8) / (1000 * 1000 * 1000);
> +}
> +
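
For readers following the arithmetic in calc_result(), here is a minimal standalone sketch of the same formulas. It is not part of the patch: `struct perf_result`, `calc_result_sketch()` and the explicit `timer_hz` parameter are hypothetical stand-ins (the patch calls rte_get_timer_hz() directly), and the zero checks and 64-bit intermediates are assumptions added for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the derived figures; not a type from the patch. */
struct perf_result {
	double memory_mb;       /* per-worker footprint: source plus destination */
	uint64_t cycles_per_op; /* average timer cycles per completed copy */
	double mops;            /* millions of copies per second */
	double bandwidth_gbps;  /* gigabits per second */
};

/*
 * Mirrors the formulas in calc_result(), but takes the timer frequency as a
 * parameter (assumption: the caller passes what rte_get_timer_hz() returns).
 * Returns -1 instead of dividing by zero.
 */
static int
calc_result_sketch(uint32_t buf_size, uint32_t nr_buf, uint16_t nb_workers,
		uint16_t test_secs, uint32_t total_cnt, uint64_t timer_hz,
		struct perf_result *res)
{
	double ops;

	if (nb_workers == 0 || test_secs == 0 || total_cnt == 0)
		return -1;

	/* 64-bit intermediate so a large footprint cannot wrap around. */
	res->memory_mb = (double)((uint64_t)buf_size * (nr_buf / nb_workers) * 2) /
			(1024.0 * 1024.0);
	res->cycles_per_op = (uint64_t)test_secs * timer_hz / total_cnt;
	ops = (double)total_cnt / test_secs;
	res->mops = ops / 1e6;
	res->bandwidth_gbps = ops * buf_size * 8 / 1e9;
	return 0;
}
```

As a worked case: with buf_size = 64, 20,000,000 completed copies in 2 seconds and a 2 GHz timer, this gives 200 cycles per copy, 10 MOps and 5.12 Gbps.
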
> +static void
> +output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name, uint16_t ring_size,
> +			uint16_t kick_batch, uint64_t ave_cycle, uint32_t buf_size, uint32_t nr_buf,
> +			float memory, float bandwidth, float mops, bool is_dma)
> +{
> +	if (is_dma)
> +		printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size: %u.\n",
> +				lcore_id, dma_name, ring_size, kick_batch);
> +	else
> +		printf("lcore %u\n", lcore_id);
> +
> +	printf("Average Cycles/op: %" PRIu64 ", Buffer Size: %u B, Buffer Number: %u, Memory: %.2lf MB, Frequency: %.3lf Ghz.\n",
> +			ave_cycle, buf_size, nr_buf, memory, rte_get_timer_hz()/1000000000.0);
> +	printf("Average Bandwidth: %.3lf Gbps, MOps: %.3lf\n", bandwidth, mops);
> +
> +	if (is_dma)
> +		snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, CSV_LINE_DMA_FMT,
> +			scenario_id, lcore_id, dma_name, ring_size, kick_batch, buf_size,
> +			nr_buf, memory, ave_cycle, bandwidth, mops);
> +	else
> +		snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, CSV_LINE_CPU_FMT,
> +			scenario_id, lcore_id, buf_size,
> +			nr_buf, memory, ave_cycle, bandwidth, mops);
> +}
> +
> +static inline void
> +cache_flush_buf(__rte_unused struct rte_mbuf **array,
> +		__rte_unused uint32_t buf_size,
> +		__rte_unused uint32_t nr_buf)
> +{
> +#ifdef RTE_ARCH_X86_64
> +	char *data;
> +	struct rte_mbuf **srcs = array;
> +	uint32_t i, offset;
> +
> +	for (i = 0; i < nr_buf; i++) {
> +		data = rte_pktmbuf_mtod(srcs[i], char *);
> +		for (offset = 0; offset < buf_size; offset += 64)
> +			__builtin_ia32_clflush(data + offset);
> +	}
> +#endif
> +}
> +
> +/* Configuration of device. */
> +static void
> +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size)
> +{
> +	uint16_t vchan = 0;
> +	struct rte_dma_info info;
> +	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
> +	struct rte_dma_vchan_conf qconf = {
> +		.direction = RTE_DMA_DIR_MEM_TO_MEM,
> +		.nb_desc = ring_size
> +	};
> +
> +	if (rte_dma_configure(dev_id, &dev_config) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with dma configure.\n");
> +
> +	if (rte_dma_vchan_setup(dev_id, vchan, &qconf) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with queue configuration.\n");
> +
> +	if (rte_dma_info_get(dev_id, &info) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with getting device info.\n");
> +
> +	if (info.nb_vchans != 1)
> +		rte_exit(EXIT_FAILURE, "Error, no configured queues reported on device id. %u\n",
> +				dev_id);
> +
> +	if (rte_dma_start(dev_id) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with dma start.\n");
> +}
> +
> +static int
> +config_dmadevs(struct test_configure *cfg)
> +{
> +	uint32_t ring_size = cfg->ring_size.cur;
> +	struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map;
> +	uint32_t nb_workers = ldm->cnt;
> +	uint32_t i;
> +	int dev_id;
> +	uint16_t nb_dmadevs = 0;
> +	char *dma_name;
> +
> +	for (i = 0; i < ldm->cnt; i++) {
> +		dma_name = ldm->dma_names[i];
> +		dev_id = rte_dma_get_dev_id_by_name(dma_name);
> +		if (dev_id < 0) {
> +			fprintf(stderr, "Error: Fail to find DMA %s.\n", dma_name);
> +			goto end;
> +		}
> +
> +		ldm->dma_ids[i] = dev_id;
> +		configure_dmadev_queue(dev_id, ring_size);
> +		++nb_dmadevs;
> +	}
> +
> +end:
> +	if (nb_dmadevs < nb_workers) {
> +		printf("Not enough dmadevs (%u) for all workers (%u).\n", nb_dmadevs, nb_workers);
> +		return -1;
> +	}
> +
> +	printf("Number of used dmadevs: %u.\n", nb_dmadevs);
> +
> +	return 0;
> +}
> +
> +static void
> +error_exit(int dev_id)
> +{
> +	rte_dma_stop(dev_id);
> +	rte_dma_close(dev_id);
> +	rte_exit(EXIT_FAILURE, "DMA error\n");
> +}
> +
> +static inline void
> +do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt,
> +			volatile struct worker_info *worker_info)
> +{
> +	int ret;
> +	uint16_t nr_cpl;
> +
> +	ret = rte_dma_submit(dev_id, 0);
> +	if (ret < 0)
> +		error_exit(dev_id);
> +
> +	nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> +	*async_cnt -= nr_cpl;
> +	worker_info->total_cpl += nr_cpl;
> +}
> +
> +static inline int
> +do_dma_mem_copy(void *p)
> +{
> +	struct lcore_params *para = (struct lcore_params *)p;
> +	volatile struct worker_info *worker_info = &(para->worker_info);
> +	const uint16_t dev_id = para->dev_id;
> +	const uint32_t nr_buf = para->nr_buf;
> +	const uint16_t kick_batch = para->kick_batch;
> +	const uint32_t buf_size = para->buf_size;
> +	struct rte_mbuf **srcs = para->srcs;
> +	struct rte_mbuf **dsts = para->dsts;
> +	uint16_t nr_cpl;
> +	uint64_t async_cnt = 0;
> +	uint32_t i;
> +	uint32_t poll_cnt = 0;
> +	int ret;
> +
> +	worker_info->stop_flag = false;
> +	worker_info->ready_flag = true;
> +
> +	while (!worker_info->start_flag)
> +		;
> +
> +	while (1) {
> +		for (i = 0; i < nr_buf; i++) {
> +dma_copy:
> +			ret = rte_dma_copy(dev_id, 0, rte_mbuf_data_iova(srcs[i]),
> +				rte_mbuf_data_iova(dsts[i]), buf_size, 0);
> +			if (unlikely(ret < 0)) {
> +				if (ret == -ENOSPC) {
> +					do_dma_submit_and_poll(dev_id, &async_cnt, worker_info);
> +					goto dma_copy;
> +				} else
> +					error_exit(dev_id);
> +			}
> +			async_cnt++;
> +
> +			if ((async_cnt % kick_batch) == 0)
> +				do_dma_submit_and_poll(dev_id, &async_cnt, worker_info);
> +		}
> +
> +		if (worker_info->stop_flag)
> +			break;
> +	}
> +
> +	rte_dma_submit(dev_id, 0);
> +	while ((async_cnt > 0) && (poll_cnt++ < POLL_MAX)) {
> +		nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> +		async_cnt -= nr_cpl;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int
> +do_cpu_mem_copy(void *p)
> +{
> +	struct lcore_params *para = (struct lcore_params *)p;
> +	volatile struct worker_info *worker_info = &(para->worker_info);
> +	const uint32_t nr_buf = para->nr_buf;
> +	const uint32_t buf_size = para->buf_size;
> +	struct rte_mbuf **srcs = para->srcs;
> +	struct rte_mbuf **dsts = para->dsts;
> +	uint32_t i;
> +
> +	worker_info->stop_flag = false;
> +	worker_info->ready_flag = true;
> +
> +	while (!worker_info->start_flag)
> +		;
> +
> +	while (1) {
> +		for (i = 0; i < nr_buf; i++) {
> +			/* copy buffer from src to dst */
> +			rte_memcpy((void *)(uintptr_t)rte_mbuf_data_iova(dsts[i]),
> +				(void *)(uintptr_t)rte_mbuf_data_iova(srcs[i]),
> +				(size_t)buf_size);
> +			worker_info->total_cpl++;
> +		}
> +		if (worker_info->stop_flag)
> +			break;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +setup_memory_env(struct test_configure *cfg, struct rte_mbuf ***srcs,
> +			struct rte_mbuf ***dsts)
> +{
> +	unsigned int buf_size = cfg->buf_size.cur;
> +	unsigned int nr_sockets;
> +	uint32_t nr_buf = cfg->nr_buf;
> +
> +	nr_sockets = rte_socket_count();
> +	if (cfg->src_numa_node >= nr_sockets ||
> +		cfg->dst_numa_node >= nr_sockets) {
> +		printf("Error: Source or destination numa exceeds the actual numa nodes.\n");
> +		return -1;
> +	}
> +
> +	src_pool = rte_pktmbuf_pool_create("Benchmark_DMA_SRC",
> +			nr_buf,
> +			0,
> +			0,
> +			buf_size + RTE_PKTMBUF_HEADROOM,
> +			cfg->src_numa_node);
> +	if (src_pool == NULL) {
> +		PRINT_ERR("Error with source mempool creation.\n");
> +		return -1;
> +	}
> +
> +	dst_pool = rte_pktmbuf_pool_create("Benchmark_DMA_DST",
> +			nr_buf,
> +			0,
> +			0,
> +			buf_size + RTE_PKTMBUF_HEADROOM,
> +			cfg->dst_numa_node);
> +	if (dst_pool == NULL) {
> +		PRINT_ERR("Error with destination mempool creation.\n");
> +		return -1;
> +	}
> +
> +	*srcs = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0);
> +	if (*srcs == NULL) {
> +		printf("Error: srcs malloc failed.\n");
> +		return -1;
> +	}
> +
> +	*dsts = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0);
> +	if (*dsts == NULL) {
> +		printf("Error: dsts malloc failed.\n");
> +		return -1;
> +	}
> +
> +	if (rte_pktmbuf_alloc_bulk(src_pool, *srcs, nr_buf) != 0) {
> +		printf("alloc src mbufs failed.\n");
> +		return -1;
> +	}
> +
> +	if (rte_pktmbuf_alloc_bulk(dst_pool, *dsts, nr_buf) != 0) {
> +		printf("alloc dst mbufs failed.\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +void
> +mem_copy_benchmark(struct test_configure *cfg, bool is_dma)
> +{
> +	uint16_t i;
> +	uint32_t offset;
> +	unsigned int lcore_id = 0;
> +	struct rte_mbuf **srcs = NULL, **dsts = NULL;
> +	struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map;
> +	unsigned int buf_size = cfg->buf_size.cur;
> +	uint16_t kick_batch = cfg->kick_batch.cur;
> +	uint32_t nr_buf = cfg->nr_buf = (cfg->mem_size.cur * 1024 * 1024) / (cfg->buf_size.cur * 2);
> +	uint16_t nb_workers = ldm->cnt;
> +	uint16_t test_secs = cfg->test_secs;
> +	float memory = 0;
> +	uint32_t avg_cycles = 0;
> +	uint32_t avg_cycles_total;
> +	float mops, mops_total;
> +	float bandwidth, bandwidth_total;
> +
> +	if (setup_memory_env(cfg, &srcs, &dsts) < 0)
> +		goto out;
> +
> +	if (is_dma)
> +		if (config_dmadevs(cfg) < 0)
> +			goto out;
> +
> +	if (cfg->cache_flush == 1) {
> +		cache_flush_buf(srcs, buf_size, nr_buf);
> +		cache_flush_buf(dsts, buf_size, nr_buf);
> +		rte_mb();
> +	}
> +
> +	printf("Start testing....\n");
> +
> +	for (i = 0; i < nb_workers; i++) {
> +		lcore_id = ldm->lcores[i];
> +		offset = nr_buf / nb_workers * i;
> +		lcores[i] = rte_malloc(NULL, sizeof(struct lcore_params), 0);
> +		if (lcores[i] == NULL) {
> +			printf("lcore parameters malloc failure for lcore %d\n", lcore_id);
> +			break;
> +		}
> +		if (is_dma) {
> +			lcores[i]->dma_name = ldm->dma_names[i];
> +			lcores[i]->dev_id = ldm->dma_ids[i];
> +			lcores[i]->kick_batch = kick_batch;
> +		}
> +		lcores[i]->worker_id = i;
> +		lcores[i]->nr_buf = (uint32_t)(nr_buf / nb_workers);
> +		lcores[i]->buf_size = buf_size;
> +		lcores[i]->test_secs = test_secs;
> +		lcores[i]->srcs = srcs + offset;
> +		lcores[i]->dsts = dsts + offset;
> +		lcores[i]->scenario_id = cfg->scenario_id;
> +		lcores[i]->lcore_id = lcore_id;
> +
> +		if (is_dma)
> +			rte_eal_remote_launch(do_dma_mem_copy, (void *)(lcores[i]), lcore_id);
> +		else
> +			rte_eal_remote_launch(do_cpu_mem_copy, (void *)(lcores[i]), lcore_id);
> +	}
> +
> +	while (1) {
> +		bool ready = true;
> +		for (i = 0; i < nb_workers; i++) {
> +			if (lcores[i]->worker_info.ready_flag == false) {
> +				ready = 0;
> +				break;
> +			}
> +		}
> +		if (ready)
> +			break;
> +	}
> +
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.start_flag = true;
> +
> +	usleep(TEST_WAIT_U_SECOND);
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.test_cpl = lcores[i]->worker_info.total_cpl;
> +
> +	usleep(test_secs * 1000 * 1000);
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.test_cpl = lcores[i]->worker_info.total_cpl -
> +						lcores[i]->worker_info.test_cpl;
> +
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.stop_flag = true;
> +
> +	rte_eal_mp_wait_lcore();
> +
> +	mops_total = 0;
> +	bandwidth_total = 0;
> +	avg_cycles_total = 0;
> +	for (i = 0; i < nb_workers; i++) {
> +		calc_result(buf_size, nr_buf, nb_workers, test_secs,
> +			lcores[i]->worker_info.test_cpl,
> +			&memory, &avg_cycles, &bandwidth, &mops);
> +		output_result(cfg->scenario_id, lcores[i]->lcore_id,
> +					lcores[i]->dma_name, cfg->ring_size.cur, kick_batch,
> +					avg_cycles, buf_size, nr_buf / nb_workers, memory,
> +					bandwidth, mops, is_dma);
> +		mops_total += mops;
> +		bandwidth_total += bandwidth;
> +		avg_cycles_total += avg_cycles;
> +	}
> +	printf("\nTotal Bandwidth: %.3lf Gbps, Total MOps: %.3lf\n", bandwidth_total, mops_total);
> +	snprintf(output_str[MAX_WORKER_NB], MAX_OUTPUT_STR_LEN, CSV_TOTAL_LINE_FMT,
> +			cfg->scenario_id, nr_buf, memory * nb_workers,
> +			avg_cycles_total / nb_workers, bandwidth_total, mops_total);
> +
> +out:
> +	/* free mbufs used in the test */
> +	if (srcs != NULL)
> +		rte_pktmbuf_free_bulk(srcs, nr_buf);
> +	if (dsts != NULL)
> +		rte_pktmbuf_free_bulk(dsts, nr_buf);
> +
> +	/* free the pointers for the mbufs */
> +	rte_free(srcs);
> +	srcs = NULL;
> +	rte_free(dsts);
> +	dsts = NULL;
> +
> +	rte_mempool_free(src_pool);
> +	src_pool = NULL;
> +
> +	rte_mempool_free(dst_pool);
> +	dst_pool = NULL;
> +
> +	/* free the worker parameters */
> +	for (i = 0; i < nb_workers; i++) {
> +		rte_free(lcores[i]);
> +		lcores[i] = NULL;
> +	}
> +
> +	if (is_dma) {
> +		for (i = 0; i < nb_workers; i++) {
> +			printf("Stopping dmadev %d\n", ldm->dma_ids[i]);
> +			rte_dma_stop(ldm->dma_ids[i]);
> +		}
> +	}
> +}
> diff --git a/app/test-dma-perf/config.ini b/app/test-dma-perf/config.ini
> new file mode 100644
> index 0000000000..b550f4b23f
> --- /dev/null
> +++ b/app/test-dma-perf/config.ini
> @@ -0,0 +1,61 @@
> +
> +; This is an example configuration file for dma-perf, which details the meanings of each parameter
> +; and instructions on how to use dma-perf.
> +
> +; Supported test types are DMA_MEM_COPY and CPU_MEM_COPY.
> +
> +; Parameters:
> +; "mem_size" denotes the size of the memory footprint.
> +; "buf_size" denotes the memory size of a single operation.
> +; "dma_ring_size" denotes the dma ring buffer size. It must be a power of two, and between
> +;  64 and 4096.
> +; "kick_batch" denotes the dma operation batch size, and should be greater than 1 normally.
> +
> +; The format for variables is variable=first,last,increment,ADD|MUL.
> +
> +; src_numa_node is used to control the numa node where the source memory is allocated.
> +; dst_numa_node is used to control the numa node where the destination memory is allocated.
> +
> +; cache_flush is used to determine whether or not the cache should be flushed, with 1 indicating to
> +; flush and 0 indicating to not flush.
> +
> +; test_seconds controls the test time of the whole case.
> +
> +; To use DMA for a test, please specify the "lcore_dma" parameter.
> +; If you have already set the "-l" and "-a" parameters using EAL,
> +; make sure that the value of "lcore_dma" falls within their range of values.
> +; We have to ensure a 1:1 mapping between the core and DMA device.
> +
> +; To use CPU for a test, please specify the "lcore" parameter.
> +; If you have already set the "-l" and "-a" parameters using EAL,
> +; make sure that the value of "lcore" falls within their range of values.
> +
> +; To specify a configuration file, use the "--config" flag followed by the path to the file.
> +
> +; To specify a result file, use the "--result" flag followed by the path to the file.
> +; If you do not specify a result file, one will be generated with the same name as the configuration
> +; file, with the addition of "_result.csv" at the end.
> +
> +[case1]
> +type=DMA_MEM_COPY
> +mem_size=10
> +buf_size=64,8192,2,MUL
> +dma_ring_size=1024
> +kick_batch=32
> +src_numa_node=0
> +dst_numa_node=0
> +cache_flush=0
> +test_seconds=2
> +lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
> +eal_args=--in-memory --file-prefix=test
> +
> +[case2]
> +type=CPU_MEM_COPY
> +mem_size=10
> +buf_size=64,8192,2,MUL
> +src_numa_node=0
> +dst_numa_node=1
> +cache_flush=0
> +test_seconds=2
> +lcore = 3, 4
> +eal_args=--in-memory --no-pci
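
To make the "first,last,increment,ADD|MUL" convention concrete, here is a minimal sketch of how a sweep such as buf_size=64,8192,2,MUL expands into scenarios, and how many buffers each scenario gets for a given mem_size. `expand_sweep()`, `nr_buf_for()` and `enum sweep_op` are hypothetical names, not code from the patch. Reading mem_size as megabytes is taken from the `* 1024 * 1024` in mem_copy_benchmark(). The checks for a non-progressing increment and a zero buf_size are assumptions added here.

```c
#include <stddef.h>
#include <stdint.h>

enum sweep_op { SWEEP_ADD, SWEEP_MUL }; /* hypothetical names */

/*
 * Expand "first,last,increment,ADD|MUL" into the values it visits.
 * Writes at most max values to out and returns the total number of
 * scenarios, or 0 if the increment cannot make progress.
 */
static size_t
expand_sweep(uint32_t first, uint32_t last, uint32_t incr, enum sweep_op op,
		uint32_t *out, size_t max)
{
	size_t n = 0;
	uint64_t cur = first; /* 64-bit so the loop variable cannot wrap */

	if (incr == 0 || (op == SWEEP_MUL && (incr == 1 || first == 0)))
		return 0;

	while (cur <= last) {
		if (n < max)
			out[n] = (uint32_t)cur;
		n++;
		cur = (op == SWEEP_ADD) ? cur + incr : cur * incr;
	}
	return n;
}

/*
 * Buffers per direction for a footprint given in MB: half of the footprint
 * holds sources and half holds destinations, as in mem_copy_benchmark().
 */
static uint32_t
nr_buf_for(uint32_t mem_size_mb, uint32_t buf_size)
{
	if (buf_size == 0)
		return 0;
	return (uint32_t)(((uint64_t)mem_size_mb * 1024 * 1024) /
			((uint64_t)buf_size * 2));
}
```

For [case1] above, the sweep visits 64, 128, 256, 512, 1024, 2048, 4096 and 8192, so eight scenarios; with mem_size=10 the first scenario uses 81920 buffers per direction and the last uses 640.
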
> diff --git a/app/test-dma-perf/main.c b/app/test-dma-perf/main.c
> new file mode 100644
> index 0000000000..de37120df6
> --- /dev/null
> +++ b/app/test-dma-perf/main.c
> @@ -0,0 +1,616 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <getopt.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +#include <sys/wait.h>
> +#include <inttypes.h>
> +#include <libgen.h>
> +
> +#include <rte_eal.h>
> +#include <rte_cfgfile.h>
> +#include <rte_string_fns.h>
> +#include <rte_lcore.h>
> +
> +#include "main.h"
> +
> +#define CSV_HDR_FMT "Case %u : %s,lcore,DMA,DMA ring size,kick batch size,buffer size(B),number of buffers,memory(MB),average cycle,bandwidth(Gbps),MOps\n"
> +
> +#define MAX_EAL_PARAM_NB 100
> +#define MAX_EAL_PARAM_LEN 1024
> +
> +#define DMA_MEM_COPY "DMA_MEM_COPY"
> +#define CPU_MEM_COPY "CPU_MEM_COPY"
> +
> +#define CMDLINE_CONFIG_ARG "--config"
> +#define CMDLINE_RESULT_ARG "--result"
> +
> +#define MAX_PARAMS_PER_ENTRY 4
> +
> +#define MAX_LONG_OPT_SZ 64
> +
> +enum {
> +	TEST_TYPE_NONE = 0,
> +	TEST_TYPE_DMA_MEM_COPY,
> +	TEST_TYPE_CPU_MEM_COPY
> +};
> +
> +#define MAX_TEST_CASES 16
> +static struct test_configure test_cases[MAX_TEST_CASES];
> +
> +char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];
> +
> +static FILE *fd;
> +
> +static void
> +output_csv(bool need_blankline)
> +{
> +	uint32_t i;
> +
> +	if (need_blankline) {
> +		fprintf(fd, ",,,,,,,,\n");
> +		fprintf(fd, ",,,,,,,,\n");
> +	}
> +
> +	for (i = 0; i < RTE_DIM(output_str); i++) {
> +		if (output_str[i][0]) {
> +			fprintf(fd, "%s", output_str[i]);
> +			output_str[i][0] = '\0';
> +		}
> +	}
> +
> +	fflush(fd);
> +}
> +
> +static void
> +output_env_info(void)
> +{
> +	snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Test Environment:\n");
> +	snprintf(output_str[1], MAX_OUTPUT_STR_LEN, "CPU frequency,%.3lf Ghz",
> +			rte_get_timer_hz() / 1000000000.0);
> +
> +	output_csv(true);
> +}
> +
> +static void
> +output_header(uint32_t case_id, struct test_configure *case_cfg)
> +{
> +	snprintf(output_str[0], MAX_OUTPUT_STR_LEN,
> +			CSV_HDR_FMT, case_id, case_cfg->test_type_str);
> +
> +	output_csv(true);
> +}
> +
> +static void
> +run_test_case(struct test_configure *case_cfg)
> +{
> +	switch (case_cfg->test_type) {
> +	case TEST_TYPE_DMA_MEM_COPY:
> +		mem_copy_benchmark(case_cfg, true);
> +		break;
> +	case TEST_TYPE_CPU_MEM_COPY:
> +		mem_copy_benchmark(case_cfg, false);
> +		break;
> +	default:
> +		printf("Unknown test type. %s\n", case_cfg->test_type_str);
> +		break;
> +	}
> +}
> +
> +static void
> +run_test(uint32_t case_id, struct test_configure *case_cfg)
> +{
> +	uint32_t i;
> +	uint32_t nb_lcores = rte_lcore_count();
> +	struct test_configure_entry *mem_size = &case_cfg->mem_size;
> +	struct test_configure_entry *buf_size = &case_cfg->buf_size;
> +	struct test_configure_entry *ring_size = &case_cfg->ring_size;
> +	struct test_configure_entry *kick_batch = &case_cfg->kick_batch;
> +	struct test_configure_entry dummy = { 0 };
> +	struct test_configure_entry *var_entry = &dummy;
> +
> +	for (i = 0; i < RTE_DIM(output_str); i++)
> +		memset(output_str[i], 0, MAX_OUTPUT_STR_LEN);
> +
> +	if (nb_lcores <= case_cfg->lcore_dma_map.cnt) {
> +		printf("Case %u: Not enough lcores.\n", case_id);
> +		return;
> +	}
> +
> +	printf("Number of used lcores: %u.\n", nb_lcores);
> +
> +	if (mem_size->incr != 0)
> +		var_entry = mem_size;
> +
> +	if (buf_size->incr != 0)
> +		var_entry = buf_size;
> +
> +	if (ring_size->incr != 0)
> +		var_entry = ring_size;
> +
> +	if (kick_batch->incr != 0)
> +		var_entry = kick_batch;
> +
> +	case_cfg->scenario_id = 0;
> +
> +	output_header(case_id, case_cfg);
> +
> +	for (var_entry->cur = var_entry->first; var_entry->cur <= var_entry->last;) {
> +		case_cfg->scenario_id++;
> +		printf("\nRunning scenario %d\n", case_cfg->scenario_id);
> +
> +		run_test_case(case_cfg);
> +		output_csv(false);
> +
> +		if (var_entry->op == OP_ADD)
> +			var_entry->cur += var_entry->incr;
> +		else if (var_entry->op == OP_MUL)
> +			var_entry->cur *= var_entry->incr;
> +		else {
> +			printf("No proper operation for variable entry.\n");
> +			break;
> +		}
> +	}
> +}
> +
> +static int
> +parse_lcore(struct test_configure *test_case, const char *value)
> +{
> +	uint16_t len;
> +	char *input;
> +	struct lcore_dma_map_t *lcore_dma_map;
> +
> +	if (test_case == NULL || value == NULL)
> +		return -1;
> +
> +	len = strlen(value);
> +	input = (char *)malloc((len + 1) * sizeof(char));
> +	strlcpy(input, value, len + 1);
> +	lcore_dma_map = &(test_case->lcore_dma_map);
> +
> +	memset(lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));
> +
> +	char *token = strtok(input, ", ");
> +	while (token != NULL) {
> +		if (lcore_dma_map->cnt >= MAX_LCORE_NB) {
> +			free(input);
> +			return -1;
> +		}
> +
> +		uint16_t lcore_id = atoi(token);
> +		lcore_dma_map->lcores[lcore_dma_map->cnt++] = lcore_id;
> +
> +		token = strtok(NULL, ", ");
> +	}
> +
> +	free(input);
> +	return 0;
> +}
> +
> +static int
> +parse_lcore_dma(struct test_configure *test_case, const char *value)
> +{
> +	struct lcore_dma_map_t *lcore_dma_map;
> +	char *input, *addrs;
> +	char *ptrs[2];
> +	char *start, *end, *substr;
> +	uint16_t lcore_id;
> +	int ret = 0;
> +
> +	if (test_case == NULL || value == NULL)
> +		return -1;
> +
> +	input = strndup(value, strlen(value) + 1);
> +	addrs = input;
> +
> +	while (*addrs == '\0')
> +		addrs++;
> +	if (*addrs == '\0') {
> +		fprintf(stderr, "No input DMA addresses\n");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	substr = strtok(addrs, ",");
> +	if (substr == NULL) {
> +		fprintf(stderr, "No input DMA address\n");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	memset(&test_case->lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));
> +
> +	do {
> +		if (rte_strsplit(substr, strlen(substr), ptrs, 2, '@') < 0) {
> +			fprintf(stderr, "Illegal DMA address\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		start = strstr(ptrs[0], "lcore");
> +		if (start == NULL) {
> +			fprintf(stderr, "Illegal lcore\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		start += 5;
> +		lcore_id = strtol(start, &end, 0);
> +		if (end == start) {
> +			fprintf(stderr, "No input lcore ID or ID %d is wrong\n", lcore_id);
> +			ret = -1;
> +			break;
> +		}
> +
> +		lcore_dma_map = &test_case->lcore_dma_map;
> +		if (lcore_dma_map->cnt >= MAX_LCORE_NB) {
> +			fprintf(stderr, "lcores count error\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		lcore_dma_map->lcores[lcore_dma_map->cnt] = lcore_id;
> +		strlcpy(lcore_dma_map->dma_names[lcore_dma_map->cnt], ptrs[1],
> +				RTE_DEV_NAME_MAX_LEN);
> +		lcore_dma_map->cnt++;
> +		substr = strtok(NULL, ",");
> +	} while (substr != NULL);
> +
> +out:
> +	free(input);
> +	return ret;
> +}
> +
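
For reference, here is a minimal sketch of parsing a single "lcore<N>@<device>" token using only the C standard library. `parse_lcore_dma_token()` is a hypothetical helper, not code from the patch. It skips leading blanks, which matters for lists written as "a, b" like the lcore_dma line in config.ini. The quoted loop compares against '\0' rather than a blank, so treating whitespace skipping as the intent there is my assumption. Rejecting over-long device names rather than truncating them is likewise a choice made for this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "lcore<N>@<device>" token, e.g. "lcore10@0000:00:04.2".
 * Returns 0 on success and -1 on malformed input.
 * Hypothetical helper for illustration; not part of the patch.
 */
static int
parse_lcore_dma_token(const char *token, unsigned int *lcore_id,
		char *dev_name, size_t dev_name_len)
{
	static const char prefix[] = "lcore";
	const char *p = token;
	char *end;
	unsigned long id;

	if (token == NULL || lcore_id == NULL || dev_name == NULL || dev_name_len == 0)
		return -1;

	/* Skip leading blanks left over from splitting "a, b" on ','. */
	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	p += sizeof(prefix) - 1;

	/* Require at least one digit, then '@' right after the number. */
	if (!isdigit((unsigned char)*p))
		return -1;
	errno = 0;
	id = strtoul(p, &end, 10);
	if (errno != 0 || id > 0xFFFF || *end != '@')
		return -1;
	p = end + 1;

	/* Reject empty or over-long device names instead of truncating. */
	if (*p == '\0' || strlen(p) >= dev_name_len)
		return -1;
	snprintf(dev_name, dev_name_len, "%s", p);
	*lcore_id = (unsigned int)id;
	return 0;
}
```

Called on " lcore11@0000:00:04.3" this yields lcore 11 and device name "0000:00:04.3"; tokens without the "lcore" prefix, without digits, or without '@' are rejected.
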
> +static int
> +parse_entry(const char *value, struct test_configure_entry *entry)
> +{
> +	char input[255] = {0};
> +	char *args[MAX_PARAMS_PER_ENTRY];
> +	int args_nr = -1;
> +	int ret;
> +
> +	if (value == NULL || entry == NULL)
> +		goto out;
> +
> +	strncpy(input, value, 254);
> +	if (*input == '\0')
> +		goto out;
> +
> +	ret = rte_strsplit(input, strlen(input), args, MAX_PARAMS_PER_ENTRY, ',');
> +	if (ret != 1 && ret != 4)
> +		goto out;
> +
> +	entry->cur = entry->first = (uint32_t)atoi(args[0]);
> +
> +	if (ret == 4) {
> +		args_nr = 4;
> +		entry->last = (uint32_t)atoi(args[1]);
> +		entry->incr = (uint32_t)atoi(args[2]);
> +		if (!strcmp(args[3], "MUL"))
> +			entry->op = OP_MUL;
> +		else if (!strcmp(args[3], "ADD"))
> +			entry->op = OP_ADD;
> +		else {
> +			args_nr = -1;
> +			printf("Invalid op %s.\n", args[3]);
> +		}
> +
> +	} else {
> +		args_nr = 1;
> +		entry->op = OP_NONE;
> +		entry->last = 0;
> +		entry->incr = 0;
> +	}
> +out:
> +	return args_nr;
> +}
> +
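
As an illustration of the same "single value or first,last,increment,ADD|MUL" parsing with stricter number checking, here is a minimal sketch that uses strtoul() instead of atoi(), so stray characters are rejected instead of being read as 0. `struct entry_sketch`, `enum entry_op`, `parse_u32()` and `parse_entry_sketch()` are hypothetical stand-ins for the patch's types and functions. The sketch assumes the value arrives without surrounding whitespace.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum entry_op { ENTRY_OP_NONE, ENTRY_OP_ADD, ENTRY_OP_MUL }; /* hypothetical */

struct entry_sketch { /* hypothetical stand-in for test_configure_entry */
	uint32_t first, last, incr;
	enum entry_op op;
};

/* Convert a whole decimal field to uint32_t; -1 on any stray character. */
static int
parse_u32(const char *s, uint32_t *out)
{
	char *end;
	unsigned long v;

	if (*s < '0' || *s > '9')
		return -1;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT32_MAX)
		return -1;
	*out = (uint32_t)v;
	return 0;
}

/* Returns the number of fields parsed (1 or 4), or -1 on error. */
static int
parse_entry_sketch(const char *value, struct entry_sketch *e)
{
	char buf[256];
	char *f[4];
	char *p;
	int n = 0;

	if (value == NULL || e == NULL || strlen(value) >= sizeof(buf))
		return -1;
	snprintf(buf, sizeof(buf), "%s", value);

	/* Split on commas in place; a fifth field is an error. */
	f[n++] = buf;
	for (p = buf; *p != '\0'; p++) {
		if (*p != ',')
			continue;
		if (n == 4)
			return -1;
		*p = '\0';
		f[n++] = p + 1;
	}
	if (n != 1 && n != 4)
		return -1;

	if (parse_u32(f[0], &e->first) < 0)
		return -1;
	if (n == 1) {
		e->last = 0;
		e->incr = 0;
		e->op = ENTRY_OP_NONE;
		return 1;
	}

	if (parse_u32(f[1], &e->last) < 0 || parse_u32(f[2], &e->incr) < 0)
		return -1;
	if (strcmp(f[3], "ADD") == 0)
		e->op = ENTRY_OP_ADD;
	else if (strcmp(f[3], "MUL") == 0)
		e->op = ENTRY_OP_MUL;
	else
		return -1;
	return 4;
}
```

With this sketch "64,8192,2,MUL" parses to four fields, "1024" to one, and inputs such as "64,8192", "64,8192,2,DIV" or "6x4" return -1.
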
> +static uint16_t
> +load_configs(const char *path)
> +{
> +	struct rte_cfgfile *cfgfile;
> +	int nb_sections, i;
> +	struct test_configure *test_case;
> +	char section_name[CFG_NAME_LEN];
> +	const char *case_type;
> +	const char *lcore_dma;
> +	const char *mem_size_str, *buf_size_str, *ring_size_str, *kick_batch_str;
> +	int args_nr, nb_vp;
> +	bool is_dma;
> +
> +	printf("config file parsing...\n");
> +	cfgfile = rte_cfgfile_load(path, 0);
> +	if (!cfgfile) {
> +		printf("Open configure file error.\n");
> +		exit(1);
> +	}
> +
> +	nb_sections = rte_cfgfile_num_sections(cfgfile, NULL, 0);
> +	if (nb_sections > MAX_TEST_CASES) {
> +		printf("Error: The maximum number of cases is %d.\n", MAX_TEST_CASES);
> +		exit(1);
> +	}
> +
> +	for (i = 0; i < nb_sections; i++) {
> +		snprintf(section_name, CFG_NAME_LEN, "case%d", i + 1);
> +		test_case = &test_cases[i];
> +		case_type = rte_cfgfile_get_entry(cfgfile, section_name, "type");
> +		if (case_type == NULL) {
> +			printf("Error: No case type in case %d, the test will be finished here.\n",
> +				i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		if (strcmp(case_type, DMA_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_DMA_MEM_COPY;
> +			test_case->test_type_str = DMA_MEM_COPY;
> +			is_dma = true;
> +		} else if (strcmp(case_type, CPU_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_CPU_MEM_COPY;
> +			test_case->test_type_str = CPU_MEM_COPY;
> +			is_dma = false;
> +		} else {
> +			printf("Error: Wrong test case type %s in case%d.\n", case_type, i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		test_case->src_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +								section_name, "src_numa_node"));
> +		test_case->dst_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +								section_name, "dst_numa_node"));
> +		nb_vp = 0;
> +		mem_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "mem_size");
> +		args_nr = parse_entry(mem_size_str, &test_case->mem_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		buf_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "buf_size");
> +		args_nr = parse_entry(buf_size_str, &test_case->buf_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		if (is_dma) {
> +			ring_size_str = rte_cfgfile_get_entry(cfgfile, section_name,
> +									"dma_ring_size");
> +			args_nr = parse_entry(ring_size_str, &test_case->ring_size);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			kick_batch_str = rte_cfgfile_get_entry(cfgfile, section_name, "kick_batch");
> +			args_nr = parse_entry(kick_batch_str, &test_case->kick_batch);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore_dma");
> +			int lcore_ret = parse_lcore_dma(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore dma error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		} else {
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore");
> +			int lcore_ret = parse_lcore(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		}
> +
> +		if (nb_vp > 1) {
> +			printf("Case %d error, each section can only have a single variable parameter.\n",
> +					i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		test_case->cache_flush =
> +			(uint8_t)atoi(rte_cfgfile_get_entry(cfgfile, section_name, "cache_flush"));
> +		test_case->test_secs = (uint16_t)atoi(rte_cfgfile_get_entry(cfgfile,
> +					section_name, "test_seconds"));
> +
> +		test_case->eal_args = rte_cfgfile_get_entry(cfgfile, section_name, "eal_args");
> +		test_case->is_valid = true;
> +	}
> +
> +	rte_cfgfile_close(cfgfile);
> +	printf("config file parsing complete.\n\n");
> +	return i;
> +}
> +
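
For illustration only: rte_cfgfile_get_entry() returns NULL when a key is
missing, and passing NULL to atoi() is undefined behaviour, so the
atoi(rte_cfgfile_get_entry(...)) calls above depend on every key being
present. A minimal sketch of a guarded conversion follows. The helper name
parse_int_entry() is hypothetical (it is not in the patch or in DPDK) and it
uses only the C standard library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a config value to int.
 * Returns 0 on success and stores the result in *out.
 * Returns -1 if the string is NULL, empty, not a number,
 * has trailing characters, or does not fit in an int.
 */
static int
parse_int_entry(const char *str, int *out)
{
	char *end = NULL;
	long val;

	if (str == NULL || *str == '\0')
		return -1;

	errno = 0;
	val = strtol(str, &end, 10);
	if (errno != 0 || end == str || *end != '\0')
		return -1;
	if (val < INT_MIN || val > INT_MAX)
		return -1;

	*out = (int)val;
	return 0;
}
```

A caller could mark the case invalid and continue when the helper returns -1,
in the same way the parse_entry() failures are handled above.
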
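> +/*
> + * Build the argument vector for a test case: copy the command line without
> + * the application-specific options, then append the per-case EAL arguments.
> + */
> +static int
> +append_eal_args(int argc, char **argv, const char *eal_args, char **new_argv)
> +{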
> +	int i;
> +	char *tokens[MAX_EAL_PARAM_NB];
> +	char args[MAX_EAL_PARAM_LEN] = {0};
> +	int token_nb, new_argc = 0;
> +
> +	for (i = 0; i < argc; i++) {
> +		if ((strcmp(argv[i], CMDLINE_CONFIG_ARG) == 0) ||
> +				(strcmp(argv[i], CMDLINE_RESULT_ARG) == 0)) {
> +			i++;
> +			continue;
> +		}
> +		strlcpy(new_argv[new_argc], argv[i], MAX_EAL_PARAM_LEN);
> +		new_argc++;
> +	}
> +
> +	if (eal_args) {
> +		strlcpy(args, eal_args, MAX_EAL_PARAM_LEN);
> +		token_nb = rte_strsplit(args, strlen(args),
> +					tokens, MAX_EAL_PARAM_NB, ' ');
> +		for (i = 0; i < token_nb; i++)
> +			strlcpy(new_argv[new_argc++], tokens[i], MAX_EAL_PARAM_LEN);
> +	}
> +
> +	return new_argc;
> +}
> +
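
For illustration only: append_eal_args() writes into new_argv without
checking new_argc against the number of slots the caller provides. A sketch
of a bounds-checked token append follows. The name append_tokens() and its
flat-buffer layout are assumptions made for this example, and it uses
strspn()/strcspn() instead of rte_strsplit(), so repeated spaces are skipped,
which may differ from what rte_strsplit() does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the token-append step: split `args` on
 * spaces and copy each token into `out`, a flat buffer holding `cap`
 * slots of `slot_len` bytes each, starting at slot `start`.
 * Returns the new argument count, or -1 if a token does not fit.
 */
static int
append_tokens(const char *args, char *out, int start, int cap, size_t slot_len)
{
	int n = start;
	const char *p = args;

	if (args == NULL)
		return n;

	while (*p != '\0') {
		size_t len;

		p += strspn(p, " ");	/* skip separators */
		if (*p == '\0')
			break;
		len = strcspn(p, " ");	/* length of this token */
		if (n >= cap || len >= slot_len)
			return -1;
		memcpy(out + (size_t)n * slot_len, p, len);
		out[(size_t)n * slot_len + len] = '\0';
		n++;
		p += len;
	}
	return n;
}
```

With a two-dimensional array such as args[MAX_EAL_PARAM_NB][MAX_EAL_PARAM_LEN]
in main(), the flat buffer would be &args[0][0].
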
> +int
> +main(int argc, char *argv[])
> +{
> +	int ret;
> +	uint16_t case_nb;
> +	uint32_t i, nb_lcores;
> +	pid_t cpid, wpid;
> +	int wstatus;
> +	char args[MAX_EAL_PARAM_NB][MAX_EAL_PARAM_LEN];
> +	char *pargs[MAX_EAL_PARAM_NB];
> +	char *cfg_path_ptr = NULL;
> +	char *rst_path_ptr = NULL;
> +	char rst_path[PATH_MAX];
> +	int new_argc;
> +
> +	memset(args, 0, sizeof(args));
> +
> +	for (i = 0; i < RTE_DIM(pargs); i++)
> +		pargs[i] = args[i];
> +
> +	for (i = 0; i < (uint32_t)argc; i++) {
> +		if (strncmp(argv[i], CMDLINE_CONFIG_ARG, MAX_LONG_OPT_SZ) == 0)
> +			cfg_path_ptr = argv[i + 1];
> +		if (strncmp(argv[i], CMDLINE_RESULT_ARG, MAX_LONG_OPT_SZ) == 0)
> +			rst_path_ptr = argv[i + 1];
> +	}
> +	if (cfg_path_ptr == NULL) {
> +		printf("Config file not assigned.\n");
> +		return -1;
> +	}
> +	if (rst_path_ptr == NULL) {
> +		strlcpy(rst_path, cfg_path_ptr, PATH_MAX);
> +		char *token = strtok(basename(rst_path), ".");
> +		if (token == NULL) {
> +			printf("Config file error.\n");
> +			return -1;
> +		}
> +		strcat(token, "_result.csv");
> +		rst_path_ptr = rst_path;
> +	}
> +
> +	case_nb = load_configs(cfg_path_ptr);
> +	fd = fopen(rst_path_ptr, "w");
> +	if (fd == NULL) {
> +		printf("Open output CSV file error.\n");
> +		return -1;
> +	}
> +	fclose(fd);
> +
> +	printf("Running cases...\n");
> +	for (i = 0; i < case_nb; i++) {
> +		if (!test_cases[i].is_valid) {
> +			printf("Invalid test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}
> +
> +		if (test_cases[i].test_type == TEST_TYPE_NONE) {
> +			printf("No valid test type in test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}
> +
> +		cpid = fork();
> +		if (cpid < 0) {
> +			printf("Fork case %d failed.\n", i + 1);
> +			exit(EXIT_FAILURE);
> +		} else if (cpid == 0) {
> +			printf("\nRunning case %u\n\n", i + 1);
> +
> +			new_argc = append_eal_args(argc, argv, test_cases[i].eal_args, pargs);
> +			ret = rte_eal_init(new_argc, pargs);
> +			if (ret < 0)
> +				rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
> +
> +			/* Check lcores. */
> +			nb_lcores = rte_lcore_count();
> +			if (nb_lcores < 2)
> +				rte_exit(EXIT_FAILURE,
> +					"There should be at least 2 worker lcores.\n");
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +
> +			output_env_info();
> +
> +			run_test(i + 1, &test_cases[i]);
> +
> +			/* clean up the EAL */
> +			rte_eal_cleanup();
> +
> +			fclose(fd);
> +
> +			printf("\nCase %u completed.\n\n", i + 1);
> +
> +			exit(EXIT_SUCCESS);
> +		} else {
> +			wpid = waitpid(cpid, &wstatus, 0);
> +			if (wpid == -1) {
> +				printf("waitpid error.\n");
> +				exit(EXIT_FAILURE);
> +			}
> +
> +			if (WIFEXITED(wstatus))
> +				printf("Case process exited. status %d\n\n",
> +					WEXITSTATUS(wstatus));
> +			else if (WIFSIGNALED(wstatus))
> +				printf("Case process killed by signal %d\n\n",
> +					WTERMSIG(wstatus));
> +			else if (WIFSTOPPED(wstatus))
> +				printf("Case process stopped by signal %d\n\n",
> +					WSTOPSIG(wstatus));
> +			else if (WIFCONTINUED(wstatus))
> +				printf("Case process continued.\n\n");
> +			else
> +				printf("Case process unknown terminated.\n\n");
> +		}
> +	}
> +
> +	printf("Bye...\n");
> +	return 0;
> +}
> +
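
For illustration only: main() runs each case in its own child process so
that every case gets a fresh EAL. The pattern can be sketched standalone as
below. run_in_child() and demo_case() are hypothetical names used for this
example, and only POSIX calls are involved. Note that with an options
argument of 0, waitpid() reports only terminated children, so a stopped or
continued state would need WUNTRACED or WCONTINUED to be observed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run fn(arg) in a child process and wait for it.
 * Returns the child's exit status (0..255), -1 if fork() or
 * waitpid() failed, or -2 if the child did not exit normally
 * (for example, it was killed by a signal).
 */
static int
run_in_child(int (*fn)(void *), void *arg)
{
	pid_t cpid;
	int wstatus;

	fflush(NULL);	/* avoid duplicating buffered output in the child */
	cpid = fork();
	if (cpid < 0)
		return -1;
	if (cpid == 0)
		_exit(fn(arg) & 0xff);

	if (waitpid(cpid, &wstatus, 0) == -1)
		return -1;
	if (WIFEXITED(wstatus))
		return WEXITSTATUS(wstatus);
	return -2;
}

/* Example case body: succeeds when given a non-NULL argument. */
static int
demo_case(void *arg)
{
	return arg != NULL ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Returning the status to the caller, rather than only printing it, would let
the parent record a failed case in the result file.
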
> diff --git a/app/test-dma-perf/main.h b/app/test-dma-perf/main.h
> new file mode 100644
> index 0000000000..12bc3f4e3f
> --- /dev/null
> +++ b/app/test-dma-perf/main.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#ifndef _MAIN_H_
> +#define _MAIN_H_
> +
> +
> +#include <rte_common.h>
> +#include <rte_cycles.h>
> +#include <rte_dev.h>
> +
> +#define MAX_WORKER_NB 128
> +#define MAX_OUTPUT_STR_LEN 512
> +
> +#define MAX_DMA_NB 128
> +#define MAX_LCORE_NB 256
> +
> +extern char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];
> +
> +typedef enum {
> +	OP_NONE = 0,
> +	OP_ADD,
> +	OP_MUL
> +} alg_op_type;
> +
> +struct test_configure_entry {
> +	uint32_t first;
> +	uint32_t last;
> +	uint32_t incr;
> +	alg_op_type op;
> +	uint32_t cur;
> +};
> +
> +struct lcore_dma_map_t {
> +	uint32_t lcores[MAX_WORKER_NB];
> +	char dma_names[MAX_WORKER_NB][RTE_DEV_NAME_MAX_LEN];
> +	int16_t dma_ids[MAX_WORKER_NB];
> +	uint16_t cnt;
> +};
> +
> +struct test_configure {
> +	bool is_valid;
> +	uint8_t test_type;
> +	const char *test_type_str;
> +	uint16_t src_numa_node;
> +	uint16_t dst_numa_node;
> +	uint16_t opcode;
> +	bool is_dma;
> +	struct lcore_dma_map_t lcore_dma_map;
> +	struct test_configure_entry mem_size;
> +	struct test_configure_entry buf_size;
> +	struct test_configure_entry ring_size;
> +	struct test_configure_entry kick_batch;
> +	uint8_t cache_flush;
> +	uint32_t nr_buf;
> +	uint16_t test_secs;
> +	const char *eal_args;
> +	uint8_t scenario_id;
> +};
> +
> +void mem_copy_benchmark(struct test_configure *cfg, bool is_dma);
> +
> +#endif /* _MAIN_H_ */
> diff --git a/app/test-dma-perf/meson.build b/app/test-dma-perf/meson.build
> new file mode 100644
> index 0000000000..bd6c264002
> --- /dev/null
> +++ b/app/test-dma-perf/meson.build
> @@ -0,0 +1,17 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019-2023 Intel Corporation
> +
> +# meson file, for building this app as part of a main DPDK build.
> +
> +if is_windows
> +    build = false
> +    reason = 'not supported on Windows'
> +    subdir_done()
> +endif
> +
> +deps += ['dmadev', 'mbuf', 'cfgfile']
> +
> +sources = files(
> +        'main.c',
> +        'benchmark.c',
> +)
> diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
> index 4459144140..796cc5517d 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -200,6 +200,12 @@ New Features
> 
>    Enhanced the GRO library to support TCP packets over IPv6 network.
> 
> +* **Added DMA device performance test application.**
> +
> +  Added a new application to test the performance of DMA devices and CPUs.
> +
> +  See the :doc:`../tools/dmaperf` for more details.
> +
> 
>  Removed Items
>  -------------
> diff --git a/doc/guides/tools/dmaperf.rst b/doc/guides/tools/dmaperf.rst
> new file mode 100644
> index 0000000000..c5f8a9406f
> --- /dev/null
> +++ b/doc/guides/tools/dmaperf.rst
> @@ -0,0 +1,103 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2023 Intel Corporation.
> +
> +dpdk-test-dma-perf Application
> +==============================
> +
> +The ``dpdk-test-dma-perf`` tool is a Data Plane Development Kit (DPDK)
> +application that tests the performance of DMA (Direct Memory Access)
> +devices available within DPDK. It provides a test framework to assess the
> +performance of CPU and DMA devices under various scenarios, such as
> +varying buffer lengths. Doing so provides insight into the potential
> +performance when using these DMA devices for acceleration in DPDK
> +applications. For now it supports memory copy performance tests only,
> +comparing the performance of CPU and DMA automatically under various
> +conditions with the help of a pre-set configuration file.
> +
> +
> +Configuration
> +-------------
> +This application uses the standard DPDK EAL command-line options as well
> +as custom command-line options of its own. An example configuration file
> +is provided with the application and explains the meaning of each
> +parameter.
> +
> +Here is an extracted sample from the configuration file (the complete
> +sample can be found in the application source directory):
> +
> +.. code-block:: ini
> +
> +   [case1]
> +   type=DMA_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   dma_ring_size=1024
> +   kick_batch=32
> +   src_numa_node=0
> +   dst_numa_node=0
> +   cache_flush=0
> +   test_seconds=2
> +   lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
> +   eal_args=--in-memory --file-prefix=test
> +
> +   [case2]
> +   type=CPU_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   src_numa_node=0
> +   dst_numa_node=1
> +   cache_flush=0
> +   test_seconds=2
> +   lcore = 3, 4
> +   eal_args=--in-memory --no-pci
> +
> +The configuration file is divided into multiple sections; each section
> +represents a test case. The four variables ``mem_size``, ``buf_size``,
> +``dma_ring_size``, and ``kick_batch`` can vary within a test case. The
> +format for a varying value is ``variable=first,last,increment,ADD|MUL``:
> +the variable starts at ``first``, ends at ``last``, and is stepped by
> +``increment``, with ``ADD`` or ``MUL`` selecting whether the step is
> +applied by addition or by multiplication. Each case can have only one
> +varying variable. Each value it takes generates a scenario, so a case can
> +contain multiple scenarios.
> +
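
For illustration only: the ``first,last,increment,ADD|MUL`` format can be
read as a simple loop. The sketch below enumerates the values of one entry.
It assumes both endpoints are inclusive and that a fixed value counts as a
single scenario, which is my reading of the description rather than
something taken from the benchmark code. struct sweep and sweep_values() are
hypothetical names, with the enum modelled on alg_op_type from main.h.

```c
#include <stdint.h>

typedef enum { OP_NONE = 0, OP_ADD, OP_MUL } alg_op_type;

/* Hypothetical mirror of the first/last/incr/op fields of one entry. */
struct sweep {
	uint32_t first;
	uint32_t last;
	uint32_t incr;
	alg_op_type op;
};

/*
 * Store up to `max` values generated by the entry in `vals` and
 * return how many were stored. Returns 0 for a step that would
 * never reach `last` (ADD by 0, MUL by 0 or 1, MUL starting at 0).
 */
static unsigned int
sweep_values(const struct sweep *s, uint32_t *vals, unsigned int max)
{
	unsigned int n = 0;
	uint64_t cur = s->first;	/* 64-bit so the step cannot wrap */

	if (max == 0)
		return 0;
	if (s->op == OP_NONE) {		/* fixed value: one scenario */
		vals[0] = s->first;
		return 1;
	}
	if ((s->op == OP_ADD && s->incr == 0) ||
			(s->op == OP_MUL && (s->incr <= 1 || s->first == 0)))
		return 0;

	while (cur <= s->last && n < max) {
		vals[n++] = (uint32_t)cur;
		cur = (s->op == OP_ADD) ? cur + s->incr : cur * s->incr;
	}
	return n;
}
```

Under these assumptions, ``buf_size=64,8192,2,MUL`` from the sample yields
eight scenarios: 64, 128, 256, 512, 1024, 2048, 4096 and 8192.
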
> +Parameter Definitions
> +---------------------
> +
> +- **type**: The type of the test. Currently supported types are
> +  ``DMA_MEM_COPY`` and ``CPU_MEM_COPY``.
> +- **mem_size**: The size of the memory footprint.
> +- **buf_size**: The memory size of a single operation.
> +- **dma_ring_size**: The DMA ring buffer size. Must be a power of two
> +  between 64 and 4096.
> +- **kick_batch**: The DMA operation batch size. It should normally be
> +  greater than 1.
> +- **src_numa_node**: The NUMA node where the source memory is allocated.
> +- **dst_numa_node**: The NUMA node where the destination memory is
> +  allocated.
> +- **cache_flush**: Whether the cache should be flushed: ``1`` to flush,
> +  ``0`` to not flush.
> +- **test_seconds**: The test time for each scenario.
> +- **lcore_dma**: The lcore/DMA mapping.
> +- **lcore**: The lcores used for CPU testing.
> +- **eal_args**: The EAL arguments.
> +
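
For illustration only: the ``dma_ring_size`` constraint in the list above
(a power of two between 64 and 4096) can be checked with a single bit test.
ring_size_is_valid() is a hypothetical name, and I am assuming both bounds
are inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n is a power of two in the inclusive range [64, 4096]. */
static bool
ring_size_is_valid(uint32_t n)
{
	/* A power of two has exactly one bit set, so n & (n - 1) is 0. */
	bool pow2 = n != 0 && (n & (n - 1)) == 0;

	return pow2 && n >= 64 && n <= 4096;
}
```
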
> +.. Note::
> +
> +	The mapping of lcore to DMA must be one-to-one and cannot be duplicated.
> +
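
For illustration only: the one-to-one rule in the note above can be enforced
while parsing the ``lcore_dma`` value. The sketch below parses entries of the
form ``lcore10@0000:00:04.2`` separated by commas and rejects a repeated
lcore or a repeated device name. It is not the patch's parse_lcore_dma();
every demo_ and DEMO_ name is hypothetical, and the entry syntax is inferred
from the sample configuration only.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_PAIRS 8
#define DEMO_NAME_LEN 64

struct demo_map {
	unsigned int lcores[DEMO_MAX_PAIRS];
	char dma_names[DEMO_MAX_PAIRS][DEMO_NAME_LEN];
	unsigned int cnt;
};

/* Returns the number of pairs parsed, or -1 on a syntax or duplicate error. */
static int
demo_parse_lcore_dma(const char *str, struct demo_map *map)
{
	const char *p = str;

	map->cnt = 0;
	while (p != NULL && *p != '\0') {
		unsigned int lcore, i;
		char name[DEMO_NAME_LEN];
		int used = 0;

		/* Leading blanks are skipped; the name stops at ',' or ' '. */
		if (sscanf(p, " lcore%u@%63[^, ]%n", &lcore, name, &used) != 2)
			return -1;
		if (map->cnt >= DEMO_MAX_PAIRS)
			return -1;
		/* One-to-one: neither the lcore nor the device may repeat. */
		for (i = 0; i < map->cnt; i++)
			if (map->lcores[i] == lcore ||
					strcmp(map->dma_names[i], name) == 0)
				return -1;
		map->lcores[map->cnt] = lcore;
		strcpy(map->dma_names[map->cnt], name);
		map->cnt++;

		p += used;
		p += strspn(p, " ");
		if (*p == ',')
			p++;
		else if (*p != '\0')
			return -1;
	}
	return (int)map->cnt;
}
```
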
> +To specify a configuration file, use the ``--config`` flag followed by
> +the path to the file.
> +
> +To specify a result file, use the ``--result`` flag followed by the path
> +to the file. If you do not specify a result file, one will be generated
> +with the same name as the configuration file, with "_result.csv" appended.
> +
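
For illustration only: the default result file name described above can be
built with a single bounded snprintf() call. This is a sketch, not the code
in main(): default_result_path() is a hypothetical name, and it strips the
last extension of the file name and keeps the directory part, which may not
match the patch's strtok()/strcat() approach in every case.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<cfg_path without its last extension>_result.csv"
 * into out. Returns 0 on success, -1 if the result does not fit in out_len.
 */
static int
default_result_path(const char *cfg_path, char *out, size_t out_len)
{
	const char *slash = strrchr(cfg_path, '/');
	const char *base = slash != NULL ? slash + 1 : cfg_path;
	const char *dot = strrchr(base, '.');
	size_t stem_len;
	int n;

	/* A leading dot (as in ".hidden") is not treated as an extension. */
	if (dot == NULL || dot == base)
		stem_len = strlen(cfg_path);
	else
		stem_len = (size_t)(dot - cfg_path);

	n = snprintf(out, out_len, "%.*s_result.csv", (int)stem_len, cfg_path);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return 0;
}
```

With these assumptions, ``./config_dma.ini`` maps to
``./config_dma_result.csv``.
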
> +
> +Running the Application
> +-----------------------
> +
> +Typical command-line invocation to execute the application:
> +
> +.. code-block:: console
> +
> +   dpdk-test-dma-perf --config=./config_dma.ini --result=./res_dma.csv
> +
> +Where `config_dma.ini` is the configuration file, and `res_dma.csv`
> +will be the generated result file.
> +
> +After the tests, you can find the results in the `res_dma.csv` file.
> +
> +Limitations
> +-----------
> +
> +Currently, this tool only supports memory copy performance tests.
> +Additional enhancements are possible in the future to support more types
> +of tests for DMA devices and CPUs.
> diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
> index 6f84fc31ff..857572da96 100644
> --- a/doc/guides/tools/index.rst
> +++ b/doc/guides/tools/index.rst
> @@ -23,3 +23,4 @@ DPDK Tools User Guides
>      testregex
>      testmldev
>      dts
> +    dmaperf
> --
> 2.40.1
> 
> 
> 
