Hi Cheng, LGTM. > -----Original Message----- > Date: Wed, 28 Jun 2023 01:20:34 +0000 > From: Cheng Jiang > > To: thomas@monjalon.net, bruce.richardson@intel.com, > mb@smartsharesystems.com, chenbo.xia@intel.com, > amitprakashs@marvell.com, anoobj@marvell.com, > huangdengdui@huawei.com, > kevin.laatz@intel.com, fengchengwen@huawei.com, jerinj@marvell.com > Cc: dev@dpdk.org, jiayu.hu@intel.com, xuan.ding@intel.com, > wenwux.ma@intel.com, yuanx.wang@intel.com, xingguang.he@intel.com, > weix.ling@intel.com, Cheng Jiang > > Subject: [PATCH v10] app/dma-perf: introduce dma-perf application > Message-ID: <20230628012034.49016-1-cheng1.jiang@intel.com> > Content-Type: text/plain; charset=UTF-8 > > There are many high-performance DMA devices supported in DPDK now, and > these DMA devices can also be integrated into other modules of DPDK as > accelerators, such as Vhost. Before integrating DMA into applications, > developers need to know the performance of these DMA devices in > various scenarios and the performance of CPUs in the same scenario, > such as different buffer lengths. Only in this way can we know the > target performance of the application accelerated by using them. This > patch introduces a high-performance testing tool, which supports > comparing the performance of CPU and DMA in different scenarios > automatically with a pre-set config file. Memory copy performance tests are supported for now. 
> > Signed-off-by: Cheng Jiang > > Signed-off-by: Jiayu Hu > > Signed-off-by: Yuan Wang > > Acked-by: Morten Brørup > > Acked-by: Chenbo Xia > Acked-by: Yuying Zhang > > --- > v10: > rebased code from 23.07-rc2; > v9: > improved error handling; > improved lcore_params structure; > improved mbuf api calling; > improved exit process; > fixed some typos; > added scenario summary data display; > removed unnecessary include; > v8: > fixed string copy issue in parse_lcore(); > improved some data display format; > added doc in doc/guides/tools; > updated release notes; > v7: > fixed some strcpy issues; > removed cache setup in calling rte_pktmbuf_pool_create(); > fixed some typos; > added some memory free and null set operations; > improved result calculation; > v6: > improved code based on Anoob's comments; > fixed some code structure issues; > v5: > fixed some LONG_LINE warnings; > v4: > fixed inaccuracy of the memory footprint display; > v3: > fixed some typos; > v2: > added lcore/dmadev designation; > added error case process; > removed worker_threads parameter from config.ini; > improved the logs; > improved config file; > > app/meson.build | 1 + > app/test-dma-perf/benchmark.c | 508 ++++++++++++++++++++ > app/test-dma-perf/config.ini | 61 +++ > app/test-dma-perf/main.c | 616 +++++++++++++++++++++++++ > app/test-dma-perf/main.h | 64 +++ > app/test-dma-perf/meson.build | 17 + > doc/guides/rel_notes/release_23_07.rst | 6 + > doc/guides/tools/dmaperf.rst | 103 +++++ > doc/guides/tools/index.rst | 1 + > 9 files changed, 1377 insertions(+) > create mode 100644 app/test-dma-perf/benchmark.c create mode 100644 > app/test-dma-perf/config.ini create mode 100644 > app/test-dma-perf/main.c create mode 100644 app/test-dma-perf/main.h > create mode 100644 app/test-dma-perf/meson.build create mode 100644 > doc/guides/tools/dmaperf.rst > > diff --git a/app/meson.build b/app/meson.build index > 74d2420f67..4fc1a83eba 100644 > --- a/app/meson.build > +++ b/app/meson.build > @@ -19,6 
+19,7 @@ apps = [ > 'test-cmdline', > 'test-compress-perf', > 'test-crypto-perf', > + 'test-dma-perf', > 'test-eventdev', > 'test-fib', > 'test-flow-perf', > diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma- > perf/benchmark.c new file mode 100644 index 0000000000..0601e0d171 > --- /dev/null > +++ b/app/test-dma-perf/benchmark.c > @@ -0,0 +1,508 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2023 Intel Corporation */ > + > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > + > +#include "main.h" > + > +#define MAX_DMA_CPL_NB 255 > + > +#define TEST_WAIT_U_SECOND 10000 > +#define POLL_MAX 1000 > + > +#define CSV_LINE_DMA_FMT "Scenario %u,%u,%s,%u,%u,%u,%u,%.2lf,%" > PRIu64 ",%.3lf,%.3lf\n" > +#define CSV_LINE_CPU_FMT "Scenario %u,%u,NA,NA,NA,%u,%u,%.2lf,%" > PRIu64 ",%.3lf,%.3lf\n" > + > +#define CSV_TOTAL_LINE_FMT "Scenario %u > Summary, , , , , ,%u,%.2lf,%u,%.3lf,%.3lf\n" > + > +struct worker_info { > + bool ready_flag; > + bool start_flag; > + bool stop_flag; > + uint32_t total_cpl; > + uint32_t test_cpl; > +}; > + > +struct lcore_params { > + uint8_t scenario_id; > + unsigned int lcore_id; > + char *dma_name; > + uint16_t worker_id; > + uint16_t dev_id; > + uint32_t nr_buf; > + uint16_t kick_batch; > + uint32_t buf_size; > + uint16_t test_secs; > + struct rte_mbuf **srcs; > + struct rte_mbuf **dsts; > + volatile struct worker_info worker_info; }; > + > +static struct rte_mempool *src_pool; > +static struct rte_mempool *dst_pool; > + > +static struct lcore_params *lcores[MAX_WORKER_NB]; > + > +#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__) > + > +static inline int > +__rte_format_printf(3, 4) > +print_err(const char *func, int lineno, const char *format, ...) 
{ > + va_list ap; > + int ret; > + > + ret = fprintf(stderr, "In %s:%d - ", func, lineno); > + va_start(ap, format); > + ret += vfprintf(stderr, format, ap); > + va_end(ap); > + > + return ret; > +} > + > +static inline void > +calc_result(uint32_t buf_size, uint32_t nr_buf, uint16_t nb_workers, > uint16_t test_secs, > + uint32_t total_cnt, float *memory, uint32_t > *ave_cycle, > + float *bandwidth, float *mops) > +{ > + float ops; > + > + *memory = (float)(buf_size * (nr_buf / nb_workers) * 2) / (1024 * > 1024); > + *ave_cycle = test_secs * rte_get_timer_hz() / total_cnt; > + ops = (float)total_cnt / test_secs; > + *mops = ops / (1000 * 1000); > + *bandwidth = (ops * buf_size * 8) / (1000 * 1000 * 1000); } > + > +static void > +output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name, > uint16_t ring_size, > + uint16_t kick_batch, uint64_t ave_cycle, uint32_t > buf_size, uint32_t nr_buf, > + float memory, float bandwidth, float mops, bool > is_dma) { > + if (is_dma) > + printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch > Size: %u.\n", > + lcore_id, dma_name, ring_size, kick_batch); > + else > + printf("lcore %u\n", lcore_id); > + > + printf("Average Cycles/op: %" PRIu64 ", Buffer Size: %u B, Buffer > Number: %u, Memory: %.2lf MB, Frequency: %.3lf Ghz.\n", > + ave_cycle, buf_size, nr_buf, memory, > rte_get_timer_hz()/1000000000.0); > + printf("Average Bandwidth: %.3lf Gbps, MOps: %.3lf\n", bandwidth, > +mops); > + > + if (is_dma) > + snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, > CSV_LINE_DMA_FMT, > + scenario_id, lcore_id, dma_name, ring_size, > kick_batch, buf_size, > + nr_buf, memory, ave_cycle, bandwidth, mops); > + else > + snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, > CSV_LINE_CPU_FMT, > + scenario_id, lcore_id, buf_size, > + nr_buf, memory, ave_cycle, bandwidth, mops); } > + > +static inline void > +cache_flush_buf(__rte_unused struct rte_mbuf **array, > + __rte_unused uint32_t buf_size, > + __rte_unused uint32_t nr_buf) > +{ > 
+#ifdef RTE_ARCH_X86_64 > + char *data; > + struct rte_mbuf **srcs = array; > + uint32_t i, offset; > + > + for (i = 0; i < nr_buf; i++) { > + data = rte_pktmbuf_mtod(srcs[i], char *); > + for (offset = 0; offset < buf_size; offset += 64) > + __builtin_ia32_clflush(data + offset); > + } > +#endif > +} > + > +/* Configuration of device. */ > +static void > +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size) { > + uint16_t vchan = 0; > + struct rte_dma_info info; > + struct rte_dma_conf dev_config = { .nb_vchans = 1 }; > + struct rte_dma_vchan_conf qconf = { > + .direction = RTE_DMA_DIR_MEM_TO_MEM, > + .nb_desc = ring_size > + }; > + > + if (rte_dma_configure(dev_id, &dev_config) != 0) > + rte_exit(EXIT_FAILURE, "Error with dma configure.\n"); > + > + if (rte_dma_vchan_setup(dev_id, vchan, &qconf) != 0) > + rte_exit(EXIT_FAILURE, "Error with queue configuration.\n"); > + > + if (rte_dma_info_get(dev_id, &info) != 0) > + rte_exit(EXIT_FAILURE, "Error with getting device info.\n"); > + > + if (info.nb_vchans != 1) > + rte_exit(EXIT_FAILURE, "Error, no configured queues > reported on device id. 
%u\n", > + dev_id); > + > + if (rte_dma_start(dev_id) != 0) > + rte_exit(EXIT_FAILURE, "Error with dma start.\n"); } > + > +static int > +config_dmadevs(struct test_configure *cfg) { > + uint32_t ring_size = cfg->ring_size.cur; > + struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map; > + uint32_t nb_workers = ldm->cnt; > + uint32_t i; > + int dev_id; > + uint16_t nb_dmadevs = 0; > + char *dma_name; > + > + for (i = 0; i < ldm->cnt; i++) { > + dma_name = ldm->dma_names[i]; > + dev_id = rte_dma_get_dev_id_by_name(dma_name); > + if (dev_id < 0) { > + fprintf(stderr, "Error: Fail to find DMA %s.\n", > dma_name); > + goto end; > + } > + > + ldm->dma_ids[i] = dev_id; > + configure_dmadev_queue(dev_id, ring_size); > + ++nb_dmadevs; > + } > + > +end: > + if (nb_dmadevs < nb_workers) { > + printf("Not enough dmadevs (%u) for all workers (%u).\n", > nb_dmadevs, nb_workers); > + return -1; > + } > + > + printf("Number of used dmadevs: %u.\n", nb_dmadevs); > + > + return 0; > +} > + > +static void > +error_exit(int dev_id) > +{ > + rte_dma_stop(dev_id); > + rte_dma_close(dev_id); > + rte_exit(EXIT_FAILURE, "DMA error\n"); } > + > +static inline void > +do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt, > + volatile struct worker_info *worker_info) { > + int ret; > + uint16_t nr_cpl; > + > + ret = rte_dma_submit(dev_id, 0); > + if (ret < 0) > + error_exit(dev_id); > + > + nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, > NULL); > + *async_cnt -= nr_cpl; > + worker_info->total_cpl += nr_cpl; > +} > + > +static inline int > +do_dma_mem_copy(void *p) > +{ > + struct lcore_params *para = (struct lcore_params *)p; > + volatile struct worker_info *worker_info = &(para->worker_info); > + const uint16_t dev_id = para->dev_id; > + const uint32_t nr_buf = para->nr_buf; > + const uint16_t kick_batch = para->kick_batch; > + const uint32_t buf_size = para->buf_size; > + struct rte_mbuf **srcs = para->srcs; > + struct rte_mbuf **dsts = para->dsts; > + uint16_t nr_cpl; > + 
uint64_t async_cnt = 0; > + uint32_t i; > + uint32_t poll_cnt = 0; > + int ret; > + > + worker_info->stop_flag = false; > + worker_info->ready_flag = true; > + > + while (!worker_info->start_flag) > + ; > + > + while (1) { > + for (i = 0; i < nr_buf; i++) { > +dma_copy: > + ret = rte_dma_copy(dev_id, 0, > rte_mbuf_data_iova(srcs[i]), > + rte_mbuf_data_iova(dsts[i]), buf_size, 0); > + if (unlikely(ret < 0)) { > + if (ret == -ENOSPC) { > + do_dma_submit_and_poll(dev_id, > &async_cnt, worker_info); > + goto dma_copy; > + } else > + error_exit(dev_id); > + } > + async_cnt++; > + > + if ((async_cnt % kick_batch) == 0) > + do_dma_submit_and_poll(dev_id, > &async_cnt, worker_info); > + } > + > + if (worker_info->stop_flag) > + break; > + } > + > + rte_dma_submit(dev_id, 0); > + while ((async_cnt > 0) && (poll_cnt++ < POLL_MAX)) { > + nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, > NULL, NULL); > + async_cnt -= nr_cpl; > + } > + > + return 0; > +} > + > +static inline int > +do_cpu_mem_copy(void *p) > +{ > + struct lcore_params *para = (struct lcore_params *)p; > + volatile struct worker_info *worker_info = &(para->worker_info); > + const uint32_t nr_buf = para->nr_buf; > + const uint32_t buf_size = para->buf_size; > + struct rte_mbuf **srcs = para->srcs; > + struct rte_mbuf **dsts = para->dsts; > + uint32_t i; > + > + worker_info->stop_flag = false; > + worker_info->ready_flag = true; > + > + while (!worker_info->start_flag) > + ; > + > + while (1) { > + for (i = 0; i < nr_buf; i++) { > + /* copy buffer form src to dst */ > + rte_memcpy((void > *)(uintptr_t)rte_mbuf_data_iova(dsts[i]), > + (void > *)(uintptr_t)rte_mbuf_data_iova(srcs[i]), > + (size_t)buf_size); > + worker_info->total_cpl++; > + } > + if (worker_info->stop_flag) > + break; > + } > + > + return 0; > +} > + > +static int > +setup_memory_env(struct test_configure *cfg, struct rte_mbuf ***srcs, > + struct rte_mbuf ***dsts) > +{ > + unsigned int buf_size = cfg->buf_size.cur; > + unsigned int nr_sockets; 
> + uint32_t nr_buf = cfg->nr_buf; > + > + nr_sockets = rte_socket_count(); > + if (cfg->src_numa_node >= nr_sockets || > + cfg->dst_numa_node >= nr_sockets) { > + printf("Error: Source or destination numa exceeds the acture > numa nodes.\n"); > + return -1; > + } > + > + src_pool = rte_pktmbuf_pool_create("Benchmark_DMA_SRC", > + nr_buf, > + 0, > + 0, > + buf_size + RTE_PKTMBUF_HEADROOM, > + cfg->src_numa_node); > + if (src_pool == NULL) { > + PRINT_ERR("Error with source mempool creation.\n"); > + return -1; > + } > + > + dst_pool = rte_pktmbuf_pool_create("Benchmark_DMA_DST", > + nr_buf, > + 0, > + 0, > + buf_size + RTE_PKTMBUF_HEADROOM, > + cfg->dst_numa_node); > + if (dst_pool == NULL) { > + PRINT_ERR("Error with destination mempool creation.\n"); > + return -1; > + } > + > + *srcs = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0); > + if (*srcs == NULL) { > + printf("Error: srcs malloc failed.\n"); > + return -1; > + } > + > + *dsts = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0); > + if (*dsts == NULL) { > + printf("Error: dsts malloc failed.\n"); > + return -1; > + } > + > + if (rte_pktmbuf_alloc_bulk(src_pool, *srcs, nr_buf) != 0) { > + printf("alloc src mbufs failed.\n"); > + return -1; > + } > + > + if (rte_pktmbuf_alloc_bulk(dst_pool, *dsts, nr_buf) != 0) { > + printf("alloc dst mbufs failed.\n"); > + return -1; > + } > + > + return 0; > +} > + > +void > +mem_copy_benchmark(struct test_configure *cfg, bool is_dma) { > + uint16_t i; > + uint32_t offset; > + unsigned int lcore_id = 0; > + struct rte_mbuf **srcs = NULL, **dsts = NULL; > + struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map; > + unsigned int buf_size = cfg->buf_size.cur; > + uint16_t kick_batch = cfg->kick_batch.cur; > + uint32_t nr_buf = cfg->nr_buf = (cfg->mem_size.cur * 1024 * 1024) / > (cfg->buf_size.cur * 2); > + uint16_t nb_workers = ldm->cnt; > + uint16_t test_secs = cfg->test_secs; > + float memory = 0; > + uint32_t avg_cycles = 0; > + uint32_t avg_cycles_total; > + 
float mops, mops_total; > + float bandwidth, bandwidth_total; > + > + if (setup_memory_env(cfg, &srcs, &dsts) < 0) > + goto out; > + > + if (is_dma) > + if (config_dmadevs(cfg) < 0) > + goto out; > + > + if (cfg->cache_flush == 1) { > + cache_flush_buf(srcs, buf_size, nr_buf); > + cache_flush_buf(dsts, buf_size, nr_buf); > + rte_mb(); > + } > + > + printf("Start testing....\n"); > + > + for (i = 0; i < nb_workers; i++) { > + lcore_id = ldm->lcores[i]; > + offset = nr_buf / nb_workers * i; > + lcores[i] = rte_malloc(NULL, sizeof(struct lcore_params), 0); > + if (lcores[i] == NULL) { > + printf("lcore parameters malloc failure for > lcore %d\n", lcore_id); > + break; > + } > + if (is_dma) { > + lcores[i]->dma_name = ldm->dma_names[i]; > + lcores[i]->dev_id = ldm->dma_ids[i]; > + lcores[i]->kick_batch = kick_batch; > + } > + lcores[i]->worker_id = i; > + lcores[i]->nr_buf = (uint32_t)(nr_buf / nb_workers); > + lcores[i]->buf_size = buf_size; > + lcores[i]->test_secs = test_secs; > + lcores[i]->srcs = srcs + offset; > + lcores[i]->dsts = dsts + offset; > + lcores[i]->scenario_id = cfg->scenario_id; > + lcores[i]->lcore_id = lcore_id; > + > + if (is_dma) > + rte_eal_remote_launch(do_dma_mem_copy, (void > *)(lcores[i]), lcore_id); > + else > + rte_eal_remote_launch(do_cpu_mem_copy, (void > *)(lcores[i]), lcore_id); > + } > + > + while (1) { > + bool ready = true; > + for (i = 0; i < nb_workers; i++) { > + if (lcores[i]->worker_info.ready_flag == false) { > + ready = 0; > + break; > + } > + } > + if (ready) > + break; > + } > + > + for (i = 0; i < nb_workers; i++) > + lcores[i]->worker_info.start_flag = true; > + > + usleep(TEST_WAIT_U_SECOND); > + for (i = 0; i < nb_workers; i++) > + lcores[i]->worker_info.test_cpl = lcores[i]- > >worker_info.total_cpl; > + > + usleep(test_secs * 1000 * 1000); > + for (i = 0; i < nb_workers; i++) > + lcores[i]->worker_info.test_cpl = lcores[i]- > >worker_info.total_cpl - > + lcores[i]- > >worker_info.test_cpl; > + > + for (i = 0; i < 
nb_workers; i++) > + lcores[i]->worker_info.stop_flag = true; > + > + rte_eal_mp_wait_lcore(); > + > + mops_total = 0; > + bandwidth_total = 0; > + avg_cycles_total = 0; > + for (i = 0; i < nb_workers; i++) { > + calc_result(buf_size, nr_buf, nb_workers, test_secs, > + lcores[i]->worker_info.test_cpl, > + &memory, &avg_cycles, &bandwidth, &mops); > + output_result(cfg->scenario_id, lcores[i]->lcore_id, > + lcores[i]->dma_name, cfg- > >ring_size.cur, kick_batch, > + avg_cycles, buf_size, nr_buf / > nb_workers, memory, > + bandwidth, mops, is_dma); > + mops_total += mops; > + bandwidth_total += bandwidth; > + avg_cycles_total += avg_cycles; > + } > + printf("\nTotal Bandwidth: %.3lf Gbps, Total MOps: %.3lf\n", > bandwidth_total, mops_total); > + snprintf(output_str[MAX_WORKER_NB], MAX_OUTPUT_STR_LEN, > CSV_TOTAL_LINE_FMT, > + cfg->scenario_id, nr_buf, memory * nb_workers, > + avg_cycles_total / nb_workers, bandwidth_total, > mops_total); > + > +out: > + /* free mbufs used in the test */ > + if (srcs != NULL) > + rte_pktmbuf_free_bulk(srcs, nr_buf); > + if (dsts != NULL) > + rte_pktmbuf_free_bulk(dsts, nr_buf); > + > + /* free the points for the mbufs */ > + rte_free(srcs); > + srcs = NULL; > + rte_free(dsts); > + dsts = NULL; > + > + rte_mempool_free(src_pool); > + src_pool = NULL; > + > + rte_mempool_free(dst_pool); > + dst_pool = NULL; > + > + /* free the worker parameters */ > + for (i = 0; i < nb_workers; i++) { > + rte_free(lcores[i]); > + lcores[i] = NULL; > + } > + > + if (is_dma) { > + for (i = 0; i < nb_workers; i++) { > + printf("Stopping dmadev %d\n", ldm->dma_ids[i]); > + rte_dma_stop(ldm->dma_ids[i]); > + } > + } > +} > diff --git a/app/test-dma-perf/config.ini > b/app/test-dma-perf/config.ini new file mode 100644 index > 0000000000..b550f4b23f > --- /dev/null > +++ b/app/test-dma-perf/config.ini > @@ -0,0 +1,61 @@ > + > +; This is an example configuration file for dma-perf, which details > +the meanings of each parameter ; and instructions on how to use 
dma-perf. > + > +; Supported test types are DMA_MEM_COPY and CPU_MEM_COPY. > + > +; Parameters: > +; "mem_size" denotes the size of the memory footprint. > +; "buf_size" denotes the memory size of a single operation. > +; "dma_ring_size" denotes the dma ring buffer size. It must > +be a power of two, and between ; 64 and 4096. > +; "kick_batch" denotes the dma operation batch size, and should be > +greater > than 1 normally. > + > +; The format for variables is variable=first,last,increment,ADD|MUL. > + > +; src_numa_node is used to control the numa node where the source > memory is allocated. > +; dst_numa_node is used to control the numa node where the > +destination > memory is allocated. > + > +; cache_flush is used to determine whether or not the cache should be > +flushed, with 1 indicating to ; flush and 0 indicating to not flush. > + > +; test_seconds controls the test time of the whole case. > + > +; To use DMA for a test, please specify the "lcore_dma" parameter. > +; If you have already set the "-l" and "-a" parameters using EAL, ; > +make sure that the value of "lcore_dma" falls within their range of > +the > values. > +; We have to ensure a 1:1 mapping between the core and DMA device. > + > +; To use CPU for a test, please specify the "lcore" parameter. > +; If you have already set the "-l" and "-a" parameters using EAL, ; > +make sure that the value of "lcore" falls within their range of values. > + > +; To specify a configuration file, use the "--config" flag followed > +by the path > to the file. > + > +; To specify a result file, use the "--result" flag followed by the > +path to the > file. > +; If you do not specify a result file, one will be generated with the > +same name as the configuration ; file, with the addition of > +"_result.csv" at > the end. 
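
A note on the variable format described in the comments above (variable=first,last,increment,ADD|MUL): as I read the loop in run_test(), the value starts at "first" and is stepped by "increment" (added or multiplied) while it stays <= "last", with one scenario per value. Below is a minimal sketch of that stepping, assuming that reading is right. The names sweep_entry, sweep_op and expand_entry are hypothetical and are not part of the patch, and the sketch only covers the four-field form, not a single fixed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first/last/incr/op fields of an entry. */
enum sweep_op { SWEEP_ADD, SWEEP_MUL };

struct sweep_entry {
	uint32_t first;
	uint32_t last;
	uint32_t incr;
	enum sweep_op op;
};

/*
 * Write the values a "first,last,increment,ADD|MUL" entry steps through
 * into out[] (at most max of them) and return how many were written.
 * Returns 0 when the step could never advance (assumption: such input
 * is treated as invalid here; the patch handles it differently).
 */
static size_t
expand_entry(const struct sweep_entry *e, uint32_t *out, size_t max)
{
	size_t n = 0;
	uint64_t cur; /* 64-bit so a MUL step cannot wrap before the bound check */

	assert(e != NULL && out != NULL);

	if (e->op == SWEEP_ADD && e->incr == 0)
		return 0;
	if (e->op == SWEEP_MUL && (e->incr < 2 || e->first == 0))
		return 0;

	for (cur = e->first; cur <= e->last && n < max; ) {
		out[n++] = (uint32_t)cur;
		cur = (e->op == SWEEP_ADD) ? cur + e->incr : cur * e->incr;
	}

	return n;
}
```

With buf_size=64,8192,2,MUL as in [case1] below, this steps through 64, 128, 256, 512, 1024, 2048, 4096 and 8192, i.e. eight scenarios. The 64-bit cursor is a choice of the sketch, made so that a large "last" combined with MUL cannot wrap around and loop forever.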
> + > +[case1] > +type=DMA_MEM_COPY > +mem_size=10 > +buf_size=64,8192,2,MUL > +dma_ring_size=1024 > +kick_batch=32 > +src_numa_node=0 > +dst_numa_node=0 > +cache_flush=0 > +test_seconds=2 > +lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3 > +eal_args=--in-memory --file-prefix=test > + > +[case2] > +type=CPU_MEM_COPY > +mem_size=10 > +buf_size=64,8192,2,MUL > +src_numa_node=0 > +dst_numa_node=1 > +cache_flush=0 > +test_seconds=2 > +lcore = 3, 4 > +eal_args=--in-memory --no-pci > diff --git a/app/test-dma-perf/main.c b/app/test-dma-perf/main.c new > file mode 100644 index 0000000000..de37120df6 > --- /dev/null > +++ b/app/test-dma-perf/main.c > @@ -0,0 +1,616 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2023 Intel Corporation */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > + > +#include "main.h" > + > +#define CSV_HDR_FMT "Case %u : %s,lcore,DMA,DMA ring size,kick batch > size,buffer size(B),number of buffers,memory(MB),average > cycle,bandwidth(Gbps),MOps\n" > + > +#define MAX_EAL_PARAM_NB 100 > +#define MAX_EAL_PARAM_LEN 1024 > + > +#define DMA_MEM_COPY "DMA_MEM_COPY" > +#define CPU_MEM_COPY "CPU_MEM_COPY" > + > +#define CMDLINE_CONFIG_ARG "--config" > +#define CMDLINE_RESULT_ARG "--result" > + > +#define MAX_PARAMS_PER_ENTRY 4 > + > +#define MAX_LONG_OPT_SZ 64 > + > +enum { > + TEST_TYPE_NONE = 0, > + TEST_TYPE_DMA_MEM_COPY, > + TEST_TYPE_CPU_MEM_COPY > +}; > + > +#define MAX_TEST_CASES 16 > +static struct test_configure test_cases[MAX_TEST_CASES]; > + > +char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN]; > + > +static FILE *fd; > + > +static void > +output_csv(bool need_blankline) > +{ > + uint32_t i; > + > + if (need_blankline) { > + fprintf(fd, ",,,,,,,,\n"); > + fprintf(fd, ",,,,,,,,\n"); > + } > + > + for (i = 0; i < RTE_DIM(output_str); i++) { > + if (output_str[i][0]) { > + fprintf(fd, "%s", output_str[i]); 
> + output_str[i][0] = '\0'; > + } > + } > + > + fflush(fd); > +} > + > +static void > +output_env_info(void) > +{ > + snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Test > Environment:\n"); > + snprintf(output_str[1], MAX_OUTPUT_STR_LEN, "CPU > frequency,%.3lf Ghz", > + rte_get_timer_hz() / 1000000000.0); > + > + output_csv(true); > +} > + > +static void > +output_header(uint32_t case_id, struct test_configure *case_cfg) { > + snprintf(output_str[0], MAX_OUTPUT_STR_LEN, > + CSV_HDR_FMT, case_id, case_cfg->test_type_str); > + > + output_csv(true); > +} > + > +static void > +run_test_case(struct test_configure *case_cfg) { > + switch (case_cfg->test_type) { > + case TEST_TYPE_DMA_MEM_COPY: > + mem_copy_benchmark(case_cfg, true); > + break; > + case TEST_TYPE_CPU_MEM_COPY: > + mem_copy_benchmark(case_cfg, false); > + break; > + default: > + printf("Unknown test type. %s\n", case_cfg->test_type_str); > + break; > + } > +} > + > +static void > +run_test(uint32_t case_id, struct test_configure *case_cfg) { > + uint32_t i; > + uint32_t nb_lcores = rte_lcore_count(); > + struct test_configure_entry *mem_size = &case_cfg->mem_size; > + struct test_configure_entry *buf_size = &case_cfg->buf_size; > + struct test_configure_entry *ring_size = &case_cfg->ring_size; > + struct test_configure_entry *kick_batch = &case_cfg->kick_batch; > + struct test_configure_entry dummy = { 0 }; > + struct test_configure_entry *var_entry = &dummy; > + > + for (i = 0; i < RTE_DIM(output_str); i++) > + memset(output_str[i], 0, MAX_OUTPUT_STR_LEN); > + > + if (nb_lcores <= case_cfg->lcore_dma_map.cnt) { > + printf("Case %u: Not enough lcores.\n", case_id); > + return; > + } > + > + printf("Number of used lcores: %u.\n", nb_lcores); > + > + if (mem_size->incr != 0) > + var_entry = mem_size; > + > + if (buf_size->incr != 0) > + var_entry = buf_size; > + > + if (ring_size->incr != 0) > + var_entry = ring_size; > + > + if (kick_batch->incr != 0) > + var_entry = kick_batch; > + > + case_cfg->scenario_id 
= 0; > + > + output_header(case_id, case_cfg); > + > + for (var_entry->cur = var_entry->first; var_entry->cur <= var_entry- > >last;) { > + case_cfg->scenario_id++; > + printf("\nRunning scenario %d\n", case_cfg->scenario_id); > + > + run_test_case(case_cfg); > + output_csv(false); > + > + if (var_entry->op == OP_ADD) > + var_entry->cur += var_entry->incr; > + else if (var_entry->op == OP_MUL) > + var_entry->cur *= var_entry->incr; > + else { > + printf("No proper operation for variable entry.\n"); > + break; > + } > + } > +} > + > +static int > +parse_lcore(struct test_configure *test_case, const char *value) { > + uint16_t len; > + char *input; > + struct lcore_dma_map_t *lcore_dma_map; > + > + if (test_case == NULL || value == NULL) > + return -1; > + > + len = strlen(value); > + input = (char *)malloc((len + 1) * sizeof(char)); > + strlcpy(input, value, len + 1); > + lcore_dma_map = &(test_case->lcore_dma_map); > + > + memset(lcore_dma_map, 0, sizeof(struct lcore_dma_map_t)); > + > + char *token = strtok(input, ", "); > + while (token != NULL) { > + if (lcore_dma_map->cnt >= MAX_LCORE_NB) { > + free(input); > + return -1; > + } > + > + uint16_t lcore_id = atoi(token); > + lcore_dma_map->lcores[lcore_dma_map->cnt++] = lcore_id; > + > + token = strtok(NULL, ", "); > + } > + > + free(input); > + return 0; > +} > + > +static int > +parse_lcore_dma(struct test_configure *test_case, const char *value) { > + struct lcore_dma_map_t *lcore_dma_map; > + char *input, *addrs; > + char *ptrs[2]; > + char *start, *end, *substr; > + uint16_t lcore_id; > + int ret = 0; > + > + if (test_case == NULL || value == NULL) > + return -1; > + > + input = strndup(value, strlen(value) + 1); > + addrs = input; > + > + while (*addrs == '\0') > + addrs++; > + if (*addrs == '\0') { > + fprintf(stderr, "No input DMA addresses\n"); > + ret = -1; > + goto out; > + } > + > + substr = strtok(addrs, ","); > + if (substr == NULL) { > + fprintf(stderr, "No input DMA address\n"); > + ret = -1; > + 
goto out; > + } > + > + memset(&test_case->lcore_dma_map, 0, sizeof(struct > lcore_dma_map_t)); > + > + do { > + if (rte_strsplit(substr, strlen(substr), ptrs, 2, '@') < 0) { > + fprintf(stderr, "Illegal DMA address\n"); > + ret = -1; > + break; > + } > + > + start = strstr(ptrs[0], "lcore"); > + if (start == NULL) { > + fprintf(stderr, "Illegal lcore\n"); > + ret = -1; > + break; > + } > + > + start += 5; > + lcore_id = strtol(start, &end, 0); > + if (end == start) { > + fprintf(stderr, "No input lcore ID or ID %d is wrong\n", > lcore_id); > + ret = -1; > + break; > + } > + > + lcore_dma_map = &test_case->lcore_dma_map; > + if (lcore_dma_map->cnt >= MAX_LCORE_NB) { > + fprintf(stderr, "lcores count error\n"); > + ret = -1; > + break; > + } > + > + lcore_dma_map->lcores[lcore_dma_map->cnt] = lcore_id; > + strlcpy(lcore_dma_map->dma_names[lcore_dma_map->cnt], > ptrs[1], > + RTE_DEV_NAME_MAX_LEN); > + lcore_dma_map->cnt++; > + substr = strtok(NULL, ","); > + } while (substr != NULL); > + > +out: > + free(input); > + return ret; > +} > + > +static int > +parse_entry(const char *value, struct test_configure_entry *entry) { > + char input[255] = {0}; > + char *args[MAX_PARAMS_PER_ENTRY]; > + int args_nr = -1; > + int ret; > + > + if (value == NULL || entry == NULL) > + goto out; > + > + strncpy(input, value, 254); > + if (*input == '\0') > + goto out; > + > + ret = rte_strsplit(input, strlen(input), args, MAX_PARAMS_PER_ENTRY, > ','); > + if (ret != 1 && ret != 4) > + goto out; > + > + entry->cur = entry->first = (uint32_t)atoi(args[0]); > + > + if (ret == 4) { > + args_nr = 4; > + entry->last = (uint32_t)atoi(args[1]); > + entry->incr = (uint32_t)atoi(args[2]); > + if (!strcmp(args[3], "MUL")) > + entry->op = OP_MUL; > + else if (!strcmp(args[3], "ADD")) > + entry->op = OP_ADD; > + else { > + args_nr = -1; > + printf("Invalid op %s.\n", args[3]); > + } > + > + } else { > + args_nr = 1; > + entry->op = OP_NONE; > + entry->last = 0; > + entry->incr = 0; > + } > +out: > + 
return args_nr;
> +}
> +
> +static uint16_t
> +load_configs(const char *path)
> +{
> +	struct rte_cfgfile *cfgfile;
> +	int nb_sections, i;
> +	struct test_configure *test_case;
> +	char section_name[CFG_NAME_LEN];
> +	const char *case_type;
> +	const char *lcore_dma;
> +	const char *mem_size_str, *buf_size_str, *ring_size_str, *kick_batch_str;
> +	int args_nr, nb_vp;
> +	bool is_dma;
> +
> +	printf("config file parsing...\n");
> +	cfgfile = rte_cfgfile_load(path, 0);
> +	if (!cfgfile) {
> +		printf("Open configure file error.\n");
> +		exit(1);
> +	}
> +
> +	nb_sections = rte_cfgfile_num_sections(cfgfile, NULL, 0);
> +	if (nb_sections > MAX_TEST_CASES) {
> +		printf("Error: The maximum number of cases is %d.\n", MAX_TEST_CASES);
> +		exit(1);
> +	}
> +
> +	for (i = 0; i < nb_sections; i++) {
> +		snprintf(section_name, CFG_NAME_LEN, "case%d", i + 1);
> +		test_case = &test_cases[i];
> +		case_type = rte_cfgfile_get_entry(cfgfile, section_name, "type");
> +		if (case_type == NULL) {
> +			printf("Error: No case type in case %d, the test will be finished here.\n",
> +				i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		if (strcmp(case_type, DMA_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_DMA_MEM_COPY;
> +			test_case->test_type_str = DMA_MEM_COPY;
> +			is_dma = true;
> +		} else if (strcmp(case_type, CPU_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_CPU_MEM_COPY;
> +			test_case->test_type_str = CPU_MEM_COPY;
> +			is_dma = false;
> +		} else {
> +			printf("Error: Wrong test case type %s in case%d.\n", case_type, i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		test_case->src_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +								section_name, "src_numa_node"));
> +		test_case->dst_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +								section_name, "dst_numa_node"));
> +		nb_vp = 0;
> +		mem_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "mem_size");
> +		args_nr = parse_entry(mem_size_str, &test_case->mem_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		buf_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "buf_size");
> +		args_nr = parse_entry(buf_size_str, &test_case->buf_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		if (is_dma) {
> +			ring_size_str = rte_cfgfile_get_entry(cfgfile, section_name,
> +								"dma_ring_size");
> +			args_nr = parse_entry(ring_size_str, &test_case->ring_size);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			kick_batch_str = rte_cfgfile_get_entry(cfgfile, section_name, "kick_batch");
> +			args_nr = parse_entry(kick_batch_str, &test_case->kick_batch);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore_dma");
> +			int lcore_ret = parse_lcore_dma(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore dma error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		} else {
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore");
> +			int lcore_ret = parse_lcore(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		}
> +
> +		if (nb_vp > 1) {
> +			printf("Case %d error, each section can only have a single variable parameter.\n",
> +					i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		test_case->cache_flush =
> +			(uint8_t)atoi(rte_cfgfile_get_entry(cfgfile, section_name, "cache_flush"));
> +		test_case->test_secs = (uint16_t)atoi(rte_cfgfile_get_entry(cfgfile,
> +					section_name, "test_seconds"));
> +
> +		test_case->eal_args = rte_cfgfile_get_entry(cfgfile, section_name, "eal_args");
> +		test_case->is_valid = true;
> +	}
> +
> +	rte_cfgfile_close(cfgfile);
> +	printf("config file parsing complete.\n\n");
> +	return i;
> +}
> +
> +/* Parse the argument given in the command line of the application */
> +static int
> +append_eal_args(int argc, char **argv, const char *eal_args, char **new_argv)
> +{
> +	int i;
> +	char *tokens[MAX_EAL_PARAM_NB];
> +	char args[MAX_EAL_PARAM_LEN] = {0};
> +	int token_nb, new_argc = 0;
> +
> +	for (i = 0; i < argc; i++) {
> +		if ((strcmp(argv[i], CMDLINE_CONFIG_ARG) == 0) ||
> +				(strcmp(argv[i], CMDLINE_RESULT_ARG) == 0)) {
> +			i++;
> +			continue;
> +		}
> +		strlcpy(new_argv[new_argc], argv[i], MAX_EAL_PARAM_LEN);
> +		new_argc++;
> +	}
> +
> +	if (eal_args) {
> +		strlcpy(args, eal_args, MAX_EAL_PARAM_LEN);
> +		token_nb = rte_strsplit(args, strlen(args),
> +					tokens, MAX_EAL_PARAM_NB, ' ');
> +		for (i = 0; i < token_nb; i++)
> +			strlcpy(new_argv[new_argc++], tokens[i], MAX_EAL_PARAM_LEN);
> +	}
> +
> +	return new_argc;
> +}
> +
> +int
> +main(int argc, char *argv[])
> +{
> +	int ret;
> +	uint16_t case_nb;
> +	uint32_t i, nb_lcores;
> +	pid_t cpid, wpid;
> +	int wstatus;
> +	char args[MAX_EAL_PARAM_NB][MAX_EAL_PARAM_LEN];
> +	char *pargs[MAX_EAL_PARAM_NB];
> +	char *cfg_path_ptr = NULL;
> +	char *rst_path_ptr = NULL;
> +	char rst_path[PATH_MAX];
> +	int new_argc;
> +
> +	memset(args, 0, sizeof(args));
> +
> +	for (i = 0; i < RTE_DIM(pargs); i++)
> +		pargs[i] = args[i];
> +
> +	for (i = 0; i < (uint32_t)argc; i++) {
> +		if (strncmp(argv[i], CMDLINE_CONFIG_ARG, MAX_LONG_OPT_SZ) == 0)
> +			cfg_path_ptr = argv[i + 1];
> +		if (strncmp(argv[i], CMDLINE_RESULT_ARG, MAX_LONG_OPT_SZ) == 0)
> +			rst_path_ptr = argv[i + 1];
> +	}
> +	if (cfg_path_ptr == NULL) {
> +		printf("Config file not assigned.\n");
> +		return -1;
> +	}
> +	if (rst_path_ptr == NULL) {
> +		strlcpy(rst_path, cfg_path_ptr, PATH_MAX);
> +		char *token = strtok(basename(rst_path), ".");
> +		if (token == NULL) {
> +			printf("Config file error.\n");
> +			return -1;
> +		}
> +		strcat(token, "_result.csv");
> +		rst_path_ptr = rst_path;
> +	}
> +
> +	case_nb = load_configs(cfg_path_ptr);
> +	fd = fopen(rst_path_ptr, "w");
> +	if (fd == NULL) {
> +		printf("Open output CSV file error.\n");
> +		return -1;
> +	}
> +	fclose(fd);
> +
> +	printf("Running cases...\n");
> +	for (i = 0; i < case_nb; i++) {
> +		if (!test_cases[i].is_valid) {
> +			printf("Invalid test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}
> +
> +		if (test_cases[i].test_type == TEST_TYPE_NONE) {
> +			printf("No valid test type in test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}
> +
> +		cpid = fork();
> +		if (cpid < 0) {
> +			printf("Fork case %d failed.\n", i + 1);
> +			exit(EXIT_FAILURE);
> +		} else if (cpid == 0) {
> +			printf("\nRunning case %u\n\n", i + 1);
> +
> +			new_argc = append_eal_args(argc, argv, test_cases[i].eal_args, pargs);
> +			ret = rte_eal_init(new_argc, pargs);
> +			if (ret < 0)
> +				rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
> +
> +			/* Check lcores. */
> +			nb_lcores = rte_lcore_count();
> +			if (nb_lcores < 2)
> +				rte_exit(EXIT_FAILURE,
> +					"There should be at least 2 worker lcores.\n");
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +
> +			output_env_info();
> +
> +			run_test(i + 1, &test_cases[i]);
> +
> +			/* clean up the EAL */
> +			rte_eal_cleanup();
> +
> +			fclose(fd);
> +
> +			printf("\nCase %u completed.\n\n", i + 1);
> +
> +			exit(EXIT_SUCCESS);
> +		} else {
> +			wpid = waitpid(cpid, &wstatus, 0);
> +			if (wpid == -1) {
> +				printf("waitpid error.\n");
> +				exit(EXIT_FAILURE);
> +			}
> +
> +			if (WIFEXITED(wstatus))
> +				printf("Case process exited. status %d\n\n",
> +					WEXITSTATUS(wstatus));
> +			else if (WIFSIGNALED(wstatus))
> +				printf("Case process killed by signal %d\n\n",
> +					WTERMSIG(wstatus));
> +			else if (WIFSTOPPED(wstatus))
> +				printf("Case process stopped by signal %d\n\n",
> +					WSTOPSIG(wstatus));
> +			else if (WIFCONTINUED(wstatus))
> +				printf("Case process continued.\n\n");
> +			else
> +				printf("Case process unknown terminated.\n\n");
> +		}
> +	}
> +
> +	printf("Bye...\n");
> +	return 0;
> +}
> +
> diff --git a/app/test-dma-perf/main.h b/app/test-dma-perf/main.h
> new file mode 100644
> index 0000000000..12bc3f4e3f
> --- /dev/null
> +++ b/app/test-dma-perf/main.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#ifndef _MAIN_H_
> +#define _MAIN_H_
> +
> +
> +#include
> +#include
> +#include
> +
> +#define MAX_WORKER_NB 128
> +#define MAX_OUTPUT_STR_LEN 512
> +
> +#define MAX_DMA_NB 128
> +#define MAX_LCORE_NB 256
> +
> +extern char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];
> +
> +typedef enum {
> +	OP_NONE = 0,
> +	OP_ADD,
> +	OP_MUL
> +} alg_op_type;
> +
> +struct test_configure_entry {
> +	uint32_t first;
> +	uint32_t last;
> +	uint32_t incr;
> +	alg_op_type op;
> +	uint32_t cur;
> +};
> +
> +struct lcore_dma_map_t {
> +	uint32_t lcores[MAX_WORKER_NB];
> +	char dma_names[MAX_WORKER_NB][RTE_DEV_NAME_MAX_LEN];
> +	int16_t dma_ids[MAX_WORKER_NB];
> +	uint16_t cnt;
> +};
> +
> +struct test_configure {
> +	bool is_valid;
> +	uint8_t test_type;
> +	const char *test_type_str;
> +	uint16_t src_numa_node;
> +	uint16_t dst_numa_node;
> +	uint16_t opcode;
> +	bool is_dma;
> +	struct lcore_dma_map_t lcore_dma_map;
> +	struct test_configure_entry mem_size;
> +	struct test_configure_entry buf_size;
> +	struct test_configure_entry ring_size;
> +	struct test_configure_entry kick_batch;
> +	uint8_t cache_flush;
> +	uint32_t nr_buf;
> +	uint16_t test_secs;
> +	const char *eal_args;
> +	uint8_t scenario_id;
> +};
> +
> +void mem_copy_benchmark(struct test_configure *cfg, bool is_dma);
> +
> +#endif /* _MAIN_H_ */
> diff --git a/app/test-dma-perf/meson.build b/app/test-dma-perf/meson.build
> new file mode 100644
> index 0000000000..bd6c264002
> --- /dev/null
> +++ b/app/test-dma-perf/meson.build
> @@ -0,0 +1,17 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019-2023 Intel Corporation
> +
> +# meson file, for building this app as part of a main DPDK build.
> +
> +if is_windows
> +    build = false
> +    reason = 'not supported on Windows'
> +    subdir_done()
> +endif
> +
> +deps += ['dmadev', 'mbuf', 'cfgfile']
> +
> +sources = files(
> +        'main.c',
> +        'benchmark.c',
> +)
> diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
> index 4459144140..796cc5517d 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -200,6 +200,12 @@ New Features
>
>    Enhanced the GRO library to support TCP packets over IPv6 network.
>
> +* **Added DMA device performance test application.**
> +
> +  Added a new application to test the performance of DMA device and CPU.
> +
> +  See the :doc:`../tools/dmaperf` for more details.
> +
>
>  Removed Items
>  -------------
> diff --git a/doc/guides/tools/dmaperf.rst b/doc/guides/tools/dmaperf.rst
> new file mode 100644
> index 0000000000..c5f8a9406f
> --- /dev/null
> +++ b/doc/guides/tools/dmaperf.rst
> @@ -0,0 +1,103 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2023 Intel Corporation.
> +
> +dpdk-test-dma-perf Application
> +==============================
> +
> +The ``dpdk-test-dma-perf`` tool is a Data Plane Development Kit (DPDK) application
> +that enables testing the performance of DMA (Direct Memory Access) devices available within DPDK.
> +It provides a test framework to assess the performance of CPU and DMA devices
> +under various scenarios, such as varying buffer lengths.
> +Doing so provides insight into the potential performance
> +when using these DMA devices for acceleration in DPDK applications.
> +It supports memory copy performance tests for now,
> +comparing the performance of CPU and DMA automatically in various conditions
> +with the help of a pre-set configuration file.
> +
> +
> +Configuration
> +-------------
> +This application uses inherent DPDK EAL command-line options
> +as well as custom command-line options in the application.
> +An example configuration file for the application is provided
> +and gives the meanings for each parameter.
> +
> +Here is an extracted sample from the configuration file
> +(the complete sample can be found in the application source directory):
> +
> +.. code-block:: ini
> +
> +   [case1]
> +   type=DMA_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   dma_ring_size=1024
> +   kick_batch=32
> +   src_numa_node=0
> +   dst_numa_node=0
> +   cache_flush=0
> +   test_seconds=2
> +   lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
> +   eal_args=--in-memory --file-prefix=test
> +
> +   [case2]
> +   type=CPU_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   src_numa_node=0
> +   dst_numa_node=1
> +   cache_flush=0
> +   test_seconds=2
> +   lcore = 3, 4
> +   eal_args=--in-memory --no-pci
> +
> +The configuration file is divided into multiple sections, each section represents a test case.
> +The four variables mem_size, buf_size, dma_ring_size, and kick_batch can vary in each test case.
> +The format for this is ``variable=first,last,increment,ADD\|MUL``.
> +This means that the first value of the variable is 'first',
> +the last value is 'last', 'increment' is the step size,
> +and ADD|MUL indicates whether the change is by addition or multiplication.
> +Each case can only have one variable change,
> +and each change will generate a scenario, so each case can have multiple scenarios.
> +
> +Parameter Definitions
> +---------------------
> +
> +- **type**: The type of the test. Currently supported types are `DMA_MEM_COPY` and `CPU_MEM_COPY`.
> +- **mem_size**: The size of the memory footprint.
> +- **buf_size**: The memory size of a single operation.
> +- **dma_ring_size**: The DMA ring buffer size. Must be a power of two, and between 64 and 4096.
> +- **kick_batch**: The DMA operation batch size, should be greater than 1 normally.
> +- **src_numa_node**: Controls the NUMA node where the source memory is allocated.
> +- **dst_numa_node**: Controls the NUMA node where the destination memory is allocated.
> +- **cache_flush**: Determines whether the cache should be flushed. `1` indicates to flush and `0` to not flush.
> +- **test_seconds**: Controls the test time for each scenario.
> +- **lcore_dma**: Specifies the lcore/DMA mapping.
> +- **lcore**: Specifies the lcore for CPU testing.
> +- **eal_args**: Specifies the EAL arguments.
> +
> +.. Note::
> +
> +   The mapping of lcore to DMA must be one-to-one and cannot be duplicated.
> +
> +To specify a configuration file, use the "\-\-config" flag followed by the path to the file.
> +
> +To specify a result file, use the "\-\-result" flag followed by the path to the file.
> +If you do not specify a result file, one will be generated with the same name
> +as the configuration file, with the addition of "_result.csv" at the end.
> +
> +
> +Running the Application
> +-----------------------
> +
> +Typical command-line invocation to execute the application:
> +
> +.. code-block:: console
> +
> +   dpdk-test-dma-perf --config=./config_dma.ini --result=./res_dma.csv
> +
> +Where `config_dma.ini` is the configuration file,
> +and `res_dma.csv` will be the generated result file.
> +
> +After the tests, you can find the results in the `res_dma.csv` file.
> +
> +Limitations
> +-----------
> +
> +Currently, this tool only supports memory copy performance tests.
> +Additional enhancements are possible in the future
> +to support more types of tests for DMA devices and CPUs.
> diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
> index 6f84fc31ff..857572da96 100644
> --- a/doc/guides/tools/index.rst
> +++ b/doc/guides/tools/index.rst
> @@ -23,3 +23,4 @@ DPDK Tools User Guides
>      testregex
>      testmldev
>      dts
> +    dmaperf
> --
> 2.40.1
>
>
>
> End of dev Digest, Vol 462, Issue 27
> ************************************
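
One note for anyone reading the dmaperf.rst text quoted above: the
"first,last,increment,ADD|MUL" format is easier to follow with a small
expansion. The sketch below is only my illustration and is not code from
the patch. The names expand_entry and entry_op are hypothetical, and I am
assuming the sweep starts at 'first' and continues while the value is
less than or equal to 'last'; the scenario loop in benchmark.c may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the patch. */
enum entry_op { ENTRY_ADD, ENTRY_MUL };

/*
 * Expand "first,last,incr,ADD|MUL" into concrete values.
 * Assumption: the sweep starts at first and runs while cur <= last.
 * Returns the number of values written to out, or 0 when the step
 * could never make progress.
 */
static size_t
expand_entry(uint32_t first, uint32_t last, uint32_t incr,
		enum entry_op op, uint32_t *out, size_t out_len)
{
	uint64_t cur = first; /* 64-bit so cur * incr cannot wrap */
	size_t n = 0;

	/* A zero ADD step or a MUL factor below 2 would loop forever. */
	if ((op == ENTRY_ADD && incr == 0) || (op == ENTRY_MUL && incr < 2))
		return 0;
	/* A MUL sweep that starts at 0 never grows. */
	if (op == ENTRY_MUL && first == 0)
		return 0;

	while (cur <= last && n < out_len) {
		out[n++] = (uint32_t)cur;
		cur = (op == ENTRY_ADD) ? cur + incr : cur * incr;
	}
	return n;
}
```

Under that assumption, buf_size=64,8192,2,MUL from the sample config
expands to 64, 128, 256, ..., 8192, which is 8 scenarios for the case.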