Hi Cheng,

 

LGTM.

 

> -----Original Message-----

> Date: Wed, 28 Jun 2023 01:20:34 +0000

> From: Cheng Jiang <cheng1.jiang@intel.com>

> To: thomas@monjalon.net, bruce.richardson@intel.com,

>            mb@smartsharesystems.com, chenbo.xia@intel.com,

>            amitprakashs@marvell.com, anoobj@marvell.com,

> huangdengdui@huawei.com,

>            kevin.laatz@intel.com, fengchengwen@huawei.com, jerinj@marvell.com

> Cc: dev@dpdk.org, jiayu.hu@intel.com, xuan.ding@intel.com,

>            wenwux.ma@intel.com, yuanx.wang@intel.com, xingguang.he@intel.com,

>            weix.ling@intel.com, Cheng Jiang <cheng1.jiang@intel.com>

> Subject: [PATCH v10] app/dma-perf: introduce dma-perf application

> Message-ID: <20230628012034.49016-1-cheng1.jiang@intel.com>

> Content-Type: text/plain; charset=UTF-8

>

> There are many high-performance DMA devices supported in DPDK now, and

> these DMA devices can also be integrated into other modules of DPDK as

> accelerators, such as Vhost. Before integrating DMA into applications,

> developers need to know the performance of these DMA devices in

> various scenarios and the performance of CPUs in the same scenario,

> such as different buffer lengths. Only in this way can we know the

> target performance of the application accelerated by using them. This

> patch introduces a high-performance testing tool, which supports

> comparing the performance of CPU and DMA in different scenarios

> automatically with a pre-set config file. Memory copy performance tests are supported for now.

>

> Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>

> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>

> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>

> Acked-by: Morten Brørup <mb@smartsharesystems.com>

> Acked-by: Chenbo Xia <chenbo.xia@intel.com>

 

Acked-by: Yuying Zhang <yuying.zhang@intel.com>

 

> ---

> v10:

>   rebased code from 23.07-rc2;

> v9:

>   improved error handling;

>   improved lcore_params structure;

>   improved mbuf api calling;

>   improved exit process;

>   fixed some typos;

>   added scenario summary data display;

>   removed unnecessary include;

> v8:

>   fixed string copy issue in parse_lcore();

>   improved some data display format;

>   added doc in doc/guides/tools;

>   updated release notes;

> v7:

>   fixed some strcpy issues;

>   removed cache setup in calling rte_pktmbuf_pool_create();

>   fixed some typos;

>   added some memory free and null set operations;

>   improved result calculation;

> v6:

>   improved code based on Anoob's comments;

>   fixed some code structure issues;

> v5:

>   fixed some LONG_LINE warnings;

> v4:

>   fixed inaccuracy of the memory footprint display;

> v3:

>   fixed some typos;

> v2:

>   added lcore/dmadev designation;

>   added error case process;

>   removed worker_threads parameter from config.ini;

>   improved the logs;

>   improved config file;

>

>  app/meson.build                        |   1 +

>  app/test-dma-perf/benchmark.c          | 508 ++++++++++++++++++++

>  app/test-dma-perf/config.ini           |  61 +++

>  app/test-dma-perf/main.c               | 616 +++++++++++++++++++++++++

>  app/test-dma-perf/main.h               |  64 +++

>  app/test-dma-perf/meson.build          |  17 +

>  doc/guides/rel_notes/release_23_07.rst |   6 +

>  doc/guides/tools/dmaperf.rst           | 103 +++++

>  doc/guides/tools/index.rst             |   1 +

>  9 files changed, 1377 insertions(+)

>  create mode 100644 app/test-dma-perf/benchmark.c

>  create mode 100644 app/test-dma-perf/config.ini

>  create mode 100644 app/test-dma-perf/main.c

>  create mode 100644 app/test-dma-perf/main.h

>  create mode 100644 app/test-dma-perf/meson.build

>  create mode 100644 doc/guides/tools/dmaperf.rst

>

> diff --git a/app/meson.build b/app/meson.build

> index 74d2420f67..4fc1a83eba 100644

> --- a/app/meson.build

> +++ b/app/meson.build

> @@ -19,6 +19,7 @@ apps = [

>          'test-cmdline',

>          'test-compress-perf',

>          'test-crypto-perf',

> +        'test-dma-perf',

>          'test-eventdev',

>          'test-fib',

>          'test-flow-perf',

> diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c

> new file mode 100644

> index 0000000000..0601e0d171

> --- /dev/null

> +++ b/app/test-dma-perf/benchmark.c

> @@ -0,0 +1,508 @@

> +/* SPDX-License-Identifier: BSD-3-Clause

> + * Copyright(c) 2023 Intel Corporation

> + */

> +

> +#include <inttypes.h>

> +#include <stdio.h>

> +#include <stdlib.h>

> +#include <unistd.h>

> +

> +#include <rte_time.h>

> +#include <rte_mbuf.h>

> +#include <rte_dmadev.h>

> +#include <rte_malloc.h>

> +#include <rte_lcore.h>

> +

> +#include "main.h"

> +

> +#define MAX_DMA_CPL_NB 255

> +

> +#define TEST_WAIT_U_SECOND 10000

> +#define POLL_MAX 1000

> +

> +#define CSV_LINE_DMA_FMT "Scenario %u,%u,%s,%u,%u,%u,%u,%.2lf,%" PRIu64 ",%.3lf,%.3lf\n"

> +#define CSV_LINE_CPU_FMT "Scenario %u,%u,NA,NA,NA,%u,%u,%.2lf,%" PRIu64 ",%.3lf,%.3lf\n"

> +

> +#define CSV_TOTAL_LINE_FMT "Scenario %u Summary, , , , , ,%u,%.2lf,%u,%.3lf,%.3lf\n"

> +

> +struct worker_info {

> +         bool ready_flag;

> +         bool start_flag;

> +         bool stop_flag;

> +         uint32_t total_cpl;

> +         uint32_t test_cpl;

> +};

> +

> +struct lcore_params {

> +         uint8_t scenario_id;

> +         unsigned int lcore_id;

> +         char *dma_name;

> +         uint16_t worker_id;

> +         uint16_t dev_id;

> +         uint32_t nr_buf;

> +         uint16_t kick_batch;

> +         uint32_t buf_size;

> +         uint16_t test_secs;

> +         struct rte_mbuf **srcs;

> +         struct rte_mbuf **dsts;

> +         volatile struct worker_info worker_info;

> +};

> +

> +static struct rte_mempool *src_pool;

> +static struct rte_mempool *dst_pool;

> +

> +static struct lcore_params *lcores[MAX_WORKER_NB];

> +

> +#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__)

> +

> +static inline int

> +__rte_format_printf(3, 4)

> +print_err(const char *func, int lineno, const char *format, ...)

> +{

> +         va_list ap;

> +         int ret;

> +

> +         ret = fprintf(stderr, "In %s:%d - ", func, lineno);

> +         va_start(ap, format);

> +         ret += vfprintf(stderr, format, ap);

> +         va_end(ap);

> +

> +         return ret;

> +}

> +

> +static inline void

> +calc_result(uint32_t buf_size, uint32_t nr_buf, uint16_t nb_workers, uint16_t test_secs,

> +                                                    uint32_t total_cnt, float *memory, uint32_t *ave_cycle,

> +                                                    float *bandwidth, float *mops)

> +{

> +         float ops;

> +

> +         *memory = (float)(buf_size * (nr_buf / nb_workers) * 2) / (1024 * 1024);

> +         *ave_cycle = test_secs * rte_get_timer_hz() / total_cnt;

> +         ops = (float)total_cnt / test_secs;

> +         *mops = ops / (1000 * 1000);

> +         *bandwidth = (ops * buf_size * 8) / (1000 * 1000 * 1000);

> +}
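
The arithmetic in calc_result() can be checked in isolation. Below is a minimal sketch, not code from the patch: perf_from_count() and struct perf_numbers are hypothetical names, timer_hz stands in for rte_get_timer_hz(), and the early return for a zero count is an assumption of the sketch (the quoted function divides by total_cnt without a check).

```c
#include <stdint.h>

/* Standalone restatement of the throughput math in calc_result().
 * All names here are illustrative and not part of the patch.
 */
struct perf_numbers {
	double mops;            /* million copies per second */
	double bandwidth_gbps;  /* gigabits per second */
	uint64_t cycles_per_op; /* timer cycles per copy */
};

/* timer_hz stands in for the value returned by rte_get_timer_hz(). */
static struct perf_numbers
perf_from_count(uint32_t buf_size, uint16_t test_secs, uint32_t total_cnt,
		uint64_t timer_hz)
{
	struct perf_numbers r = { 0.0, 0.0, 0 };
	double ops;

	/* Assumption of this sketch: report zeros rather than divide by zero. */
	if (total_cnt == 0 || test_secs == 0)
		return r;

	ops = (double)total_cnt / test_secs;          /* copies per second */
	r.mops = ops / 1e6;
	r.bandwidth_gbps = ops * buf_size * 8 / 1e9;  /* bytes to bits */
	r.cycles_per_op = (uint64_t)test_secs * timer_hz / total_cnt;
	return r;
}
```

With buf_size = 1024, test_secs = 2, 2000000 completed copies and a 2 GHz timer, this gives 1.0 MOps, 8.192 Gbps and 2000 cycles per copy.
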

> +

> +static void

> +output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name, uint16_t ring_size,

> +                                      uint16_t kick_batch, uint64_t ave_cycle, uint32_t buf_size, uint32_t nr_buf,

> +                                      float memory, float bandwidth, float mops, bool is_dma)

> +{

> +         if (is_dma)

> +                       printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size: %u.\n",

> +                                                    lcore_id, dma_name, ring_size, kick_batch);

> +         else

> +                       printf("lcore %u\n", lcore_id);

> +

> +         printf("Average Cycles/op: %" PRIu64 ", Buffer Size: %u B, Buffer Number: %u, Memory: %.2lf MB, Frequency: %.3lf Ghz.\n",

> +                                      ave_cycle, buf_size, nr_buf, memory, rte_get_timer_hz()/1000000000.0);

> +         printf("Average Bandwidth: %.3lf Gbps, MOps: %.3lf\n", bandwidth, mops);

> +

> +         if (is_dma)

> +                       snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, CSV_LINE_DMA_FMT,

> +                                      scenario_id, lcore_id, dma_name, ring_size, kick_batch, buf_size,

> +                                      nr_buf, memory, ave_cycle, bandwidth, mops);

> +         else

> +                       snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, CSV_LINE_CPU_FMT,

> +                                      scenario_id, lcore_id, buf_size,

> +                                      nr_buf, memory, ave_cycle, bandwidth, mops);

> +}

> +

> +static inline void

> +cache_flush_buf(__rte_unused struct rte_mbuf **array,

> +                       __rte_unused uint32_t buf_size,

> +                       __rte_unused uint32_t nr_buf)

> +{

> +#ifdef RTE_ARCH_X86_64

> +         char *data;

> +         struct rte_mbuf **srcs = array;

> +         uint32_t i, offset;

> +

> +         for (i = 0; i < nr_buf; i++) {

> +                       data = rte_pktmbuf_mtod(srcs[i], char *);

> +                       for (offset = 0; offset < buf_size; offset += 64)

> +                                      __builtin_ia32_clflush(data + offset);

> +         }

> +#endif

> +}

> +

> +/* Configuration of device. */

> +static void

> +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size)

> +{

> +         uint16_t vchan = 0;

> +         struct rte_dma_info info;

> +         struct rte_dma_conf dev_config = { .nb_vchans = 1 };

> +         struct rte_dma_vchan_conf qconf = {

> +                       .direction = RTE_DMA_DIR_MEM_TO_MEM,

> +                       .nb_desc = ring_size

> +         };

> +

> +         if (rte_dma_configure(dev_id, &dev_config) != 0)

> +                       rte_exit(EXIT_FAILURE, "Error with dma configure.\n");

> +

> +         if (rte_dma_vchan_setup(dev_id, vchan, &qconf) != 0)

> +                       rte_exit(EXIT_FAILURE, "Error with queue configuration.\n");

> +

> +         if (rte_dma_info_get(dev_id, &info) != 0)

> +                       rte_exit(EXIT_FAILURE, "Error with getting device info.\n");

> +

> +         if (info.nb_vchans != 1)

> +                       rte_exit(EXIT_FAILURE, "Error, no configured queues reported on device id. %u\n",

> +                                                    dev_id);

> +

> +         if (rte_dma_start(dev_id) != 0)

> +                       rte_exit(EXIT_FAILURE, "Error with dma start.\n");

> +}

> +

> +static int

> +config_dmadevs(struct test_configure *cfg)

> +{

> +         uint32_t ring_size = cfg->ring_size.cur;

> +         struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map;

> +         uint32_t nb_workers = ldm->cnt;

> +         uint32_t i;

> +         int dev_id;

> +         uint16_t nb_dmadevs = 0;

> +         char *dma_name;

> +

> +         for (i = 0; i < ldm->cnt; i++) {

> +                       dma_name = ldm->dma_names[i];

> +                       dev_id = rte_dma_get_dev_id_by_name(dma_name);

> +                       if (dev_id < 0) {

> +                                      fprintf(stderr, "Error: Fail to find DMA %s.\n", dma_name);

> +                                      goto end;

> +                       }

> +

> +                       ldm->dma_ids[i] = dev_id;

> +                       configure_dmadev_queue(dev_id, ring_size);

> +                       ++nb_dmadevs;

> +         }

> +

> +end:

> +         if (nb_dmadevs < nb_workers) {

> +                       printf("Not enough dmadevs (%u) for all workers (%u).\n", nb_dmadevs, nb_workers);

> +                       return -1;

> +         }

> +

> +         printf("Number of used dmadevs: %u.\n", nb_dmadevs);

> +

> +         return 0;

> +}

> +

> +static void

> +error_exit(int dev_id)

> +{

> +         rte_dma_stop(dev_id);

> +         rte_dma_close(dev_id);

> +         rte_exit(EXIT_FAILURE, "DMA error\n");

> +}

> +

> +static inline void

> +do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt,

> +                                      volatile struct worker_info *worker_info)

> +{

> +         int ret;

> +         uint16_t nr_cpl;

> +

> +         ret = rte_dma_submit(dev_id, 0);

> +         if (ret < 0)

> +                       error_exit(dev_id);

> +

> +         nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);

> +         *async_cnt -= nr_cpl;

> +         worker_info->total_cpl += nr_cpl;

> +}

> +

> +static inline int

> +do_dma_mem_copy(void *p)

> +{

> +         struct lcore_params *para = (struct lcore_params *)p;

> +         volatile struct worker_info *worker_info = &(para->worker_info);

> +         const uint16_t dev_id = para->dev_id;

> +         const uint32_t nr_buf = para->nr_buf;

> +         const uint16_t kick_batch = para->kick_batch;

> +         const uint32_t buf_size = para->buf_size;

> +         struct rte_mbuf **srcs = para->srcs;

> +         struct rte_mbuf **dsts = para->dsts;

> +         uint16_t nr_cpl;

> +         uint64_t async_cnt = 0;

> +         uint32_t i;

> +         uint32_t poll_cnt = 0;

> +         int ret;

> +

> +         worker_info->stop_flag = false;

> +         worker_info->ready_flag = true;

> +

> +         while (!worker_info->start_flag)

> +                       ;

> +

> +         while (1) {

> +                       for (i = 0; i < nr_buf; i++) {

> +dma_copy:

> +                                      ret = rte_dma_copy(dev_id, 0, rte_mbuf_data_iova(srcs[i]),

> +                                                    rte_mbuf_data_iova(dsts[i]), buf_size, 0);

> +                                      if (unlikely(ret < 0)) {

> +                                                    if (ret == -ENOSPC) {

> +                                                                   do_dma_submit_and_poll(dev_id, &async_cnt, worker_info);

> +                                                                   goto dma_copy;

> +                                                    } else

> +                                                                   error_exit(dev_id);

> +                                      }

> +                                      async_cnt++;

> +

> +                                      if ((async_cnt % kick_batch) == 0)

> +                                                    do_dma_submit_and_poll(dev_id, &async_cnt, worker_info);

> +                       }

> +

> +                       if (worker_info->stop_flag)

> +                                      break;

> +         }

> +

> +         rte_dma_submit(dev_id, 0);

> +         while ((async_cnt > 0) && (poll_cnt++ < POLL_MAX)) {

> +                       nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);

> +                       async_cnt -= nr_cpl;

> +         }

> +

> +         return 0;

> +}

> +

> +static inline int

> +do_cpu_mem_copy(void *p)

> +{

> +         struct lcore_params *para = (struct lcore_params *)p;

> +         volatile struct worker_info *worker_info = &(para->worker_info);

> +         const uint32_t nr_buf = para->nr_buf;

> +         const uint32_t buf_size = para->buf_size;

> +         struct rte_mbuf **srcs = para->srcs;

> +         struct rte_mbuf **dsts = para->dsts;

> +         uint32_t i;

> +

> +         worker_info->stop_flag = false;

> +         worker_info->ready_flag = true;

> +

> +         while (!worker_info->start_flag)

> +                       ;

> +

> +         while (1) {

> +                       for (i = 0; i < nr_buf; i++) {

> +                                      /* copy buffer from src to dst */

> +                                      rte_memcpy((void *)(uintptr_t)rte_mbuf_data_iova(dsts[i]),

> +                                                    (void *)(uintptr_t)rte_mbuf_data_iova(srcs[i]),

> +                                                    (size_t)buf_size);

> +                                      worker_info->total_cpl++;

> +                       }

> +                       if (worker_info->stop_flag)

> +                                      break;

> +         }

> +

> +         return 0;

> +}

> +

> +static int

> +setup_memory_env(struct test_configure *cfg, struct rte_mbuf ***srcs,

> +                                      struct rte_mbuf ***dsts)

> +{

> +         unsigned int buf_size = cfg->buf_size.cur;

> +         unsigned int nr_sockets;

> +         uint32_t nr_buf = cfg->nr_buf;

> +

> +         nr_sockets = rte_socket_count();

> +         if (cfg->src_numa_node >= nr_sockets ||

> +                       cfg->dst_numa_node >= nr_sockets) {

> +                       printf("Error: Source or destination numa exceeds the actual numa nodes.\n");

> +                       return -1;

> +         }

> +

> +         src_pool = rte_pktmbuf_pool_create("Benchmark_DMA_SRC",

> +                                      nr_buf,

> +                                      0,

> +                                      0,

> +                                      buf_size + RTE_PKTMBUF_HEADROOM,

> +                                      cfg->src_numa_node);

> +         if (src_pool == NULL) {

> +                       PRINT_ERR("Error with source mempool creation.\n");

> +                       return -1;

> +         }

> +

> +         dst_pool = rte_pktmbuf_pool_create("Benchmark_DMA_DST",

> +                                      nr_buf,

> +                                      0,

> +                                      0,

> +                                      buf_size + RTE_PKTMBUF_HEADROOM,

> +                                      cfg->dst_numa_node);

> +         if (dst_pool == NULL) {

> +                       PRINT_ERR("Error with destination mempool creation.\n");

> +                       return -1;

> +         }

> +

> +         *srcs = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0);

> +         if (*srcs == NULL) {

> +                       printf("Error: srcs malloc failed.\n");

> +                       return -1;

> +         }

> +

> +         *dsts = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0);

> +         if (*dsts == NULL) {

> +                       printf("Error: dsts malloc failed.\n");

> +                       return -1;

> +         }

> +

> +         if (rte_pktmbuf_alloc_bulk(src_pool, *srcs, nr_buf) != 0) {

> +                       printf("alloc src mbufs failed.\n");

> +                       return -1;

> +         }

> +

> +         if (rte_pktmbuf_alloc_bulk(dst_pool, *dsts, nr_buf) != 0) {

> +                       printf("alloc dst mbufs failed.\n");

> +                       return -1;

> +         }

> +

> +         return 0;

> +}

> +

> +void

> +mem_copy_benchmark(struct test_configure *cfg, bool is_dma)

> +{

> +         uint16_t i;

> +         uint32_t offset;

> +         unsigned int lcore_id = 0;

> +         struct rte_mbuf **srcs = NULL, **dsts = NULL;

> +         struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map;

> +         unsigned int buf_size = cfg->buf_size.cur;

> +         uint16_t kick_batch = cfg->kick_batch.cur;

> +         uint32_t nr_buf = cfg->nr_buf = (cfg->mem_size.cur * 1024 * 1024) / (cfg->buf_size.cur * 2);

> +         uint16_t nb_workers = ldm->cnt;

> +         uint16_t test_secs = cfg->test_secs;

> +         float memory = 0;

> +         uint32_t avg_cycles = 0;

> +         uint32_t avg_cycles_total;

> +         float mops, mops_total;

> +         float bandwidth, bandwidth_total;

> +

> +         if (setup_memory_env(cfg, &srcs, &dsts) < 0)

> +                       goto out;

> +

> +         if (is_dma)

> +                       if (config_dmadevs(cfg) < 0)

> +                                      goto out;

> +

> +         if (cfg->cache_flush == 1) {

> +                       cache_flush_buf(srcs, buf_size, nr_buf);

> +                       cache_flush_buf(dsts, buf_size, nr_buf);

> +                       rte_mb();

> +         }

> +

> +         printf("Start testing....\n");

> +

> +         for (i = 0; i < nb_workers; i++) {

> +                       lcore_id = ldm->lcores[i];

> +                       offset = nr_buf / nb_workers * i;

> +                       lcores[i] = rte_malloc(NULL, sizeof(struct lcore_params), 0);

> +                       if (lcores[i] == NULL) {

> +                                      printf("lcore parameters malloc failure for lcore %d\n", lcore_id);

> +                                      break;

> +                       }

> +                       if (is_dma) {

> +                                     lcores[i]->dma_name = ldm->dma_names[i];

> +                                      lcores[i]->dev_id = ldm->dma_ids[i];

> +                                      lcores[i]->kick_batch = kick_batch;

> +                       }

> +                       lcores[i]->worker_id = i;

> +                       lcores[i]->nr_buf = (uint32_t)(nr_buf / nb_workers);

> +                       lcores[i]->buf_size = buf_size;

> +                       lcores[i]->test_secs = test_secs;

> +                       lcores[i]->srcs = srcs + offset;

> +                       lcores[i]->dsts = dsts + offset;

> +                       lcores[i]->scenario_id = cfg->scenario_id;

> +                       lcores[i]->lcore_id = lcore_id;

> +

> +                       if (is_dma)

> +                                      rte_eal_remote_launch(do_dma_mem_copy, (void *)(lcores[i]), lcore_id);

> +                       else

> +                                      rte_eal_remote_launch(do_cpu_mem_copy, (void *)(lcores[i]), lcore_id);

> +         }

> +

> +         while (1) {

> +                       bool ready = true;

> +                       for (i = 0; i < nb_workers; i++) {

> +                                      if (lcores[i]->worker_info.ready_flag == false) {

> +                                                    ready = 0;

> +                                                    break;

> +                                      }

> +                       }

> +                       if (ready)

> +                                      break;

> +         }

> +

> +         for (i = 0; i < nb_workers; i++)

> +                       lcores[i]->worker_info.start_flag = true;

> +

> +         usleep(TEST_WAIT_U_SECOND);

> +         for (i = 0; i < nb_workers; i++)

> +                       lcores[i]->worker_info.test_cpl = lcores[i]->worker_info.total_cpl;

> +

> +         usleep(test_secs * 1000 * 1000);

> +         for (i = 0; i < nb_workers; i++)

> +                       lcores[i]->worker_info.test_cpl = lcores[i]->worker_info.total_cpl -

> +                                                                                 lcores[i]->worker_info.test_cpl;

> +

> +         for (i = 0; i < nb_workers; i++)

> +                       lcores[i]->worker_info.stop_flag = true;

> +

> +         rte_eal_mp_wait_lcore();

> +

> +         mops_total = 0;

> +         bandwidth_total = 0;

> +         avg_cycles_total = 0;

> +         for (i = 0; i < nb_workers; i++) {

> +                       calc_result(buf_size, nr_buf, nb_workers, test_secs,

> +                                      lcores[i]->worker_info.test_cpl,

> +                                      &memory, &avg_cycles, &bandwidth, &mops);

> +                       output_result(cfg->scenario_id, lcores[i]->lcore_id,

> +                                                                   lcores[i]->dma_name, cfg->ring_size.cur, kick_batch,

> +                                                                   avg_cycles, buf_size, nr_buf / nb_workers, memory,

> +                                                                   bandwidth, mops, is_dma);

> +                       mops_total += mops;

> +                       bandwidth_total += bandwidth;

> +                       avg_cycles_total += avg_cycles;

> +         }

> +         printf("\nTotal Bandwidth: %.3lf Gbps, Total MOps: %.3lf\n", bandwidth_total, mops_total);

> +         snprintf(output_str[MAX_WORKER_NB], MAX_OUTPUT_STR_LEN, CSV_TOTAL_LINE_FMT,

> +                                      cfg->scenario_id, nr_buf, memory * nb_workers,

> +                                      avg_cycles_total / nb_workers, bandwidth_total, mops_total);

> +

> +out:

> +         /* free mbufs used in the test */

> +         if (srcs != NULL)

> +                       rte_pktmbuf_free_bulk(srcs, nr_buf);

> +         if (dsts != NULL)

> +                       rte_pktmbuf_free_bulk(dsts, nr_buf);

> +

> +         /* free the pointers for the mbufs */

> +         rte_free(srcs);

> +         srcs = NULL;

> +         rte_free(dsts);

> +         dsts = NULL;

> +

> +         rte_mempool_free(src_pool);

> +         src_pool = NULL;

> +

> +         rte_mempool_free(dst_pool);

> +         dst_pool = NULL;

> +

> +         /* free the worker parameters */

> +         for (i = 0; i < nb_workers; i++) {

> +                       rte_free(lcores[i]);

> +                       lcores[i] = NULL;

> +         }

> +

> +         if (is_dma) {

> +                       for (i = 0; i < nb_workers; i++) {

> +                                      printf("Stopping dmadev %d\n", ldm->dma_ids[i]);

> +                                      rte_dma_stop(ldm->dma_ids[i]);

> +                       }

> +         }

> +}
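
The buffer accounting in mem_copy_benchmark() (nr_buf derived from mem_size and buf_size, then one contiguous slice per worker) can be written out on its own. This is a minimal sketch under stated assumptions: nr_buf_for(), slice_for_worker() and struct buf_slice are hypothetical names, and the 64-bit intermediates are a precaution of the sketch, since the field types in main.h are not shown in this part of the mail.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; none of these exist in the patch. */
struct buf_slice {
	uint32_t offset; /* index of the worker's first buffer */
	uint32_t count;  /* number of buffers the worker owns */
};

/* Number of src/dst buffer pairs for a footprint of mem_size_mb MB.
 * Half of the footprint holds sources, the other half destinations.
 */
static uint32_t
nr_buf_for(uint32_t mem_size_mb, uint32_t buf_size)
{
	assert(buf_size > 0);
	/* 64-bit intermediates so that large footprints cannot wrap. */
	return (uint32_t)(((uint64_t)mem_size_mb * 1024 * 1024) /
			((uint64_t)buf_size * 2));
}

/* Same split as the quoted loop: offset = nr_buf / nb_workers * worker. */
static struct buf_slice
slice_for_worker(uint32_t nr_buf, uint16_t nb_workers, uint16_t worker)
{
	struct buf_slice s;

	assert(nb_workers > 0 && worker < nb_workers);
	s.count = nr_buf / nb_workers;
	s.offset = s.count * worker;
	return s;
}
```

For mem_size=10 and buf_size=64 this yields 81920 buffer pairs. With two workers each gets 40960; with three workers each gets 27306, and the remaining two buffers are allocated but never copied, because the division truncates.
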

> diff --git a/app/test-dma-perf/config.ini b/app/test-dma-perf/config.ini

> new file mode 100644

> index 0000000000..b550f4b23f

> --- /dev/null

> +++ b/app/test-dma-perf/config.ini

> @@ -0,0 +1,61 @@

> +

> +; This is an example configuration file for dma-perf, which details the meanings of each parameter

> +; and instructions on how to use dma-perf.

> +

> +; Supported test types are DMA_MEM_COPY and CPU_MEM_COPY.

> +

> +; Parameters:

> +; "mem_size" denotes the size of the memory footprint.

> +; "buf_size" denotes the memory size of a single operation.

> +; "dma_ring_size" denotes the dma ring buffer size. It must be a power of two, and between

> +;  64 and 4096.

> +; "kick_batch" denotes the dma operation batch size, and should be greater than 1 normally.

> +

> +; The format for variables is variable=first,last,increment,ADD|MUL.

> +

> +; src_numa_node is used to control the numa node where the source memory is allocated.

> +; dst_numa_node is used to control the numa node where the destination memory is allocated.

> +

> +; cache_flush is used to determine whether or not the cache should be flushed, with 1 indicating to

> +; flush and 0 indicating to not flush.

> +

> +; test_seconds controls the test time of the whole case.

> +

> +; To use DMA for a test, please specify the "lcore_dma" parameter.

> +; If you have already set the "-l" and "-a" parameters using EAL,

> +; make sure that the value of "lcore_dma" falls within their range of values.

> +; We have to ensure a 1:1 mapping between the core and DMA device.

> +

> +; To use CPU for a test, please specify the "lcore" parameter.

> +; If you have already set the "-l" and "-a" parameters using EAL,

> +; make sure that the value of "lcore" falls within their range of values.

> +

> +; To specify a configuration file, use the "--config" flag followed by the path to the file.

> +

> +; To specify a result file, use the "--result" flag followed by the path to the file.

> +; If you do not specify a result file, one will be generated with the same name as the configuration

> +; file, with the addition of "_result.csv" at the end.

> +

> +[case1]

> +type=DMA_MEM_COPY

> +mem_size=10

> +buf_size=64,8192,2,MUL

> +dma_ring_size=1024

> +kick_batch=32

> +src_numa_node=0

> +dst_numa_node=0

> +cache_flush=0

> +test_seconds=2

> +lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3

> +eal_args=--in-memory --file-prefix=test

> +

> +[case2]

> +type=CPU_MEM_COPY

> +mem_size=10

> +buf_size=64,8192,2,MUL

> +src_numa_node=0

> +dst_numa_node=1

> +cache_flush=0

> +test_seconds=2

> +lcore = 3, 4

> +eal_args=--in-memory --no-pci

> diff --git a/app/test-dma-perf/main.c b/app/test-dma-perf/main.c

> new file mode 100644

> index 0000000000..de37120df6

> --- /dev/null

> +++ b/app/test-dma-perf/main.c

> @@ -0,0 +1,616 @@

> +/* SPDX-License-Identifier: BSD-3-Clause

> + * Copyright(c) 2023 Intel Corporation

> + */

> +

> +#include <stdio.h>

> +#include <stdlib.h>

> +#include <getopt.h>

> +#include <signal.h>

> +#include <stdbool.h>

> +#include <unistd.h>

> +#include <sys/wait.h>

> +#include <inttypes.h>

> +#include <libgen.h>

> +

> +#include <rte_eal.h>

> +#include <rte_cfgfile.h>

> +#include <rte_string_fns.h>

> +#include <rte_lcore.h>

> +

> +#include "main.h"

> +

> +#define CSV_HDR_FMT "Case %u : %s,lcore,DMA,DMA ring size,kick batch size,buffer size(B),number of buffers,memory(MB),average cycle,bandwidth(Gbps),MOps\n"

> +

> +#define MAX_EAL_PARAM_NB 100

> +#define MAX_EAL_PARAM_LEN 1024

> +

> +#define DMA_MEM_COPY "DMA_MEM_COPY"

> +#define CPU_MEM_COPY "CPU_MEM_COPY"

> +

> +#define CMDLINE_CONFIG_ARG "--config"

> +#define CMDLINE_RESULT_ARG "--result"

> +

> +#define MAX_PARAMS_PER_ENTRY 4

> +

> +#define MAX_LONG_OPT_SZ 64

> +

> +enum {

> +         TEST_TYPE_NONE = 0,

> +         TEST_TYPE_DMA_MEM_COPY,

> +         TEST_TYPE_CPU_MEM_COPY

> +};

> +

> +#define MAX_TEST_CASES 16

> +static struct test_configure test_cases[MAX_TEST_CASES];

> +

> +char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];

> +

> +static FILE *fd;

> +

> +static void

> +output_csv(bool need_blankline)

> +{

> +         uint32_t i;

> +

> +         if (need_blankline) {

> +                       fprintf(fd, ",,,,,,,,\n");

> +                       fprintf(fd, ",,,,,,,,\n");

> +         }

> +

> +         for (i = 0; i < RTE_DIM(output_str); i++) {

> +                       if (output_str[i][0]) {

> +                                      fprintf(fd, "%s", output_str[i]);

> +                                      output_str[i][0] = '\0';

> +                       }

> +         }

> +

> +         fflush(fd);

> +}

> +

> +static void

> +output_env_info(void)

> +{

> +         snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Test Environment:\n");

> +         snprintf(output_str[1], MAX_OUTPUT_STR_LEN, "CPU frequency,%.3lf Ghz",

> +                                      rte_get_timer_hz() / 1000000000.0);

> +

> +         output_csv(true);

> +}

> +

> +static void

> +output_header(uint32_t case_id, struct test_configure *case_cfg)

> +{

> +         snprintf(output_str[0], MAX_OUTPUT_STR_LEN,

> +                                      CSV_HDR_FMT, case_id, case_cfg->test_type_str);

> +

> +         output_csv(true);

> +}

> +

> +static void

> +run_test_case(struct test_configure *case_cfg)

> +{

> +         switch (case_cfg->test_type) {

> +         case TEST_TYPE_DMA_MEM_COPY:

> +                       mem_copy_benchmark(case_cfg, true);

> +                       break;

> +         case TEST_TYPE_CPU_MEM_COPY:

> +                       mem_copy_benchmark(case_cfg, false);

> +                       break;

> +         default:

> +                       printf("Unknown test type. %s\n", case_cfg->test_type_str);

> +                       break;

> +         }

> +}

> +

> +static void

> +run_test(uint32_t case_id, struct test_configure *case_cfg)

> +{

> +         uint32_t i;

> +         uint32_t nb_lcores = rte_lcore_count();

> +         struct test_configure_entry *mem_size = &case_cfg->mem_size;

> +         struct test_configure_entry *buf_size = &case_cfg->buf_size;

> +         struct test_configure_entry *ring_size = &case_cfg->ring_size;

> +         struct test_configure_entry *kick_batch = &case_cfg->kick_batch;

> +         struct test_configure_entry dummy = { 0 };

> +         struct test_configure_entry *var_entry = &dummy;

> +

> +         for (i = 0; i < RTE_DIM(output_str); i++)

> +                       memset(output_str[i], 0, MAX_OUTPUT_STR_LEN);

> +

> +         if (nb_lcores <= case_cfg->lcore_dma_map.cnt) {

> +                       printf("Case %u: Not enough lcores.\n", case_id);

> +                       return;

> +         }

> +

> +         printf("Number of used lcores: %u.\n", nb_lcores);

> +

> +         if (mem_size->incr != 0)

> +                       var_entry = mem_size;

> +

> +         if (buf_size->incr != 0)

> +                       var_entry = buf_size;

> +

> +         if (ring_size->incr != 0)

> +                       var_entry = ring_size;

> +

> +         if (kick_batch->incr != 0)

> +                       var_entry = kick_batch;

> +

> +         case_cfg->scenario_id = 0;

> +

> +         output_header(case_id, case_cfg);

> +

> +         for (var_entry->cur = var_entry->first; var_entry->cur <= var_entry->last;) {

> +                       case_cfg->scenario_id++;

> +                       printf("\nRunning scenario %d\n", case_cfg->scenario_id);

> +

> +                       run_test_case(case_cfg);

> +                       output_csv(false);

> +

> +                       if (var_entry->op == OP_ADD)

> +                                      var_entry->cur += var_entry->incr;

> +                       else if (var_entry->op == OP_MUL)

> +                                      var_entry->cur *= var_entry->incr;

> +                       else {

> +                                      printf("No proper operation for variable entry.\n");

> +                                      break;

> +                       }

> +         }

> +}

> +

> +static int

> +parse_lcore(struct test_configure *test_case, const char *value) {

> +         uint16_t len;

> +         char *input;

> +         struct lcore_dma_map_t *lcore_dma_map;

> +

> +         if (test_case == NULL || value == NULL)

> +                       return -1;

> +

> +         len = strlen(value);

> +         input = (char *)malloc((len + 1) * sizeof(char));

> +         if (input == NULL)

> +                       return -1;

> +         strlcpy(input, value, len + 1);

> +         lcore_dma_map = &(test_case->lcore_dma_map);

> +

> +         memset(lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));

> +

> +         char *token = strtok(input, ", ");

> +         while (token != NULL) {

> +                       if (lcore_dma_map->cnt >= MAX_WORKER_NB) {

> +                                      free(input);

> +                                      return -1;

> +                       }

> +

> +                       uint16_t lcore_id = atoi(token);

> +                       lcore_dma_map->lcores[lcore_dma_map->cnt++] = lcore_id;

> +

> +                       token = strtok(NULL, ", ");

> +         }

> +

> +         free(input);

> +         return 0;

> +}

> +

> +static int

> +parse_lcore_dma(struct test_configure *test_case, const char *value) {

> +         struct lcore_dma_map_t *lcore_dma_map;

> +         char *input, *addrs;

> +         char *ptrs[2];

> +         char *start, *end, *substr;

> +         uint16_t lcore_id;

> +         int ret = 0;

> +

> +         if (test_case == NULL || value == NULL)

> +                       return -1;

> +

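> +         input = strndup(value, strlen(value) + 1);

> +         if (input == NULL)

> +                       return -1;

> +         addrs = input;

> +

> +         /* Skip leading spaces; an empty string is caught by the check below. */

> +         while (*addrs == ' ')

> +                       addrs++;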

> +         if (*addrs == '\0') {

> +                       fprintf(stderr, "No input DMA addresses\n");

> +                       ret = -1;

> +                       goto out;

> +         }

> +

> +         substr = strtok(addrs, ",");

> +         if (substr == NULL) {

> +                       fprintf(stderr, "No input DMA address\n");

> +                       ret = -1;

> +                       goto out;

> +         }

> +

> +         memset(&test_case->lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));

> +

> +         do {

> +                       if (rte_strsplit(substr, strlen(substr), ptrs, 2, '@') != 2) {

> +                                      fprintf(stderr, "Illegal DMA address\n");

> +                                      ret = -1;

> +                                      break;

> +                       }

> +

> +                       start = strstr(ptrs[0], "lcore");

> +                       if (start == NULL) {

> +                                      fprintf(stderr, "Illegal lcore\n");

> +                                      ret = -1;

> +                                      break;

> +                       }

> +

> +                       start += 5;

> +                       lcore_id = strtol(start, &end, 0);

> +                       if (end == start) {

> +                                      fprintf(stderr, "No input lcore ID or ID %d is wrong\n", lcore_id);

> +                                      ret = -1;

> +                                      break;

> +                       }

> +

> +                       lcore_dma_map = &test_case->lcore_dma_map;

> +                       if (lcore_dma_map->cnt >= MAX_WORKER_NB) {

> +                                      fprintf(stderr, "lcores count error\n");

> +                                      ret = -1;

> +                                      break;

> +                       }

> +

> +                       lcore_dma_map->lcores[lcore_dma_map->cnt] = lcore_id;

> +                       strlcpy(lcore_dma_map->dma_names[lcore_dma_map->cnt], ptrs[1],

> +                                                    RTE_DEV_NAME_MAX_LEN);

> +                       lcore_dma_map->cnt++;

> +                       substr = strtok(NULL, ",");

> +         } while (substr != NULL);

> +

> +out:

> +         free(input);

> +         return ret;

> +}
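A side note on the `lcore<ID>@<device>` format handled above. The per-token checks could look roughly like the sketch below. This is only an illustration under stated assumptions, not a replacement for the quoted function: `parse_lcore_dma_token` and `DEV_NAME_LEN` are hypothetical names, and the sketch uses plain libc calls instead of `rte_strsplit()` and `strlcpy()` so that it stands alone.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEV_NAME_LEN 64 /* hypothetical; the patch uses RTE_DEV_NAME_MAX_LEN */

/*
 * Hypothetical helper: split one "lcore<ID>@<device>" token.
 * Returns 0 on success, -1 on malformed input.
 */
static int
parse_lcore_dma_token(const char *token, unsigned int *lcore_id,
		char dev_name[DEV_NAME_LEN])
{
	const char *start, *at;
	char *end;
	long id;

	assert(lcore_id != NULL && dev_name != NULL);

	if (token == NULL)
		return -1;

	/* A blank may follow the comma, e.g. "lcore10@x, lcore11@y". */
	while (*token == ' ')
		token++;

	if (strncmp(token, "lcore", 5) != 0)
		return -1;
	start = token + 5;

	/* The ID must be present, non-negative, and followed by '@'. */
	id = strtol(start, &end, 10);
	if (end == start || id < 0 || id > INT_MAX || *end != '@')
		return -1;

	/* The device name must be non-empty and fit in the buffer. */
	at = end + 1;
	if (*at == '\0' || strlen(at) >= DEV_NAME_LEN)
		return -1;

	*lcore_id = (unsigned int)id;
	snprintf(dev_name, DEV_NAME_LEN, "%s", at);
	return 0;
}
```

The point of the sketch is that a token without '@' or without a device name is rejected before anything is copied.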

> +

> +static int

> +parse_entry(const char *value, struct test_configure_entry *entry) {

> +         char input[255] = {0};

> +         char *args[MAX_PARAMS_PER_ENTRY];

> +         int args_nr = -1;

> +         int ret;

> +

> +         if (value == NULL || entry == NULL)

> +                       goto out;

> +

> +         strncpy(input, value, 254);

> +         if (*input == '\0')

> +                       goto out;

> +

> +         ret = rte_strsplit(input, strlen(input), args, MAX_PARAMS_PER_ENTRY, ',');

> +         if (ret != 1 && ret != 4)

> +                       goto out;

> +

> +         entry->cur = entry->first = (uint32_t)atoi(args[0]);

> +

> +         if (ret == 4) {

> +                       args_nr = 4;

> +                       entry->last = (uint32_t)atoi(args[1]);

> +                       entry->incr = (uint32_t)atoi(args[2]);

> +                       if (!strcmp(args[3], "MUL"))

> +                                      entry->op = OP_MUL;

> +                       else if (!strcmp(args[3], "ADD"))

> +                                      entry->op = OP_ADD;

> +                       else {

> +                                      args_nr = -1;

> +                                      printf("Invalid op %s.\n", args[3]);

> +                       }

> +

> +         } else {

> +                       args_nr = 1;

> +                       entry->op = OP_NONE;

> +                       entry->last = 0;

> +                       entry->incr = 0;

> +         }

> +out:

> +         return args_nr;

> +}
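A note on the `atoi()` calls in `parse_entry()`: `atoi()` cannot report errors, so a value such as "abc" silently becomes 0. One possible way to tighten this is sketched below. `parse_u32` is a hypothetical helper, not something in the patch, and rejecting a leading sign or blank is an assumption made here for simplicity.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a decimal string to uint32_t.
 * Returns 0 on success, -1 on empty input, a leading sign or blank,
 * trailing garbage, or a value that does not fit in 32 bits.
 */
static int
parse_u32(const char *str, uint32_t *out)
{
	char *end = NULL;
	unsigned long val;

	assert(out != NULL);

	/* Require a digit first: strtoul() itself would accept "-1" or " 5". */
	if (str == NULL || !isdigit((unsigned char)*str))
		return -1;

	errno = 0;
	val = strtoul(str, &end, 10);
	if (errno != 0 || *end != '\0' || val > UINT32_MAX)
		return -1;

	*out = (uint32_t)val;
	return 0;
}
```

With something like this, `parse_entry()` could return its error value for a malformed number instead of running a case with a zero size.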

> +

> +static uint16_t

> +load_configs(const char *path)

> +{

> +         struct rte_cfgfile *cfgfile;

> +         int nb_sections, i;

> +         struct test_configure *test_case;

> +         char section_name[CFG_NAME_LEN];

> +         const char *case_type;

> +         const char *lcore_dma;

> +         const char *mem_size_str, *buf_size_str, *ring_size_str, *kick_batch_str;

> +         int args_nr, nb_vp;

> +         bool is_dma;

> +

> +         printf("config file parsing...\n");

> +         cfgfile = rte_cfgfile_load(path, 0);

> +         if (!cfgfile) {

> +                       printf("Open configure file error.\n");

> +                       exit(1);

> +         }

> +

> +         nb_sections = rte_cfgfile_num_sections(cfgfile, NULL, 0);

> +         if (nb_sections > MAX_TEST_CASES) {

> +                       printf("Error: The maximum number of cases is %d.\n", MAX_TEST_CASES);

> +                       exit(1);

> +         }

> +

> +         for (i = 0; i < nb_sections; i++) {

> +                       snprintf(section_name, CFG_NAME_LEN, "case%d", i + 1);

> +                       test_case = &test_cases[i];

> +                       case_type = rte_cfgfile_get_entry(cfgfile, section_name, "type");

> +                       if (case_type == NULL) {

> +                                      printf("Error: No case type in case %d, the test will be finished here.\n",

> +                                                    i + 1);

> +                                      test_case->is_valid = false;

> +                                      continue;

> +                       }

> +

> +                       if (strcmp(case_type, DMA_MEM_COPY) == 0) {

> +                                      test_case->test_type = TEST_TYPE_DMA_MEM_COPY;

> +                                      test_case->test_type_str = DMA_MEM_COPY;

> +                                      is_dma = true;

> +                       } else if (strcmp(case_type, CPU_MEM_COPY) == 0) {

> +                                      test_case->test_type = TEST_TYPE_CPU_MEM_COPY;

> +                                      test_case->test_type_str = CPU_MEM_COPY;

> +                                      is_dma = false;

> +                       } else {

> +                                      printf("Error: Wrong test case type %s in case%d.\n", case_type, i + 1);

> +                                      test_case->is_valid = false;

> +                                      continue;

> +                       }

> +

> +                       test_case->src_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,

> +                                                                   section_name, "src_numa_node"));

> +                       test_case->dst_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,

> +                                                                   section_name, "dst_numa_node"));

> +                       nb_vp = 0;

> +                       mem_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "mem_size");

> +                       args_nr = parse_entry(mem_size_str, &test_case->mem_size);

> +                       if (args_nr < 0) {

> +                                      printf("parse error in case %d.\n", i + 1);

> +                                      test_case->is_valid = false;

> +                                      continue;

> +                       } else if (args_nr == 4)

> +                                      nb_vp++;

> +

> +                       buf_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "buf_size");

> +                       args_nr = parse_entry(buf_size_str, &test_case->buf_size);

> +                       if (args_nr < 0) {

> +                                      printf("parse error in case %d.\n", i + 1);

> +                                      test_case->is_valid = false;

> +                                      continue;

> +                       } else if (args_nr == 4)

> +                                      nb_vp++;

> +

> +                       if (is_dma) {

> +                                      ring_size_str = rte_cfgfile_get_entry(cfgfile, section_name,

> +                                                    "dma_ring_size");

> +                                      args_nr = parse_entry(ring_size_str, &test_case->ring_size);

> +                                      if (args_nr < 0) {

> +                                                    printf("parse error in case %d.\n", i + 1);

> +                                                    test_case->is_valid = false;

> +                                                    continue;

> +                                      } else if (args_nr == 4)

> +                                                    nb_vp++;

> +

> +                                      kick_batch_str = rte_cfgfile_get_entry(cfgfile, section_name, "kick_batch");

> +                                      args_nr = parse_entry(kick_batch_str, &test_case->kick_batch);

> +                                      if (args_nr < 0) {

> +                                                    printf("parse error in case %d.\n", i + 1);

> +                                                    test_case->is_valid = false;

> +                                                    continue;

> +                                      } else if (args_nr == 4)

> +                                                    nb_vp++;

> +

> +                                      lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore_dma");

> +                                      int lcore_ret = parse_lcore_dma(test_case, lcore_dma);

> +                                      if (lcore_ret < 0) {

> +                                                    printf("parse lcore dma error in case %d.\n", i + 1);

> +                                                    test_case->is_valid = false;

> +                                                    continue;

> +                                      }

> +                       } else {

> +                                      lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore");

> +                                      int lcore_ret = parse_lcore(test_case, lcore_dma);

> +                                      if (lcore_ret < 0) {

> +                                                    printf("parse lcore error in case %d.\n", i + 1);

> +                                                    test_case->is_valid = false;

> +                                                    continue;

> +                                      }

> +                       }

> +

> +                       if (nb_vp > 1) {

> +                                      printf("Case %d error, each section can only have a single variable parameter.\n",

> +                                                                   i + 1);

> +                                      test_case->is_valid = false;

> +                                      continue;

> +                       }

> +

> +                       test_case->cache_flush =

> +                                      (uint8_t)atoi(rte_cfgfile_get_entry(cfgfile, section_name, "cache_flush"));

> +                       test_case->test_secs = (uint16_t)atoi(rte_cfgfile_get_entry(cfgfile,

> +                                                                   section_name, "test_seconds"));

> +

> +                       test_case->eal_args = rte_cfgfile_get_entry(cfgfile, section_name, "eal_args");

> +                       test_case->is_valid = true;

> +         }

> +

> +         rte_cfgfile_close(cfgfile);

> +         printf("config file parsing complete.\n\n");

> +         return i;

> +}

> +

> +/* Copy argv without the application's own options, then append the per-case EAL arguments. */

> +static int

> +append_eal_args(int argc, char **argv, const char *eal_args, char **new_argv)

> +{

> +         int i;

> +         char *tokens[MAX_EAL_PARAM_NB];

> +         char args[MAX_EAL_PARAM_LEN] = {0};

> +         int token_nb, new_argc = 0;

> +

> +         for (i = 0; i < argc; i++) {

> +                       if ((strcmp(argv[i], CMDLINE_CONFIG_ARG) == 0) ||

> +                                                    (strcmp(argv[i], CMDLINE_RESULT_ARG) == 0)) {

> +                                      i++;

> +                                      continue;

> +                       }

> +                       strlcpy(new_argv[new_argc], argv[i], MAX_EAL_PARAM_LEN);

> +                       new_argc++;

> +         }

> +

> +         if (eal_args) {

> +                       strlcpy(args, eal_args, MAX_EAL_PARAM_LEN);

> +                       token_nb = rte_strsplit(args, strlen(args),

> +                                                                   tokens, MAX_EAL_PARAM_NB, ' ');

> +                       for (i = 0; i < token_nb; i++)

> +                                      strlcpy(new_argv[new_argc++], tokens[i], MAX_EAL_PARAM_LEN);

> +         }

> +

> +         return new_argc;

> +}

> +

> +int

> +main(int argc, char *argv[])

> +{

> +         int ret;

> +         uint16_t case_nb;

> +         uint32_t i, nb_lcores;

> +         pid_t cpid, wpid;

> +         int wstatus;

> +         char args[MAX_EAL_PARAM_NB][MAX_EAL_PARAM_LEN];

> +         char *pargs[MAX_EAL_PARAM_NB];

> +         char *cfg_path_ptr = NULL;

> +         char *rst_path_ptr = NULL;

> +         char rst_path[PATH_MAX];

> +         int new_argc;

> +

> +         memset(args, 0, sizeof(args));

> +

> +         for (i = 0; i < RTE_DIM(pargs); i++)

> +                       pargs[i] = args[i];

> +

> +         for (i = 0; i < (uint32_t)argc; i++) {

> +                       if (strncmp(argv[i], CMDLINE_CONFIG_ARG, MAX_LONG_OPT_SZ) == 0)

> +                                      cfg_path_ptr = argv[i + 1];

> +                       if (strncmp(argv[i], CMDLINE_RESULT_ARG, MAX_LONG_OPT_SZ) == 0)

> +                                      rst_path_ptr = argv[i + 1];

> +         }

> +         if (cfg_path_ptr == NULL) {

> +                       printf("Config file not assigned.\n");

> +                       return -1;

> +         }

> +         if (rst_path_ptr == NULL) {

> +                       strlcpy(rst_path, cfg_path_ptr, PATH_MAX);

> +                       char *token = strtok(basename(rst_path), ".");

> +                       if (token == NULL) {

> +                                      printf("Config file error.\n");

> +                                      return -1;

> +                       }

> +                       strcat(token, "_result.csv");

> +                       rst_path_ptr = rst_path;

> +         }

> +

> +         case_nb = load_configs(cfg_path_ptr);

> +         fd = fopen(rst_path_ptr, "w");

> +         if (fd == NULL) {

> +                       printf("Open output CSV file error.\n");

> +                       return -1;

> +         }

> +         fclose(fd);

> +

> +         printf("Running cases...\n");

> +         for (i = 0; i < case_nb; i++) {

> +                       if (!test_cases[i].is_valid) {

> +                                      printf("Invalid test case %d.\n\n", i + 1);

> +                                      snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);

> +

> +                                      fd = fopen(rst_path_ptr, "a");

> +                                      if (!fd) {

> +                                                    printf("Open output CSV file error.\n");

> +                                                    return 0;

> +                                      }

> +                                      output_csv(true);

> +                                      fclose(fd);

> +                                      continue;

> +                       }

> +

> +                       if (test_cases[i].test_type == TEST_TYPE_NONE) {

> +                                      printf("No valid test type in test case %d.\n\n", i + 1);

> +                                      snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);

> +

> +                                      fd = fopen(rst_path_ptr, "a");

> +                                      if (!fd) {

> +                                                    printf("Open output CSV file error.\n");

> +                                                    return 0;

> +                                      }

> +                                      output_csv(true);

> +                                      fclose(fd);

> +                                      continue;

> +                       }

> +

> +                       cpid = fork();

> +                       if (cpid < 0) {

> +                                      printf("Fork case %d failed.\n", i + 1);

> +                                      exit(EXIT_FAILURE);

> +                       } else if (cpid == 0) {

> +                                      printf("\nRunning case %u\n\n", i + 1);

> +

> +                                      new_argc = append_eal_args(argc, argv, test_cases[i].eal_args, pargs);

> +                                      ret = rte_eal_init(new_argc, pargs);

> +                                      if (ret < 0)

> +                                                    rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");

> +

> +                                      /* Check lcores. */

> +                                      nb_lcores = rte_lcore_count();

> +                                      if (nb_lcores < 2)

> +                                                    rte_exit(EXIT_FAILURE,

> +                                                                   "There should be at least 2 worker lcores.\n");

> +

> +                                      fd = fopen(rst_path_ptr, "a");

> +                                      if (!fd) {

> +                                                    printf("Open output CSV file error.\n");

> +                                                    return 0;

> +                                      }

> +

> +                                      output_env_info();

> +

> +                                      run_test(i + 1, &test_cases[i]);

> +

> +                                      /* clean up the EAL */

> +                                     rte_eal_cleanup();

> +

> +                                      fclose(fd);

> +

> +                                      printf("\nCase %u completed.\n\n", i + 1);

> +

> +                                      exit(EXIT_SUCCESS);

> +                       } else {

> +                                      wpid = waitpid(cpid, &wstatus, 0);

> +                                      if (wpid == -1) {

> +                                                    printf("waitpid error.\n");

> +                                                    exit(EXIT_FAILURE);

> +                                      }

> +

> +                                      if (WIFEXITED(wstatus))

> +                                                    printf("Case process exited. status %d\n\n",

> +                                                                   WEXITSTATUS(wstatus));

> +                                      else if (WIFSIGNALED(wstatus))

> +                                                    printf("Case process killed by signal %d\n\n",

> +                                                                   WTERMSIG(wstatus));

> +                                      else if (WIFSTOPPED(wstatus))

> +                                                    printf("Case process stopped by signal %d\n\n",

> +                                                                   WSTOPSIG(wstatus));

> +                                      else if (WIFCONTINUED(wstatus))

> +                                                    printf("Case process continued.\n\n");

> +                                      else

> +                                                    printf("Case process terminated for an unknown reason.\n\n");

> +                       }

> +         }

> +

> +         printf("Bye...\n");

> +         return 0;

> +}

> +

> diff --git a/app/test-dma-perf/main.h b/app/test-dma-perf/main.h

> new file mode 100644

> index 0000000000..12bc3f4e3f

> --- /dev/null

> +++ b/app/test-dma-perf/main.h

> @@ -0,0 +1,64 @@

> +/* SPDX-License-Identifier: BSD-3-Clause

> + * Copyright(c) 2023 Intel Corporation

> + */

> +

> +#ifndef _MAIN_H_

> +#define _MAIN_H_

> +

> +

> +#include <rte_common.h>

> +#include <rte_cycles.h>

> +#include <rte_dev.h>

> +

> +#define MAX_WORKER_NB 128

> +#define MAX_OUTPUT_STR_LEN 512

> +

> +#define MAX_DMA_NB 128

> +#define MAX_LCORE_NB 256

> +

> +extern char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];

> +

> +typedef enum {

> +         OP_NONE = 0,

> +         OP_ADD,

> +         OP_MUL

> +} alg_op_type;

> +

> +struct test_configure_entry {

> +         uint32_t first;

> +         uint32_t last;

> +         uint32_t incr;

> +         alg_op_type op;

> +         uint32_t cur;

> +};

> +

> +struct lcore_dma_map_t {

> +         uint32_t lcores[MAX_WORKER_NB];

> +         char dma_names[MAX_WORKER_NB][RTE_DEV_NAME_MAX_LEN];

> +         int16_t dma_ids[MAX_WORKER_NB];

> +         uint16_t cnt;

> +};

> +

> +struct test_configure {

> +         bool is_valid;

> +         uint8_t test_type;

> +         const char *test_type_str;

> +         uint16_t src_numa_node;

> +         uint16_t dst_numa_node;

> +         uint16_t opcode;

> +         bool is_dma;

> +         struct lcore_dma_map_t lcore_dma_map;

> +         struct test_configure_entry mem_size;

> +         struct test_configure_entry buf_size;

> +         struct test_configure_entry ring_size;

> +         struct test_configure_entry kick_batch;

> +         uint8_t cache_flush;

> +         uint32_t nr_buf;

> +         uint16_t test_secs;

> +         const char *eal_args;

> +         uint8_t scenario_id;

> +};

> +

> +void mem_copy_benchmark(struct test_configure *cfg, bool is_dma);

> +

> +#endif /* _MAIN_H_ */

> diff --git a/app/test-dma-perf/meson.build b/app/test-dma-perf/meson.build

> new file mode 100644

> index 0000000000..bd6c264002

> --- /dev/null

> +++ b/app/test-dma-perf/meson.build

> @@ -0,0 +1,17 @@

> +# SPDX-License-Identifier: BSD-3-Clause

> +# Copyright(c) 2019-2023 Intel Corporation

> +

> +# meson file, for building this app as part of a main DPDK build.

> +

> +if is_windows

> +    build = false

> +    reason = 'not supported on Windows'

> +    subdir_done()

> +endif

> +

> +deps += ['dmadev', 'mbuf', 'cfgfile']

> +

> +sources = files(

> +        'main.c',

> +        'benchmark.c',

> +)

> diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst

> index 4459144140..796cc5517d 100644

> --- a/doc/guides/rel_notes/release_23_07.rst

> +++ b/doc/guides/rel_notes/release_23_07.rst

> @@ -200,6 +200,12 @@ New Features

>

>    Enhanced the GRO library to support TCP packets over IPv6 network.

>

> +* **Added DMA device performance test application.**

> +

> +  Added a new application to test the performance of DMA devices and CPUs.

> +

> +  See the :doc:`../tools/dmaperf` for more details.

> +

>

>  Removed Items

>  -------------

> diff --git a/doc/guides/tools/dmaperf.rst b/doc/guides/tools/dmaperf.rst

> new file mode 100644

> index 0000000000..c5f8a9406f

> --- /dev/null

> +++ b/doc/guides/tools/dmaperf.rst

> @@ -0,0 +1,103 @@

> +..  SPDX-License-Identifier: BSD-3-Clause

> +    Copyright(c) 2023 Intel Corporation.

> +

> +dpdk-test-dma-perf Application

> +==============================

> +

> +The ``dpdk-test-dma-perf`` tool is a Data Plane Development Kit (DPDK) application

> +that enables testing the performance of DMA (Direct Memory Access) devices available

> +within DPDK. It provides a test framework to assess the performance of CPU and DMA

> +devices under various scenarios, such as varying buffer lengths. Doing so provides

> +insight into the potential performance when using these DMA devices for acceleration

> +in DPDK applications. It supports memory copy performance tests for now, comparing

> +the performance of CPU and DMA automatically in various conditions with the help of

> +a pre-set configuration file.

> +

> +

> +Configuration

> +-------------

> +This application uses the standard DPDK EAL command-line options as well as its own

> +custom command-line options. An example configuration file is provided with the

> +application and explains the meaning of each parameter.

> +

> +Here is an extracted sample from the configuration file (the complete

> +sample can be found in the application source directory):

> +

> +.. code-block:: ini

> +

> +   [case1]

> +   type=DMA_MEM_COPY

> +   mem_size=10

> +   buf_size=64,8192,2,MUL

> +   dma_ring_size=1024

> +   kick_batch=32

> +   src_numa_node=0

> +   dst_numa_node=0

> +   cache_flush=0

> +   test_seconds=2

> +   lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3

> +   eal_args=--in-memory --file-prefix=test

> +

> +   [case2]

> +   type=CPU_MEM_COPY

> +   mem_size=10

> +   buf_size=64,8192,2,MUL

> +   src_numa_node=0

> +   dst_numa_node=1

> +   cache_flush=0

> +   test_seconds=2

> +   lcore = 3, 4

> +   eal_args=--in-memory --no-pci

> +

> +The configuration file is divided into multiple sections, each of which represents a test case.

> +The four variables mem_size, buf_size, dma_ring_size, and kick_batch can vary in each test case.

> +The format for this is ``variable=first,last,increment,ADD\|MUL``.

> +This means that the first value of the variable is 'first', the last value is 'last',

> +'increment' is the step size, and ADD|MUL indicates whether the change is by addition or

> +multiplication. Each case can only have one variable change, and each change will generate

> +a scenario, so each case can have multiple scenarios.
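To make the sweep described in this paragraph concrete, here is a minimal sketch of how a `first,last,increment,ADD|MUL` entry expands into scenarios. It is an illustration, not the code in the patch: `sweep_count` and `enum sweep_op` are hypothetical names, and the guards against a step that makes no progress are an assumption added here so that the loop always terminates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's alg_op_type. */
enum sweep_op { SWEEP_ADD, SWEEP_MUL };

/*
 * Print every value a "first,last,increment,ADD|MUL" entry would visit
 * and return the number of scenarios. Returns 0 when the range is empty
 * or the step cannot make progress.
 */
static unsigned int
sweep_count(uint32_t first, uint32_t last, uint32_t incr, enum sweep_op op)
{
	unsigned int n = 0;
	uint64_t cur; /* 64-bit, so the value cannot wrap before exceeding last */

	assert(op == SWEEP_ADD || op == SWEEP_MUL);

	if (op == SWEEP_ADD && incr == 0)
		return 0;
	if (op == SWEEP_MUL && (incr < 2 || first == 0))
		return 0;

	for (cur = first; cur <= last;
	     cur = (op == SWEEP_ADD) ? cur + incr : cur * incr) {
		printf("scenario %u: value %llu\n", n + 1, (unsigned long long)cur);
		n++;
	}
	return n;
}
```

For the sample entry `buf_size=64,8192,2,MUL`, the values visited are 64, 128, 256, 512, 1024, 2048, 4096 and 8192, which gives eight scenarios.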

> +

> +Parameter Definitions

> +---------------------

> +

> +- **type**: The type of the test. Currently supported types are `DMA_MEM_COPY` and `CPU_MEM_COPY`.

> +- **mem_size**: The size of the memory footprint.

> +- **buf_size**: The memory size of a single operation.

> +- **dma_ring_size**: The DMA ring buffer size. Must be a power of two, and between 64 and 4096.

> +- **kick_batch**: The DMA operation batch size; normally it should be greater than 1.

> +- **src_numa_node**: Controls the NUMA node where the source memory is allocated.

> +- **dst_numa_node**: Controls the NUMA node where the destination memory is allocated.

> +- **cache_flush**: Determines whether the cache should be flushed. `1` indicates to flush and `0` to not flush.

> +- **test_seconds**: Controls the test time for each scenario.

> +- **lcore_dma**: Specifies the lcore/DMA mapping.

> +- **lcore**: Specifies the lcore for CPU testing.

> +- **eal_args**: Specifies the EAL arguments.
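The `dma_ring_size` constraint listed above (a power of two between 64 and 4096) could be checked with something like the sketch below. `ring_size_is_valid` is a hypothetical helper; whether and where the patch performs this check is not visible in this part of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if size is a power of two within [min, max].
 * For the documented limits, call it with min = 64 and max = 4096.
 */
static bool
ring_size_is_valid(uint32_t size, uint32_t min, uint32_t max)
{
	assert(min <= max);

	/* A power of two has exactly one bit set, so size & (size - 1) is 0. */
	if (size == 0 || (size & (size - 1)) != 0)
		return false;

	return size >= min && size <= max;
}
```

The sample value `dma_ring_size=1024` passes this check; a value such as 100 or 8192 does not.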

> +

> +.. Note::

> +

> +         The mapping of lcore to DMA must be one-to-one and cannot be duplicated.
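The one-to-one rule in this note could be verified up front with a check along the following lines. This is a sketch under assumptions: `struct lcore_dma_pair` is a simplified, hypothetical view of the mapping (the patch stores it in `struct lcore_dma_map_t`), and this part of the diff does not show whether the application enforces the rule itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one "lcore<ID>@<device>" pair. */
struct lcore_dma_pair {
	uint32_t lcore;
	const char *dma_name;
};

/* Return true if no lcore and no DMA device name appears more than once. */
static bool
mapping_is_one_to_one(const struct lcore_dma_pair *pairs, size_t cnt)
{
	size_t i, j;

	assert(pairs != NULL || cnt == 0);

	/* Quadratic scan; the number of workers is small. */
	for (i = 0; i < cnt; i++) {
		for (j = i + 1; j < cnt; j++) {
			if (pairs[i].lcore == pairs[j].lcore)
				return false;
			if (strcmp(pairs[i].dma_name, pairs[j].dma_name) == 0)
				return false;
		}
	}
	return true;
}
```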

> +

> +To specify a configuration file, use the "\-\-config" flag followed by the path to the file.

> +

> +To specify a result file, use the "\-\-result" flag followed by the path to the file.

> +If you do not specify a result file, one will be generated with the same name as the

> +configuration file, with the addition of "_result.csv" at the end.
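For reference, the default name can be derived as in the sketch below. It follows what the quoted `main()` appears to do, namely cut the file name at its first '.' and append "_result.csv", but it is a standalone illustration: `default_result_path` is a hypothetical helper and it uses `snprintf()` rather than the `strtok()`/`strcat()` sequence in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<config path up to the first '.' of its
 * file name>_result.csv" into out.
 * Returns 0 on success, -1 if the result does not fit in out_len bytes.
 */
static int
default_result_path(const char *cfg_path, char *out, size_t out_len)
{
	const char *base, *dot;
	size_t stem_len;
	int n;

	assert(cfg_path != NULL && out != NULL);

	/* The file name starts after the last '/', if there is one. */
	base = strrchr(cfg_path, '/');
	base = (base != NULL) ? base + 1 : cfg_path;

	/* Cut at the first '.' of the file name, or keep the whole path. */
	dot = strchr(base, '.');
	stem_len = (dot != NULL) ? (size_t)(dot - cfg_path) : strlen(cfg_path);

	n = snprintf(out, out_len, "%.*s_result.csv", (int)stem_len, cfg_path);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Under these assumptions, `./config_dma.ini` maps to `./config_dma_result.csv`, so the extension of the configuration file is dropped rather than kept.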

> +

> +

> +Running the Application

> +-----------------------

> +

> +Typical command-line invocation to execute the application:

> +

> +.. code-block:: console

> +

> +   dpdk-test-dma-perf --config ./config_dma.ini --result ./res_dma.csv

> +

> +Where `config_dma.ini` is the configuration file, and `res_dma.csv`

> +will be the generated result file.

> +

> +After the tests, you can find the results in the `res_dma.csv` file.

> +

> +Limitations

> +-----------

> +

> +Currently, this tool only supports memory copy performance tests.

> +Additional enhancements are possible in the future to support more types of tests

> +for DMA devices and CPUs.

> diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst

> index 6f84fc31ff..857572da96 100644

> --- a/doc/guides/tools/index.rst

> +++ b/doc/guides/tools/index.rst

> @@ -23,3 +23,4 @@ DPDK Tools User Guides

>      testregex

>      testmldev

>      dts

> +    dmaperf

> --

> 2.40.1

>

>

>
