From: "Zhang, Yuying"
To: "dev@dpdk.org", "Jiang, Cheng1"
Subject: RE: [PATCH v10] app/dma-perf: introduce dma-perf application
Date: Wed, 28 Jun 2023 23:50:43 +0000
List-Id: DPDK patches and discussions

Hi Cheng,

LGTM.

> -----Original Message-----
> Date: Wed, 28 Jun 2023 01:20:34 +0000
> From: Cheng Jiang
> To: thomas@monjalon.net, bruce.richardson@intel.com,
>     mb@smartsharesystems.com, chenbo.xia@intel.com,
>     amitprakashs@marvell.com, anoobj@marvell.com,
>     huangdengdui@huawei.com, kevin.laatz@intel.com,
>     fengchengwen@huawei.com, jerinj@marvell.com
> Cc: dev@dpdk.org, jiayu.hu@intel.com, xuan.ding@intel.com,
>     wenwux.ma@intel.com, yuanx.wang@intel.com, xingguang.he@intel.com,
>     weix.ling@intel.com, Cheng Jiang <cheng1.jiang@intel.com>
> Subject: [PATCH v10] app/dma-perf: introduce dma-perf application
> Message-ID: <20230628012034.49016-1-cheng1.jiang@intel.com>
> Content-Type: text/plain; charset=UTF-8
>
> There are many high-performance DMA devices supported in DPDK now, and
> these DMA devices can also be integrated into other modules of DPDK as
> accelerators, such as Vhost. Before integrating DMA into applications,
> developers need to know the performance of these DMA devices in
> various scenarios and the performance of CPUs in the same scenario,
> such as different buffer lengths. Only in this way can we know the
> target performance of the application accelerated by using them. This
> patch introduces a high-performance testing tool, which supports
> comparing the performance of CPU and DMA in different scenarios
> automatically with a pre-set config file. Memory copy performance tests
> are supported for now.
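
For readers comparing CPU and DMA numbers, the per-worker figures the tool reports follow the arithmetic in calc_result() in benchmark.c further down. A minimal standalone sketch of that arithmetic is given here. It assumes the timer frequency is passed in as a plain parameter instead of being read through rte_get_timer_hz(). The names perf_result, summarize and timer_hz are illustrative and do not appear in the patch, and the guard against a zero completion count is an addition that the patch does not have.

```c
#include <stdint.h>

/* Illustrative only: these names are hypothetical, not part of the patch. */
struct perf_result {
	double mops;           /* million copy operations per second */
	double bandwidth_gbps; /* payload bandwidth in Gbit/s */
	uint64_t avg_cycles;   /* timer cycles per completed copy */
};

/*
 * Mirror the arithmetic of calc_result():
 *   ops        = completed / test_secs
 *   mops       = ops / 10^6
 *   bandwidth  = ops * buf_size * 8 / 10^9
 *   avg_cycles = test_secs * timer_hz / completed
 * timer_hz stands in for rte_get_timer_hz() (assumption for this sketch).
 */
static struct perf_result
summarize(uint32_t buf_size, uint64_t completed, uint32_t test_secs,
	  uint64_t timer_hz)
{
	struct perf_result r = {0};
	double ops;

	/* Added guard: avoid dividing by zero when nothing completed. */
	if (completed == 0 || test_secs == 0)
		return r;

	ops = (double)completed / test_secs;
	r.mops = ops / 1e6;
	r.bandwidth_gbps = ops * buf_size * 8 / 1e9;
	r.avg_cycles = (uint64_t)test_secs * timer_hz / completed;

	return r;
}
```

As a worked case, summarize(1024, 2000000, 2, 2000000000ULL) describes 2,000,000 copies of 1024 B in 2 seconds on a 2 GHz timer: 1.0 MOps, 8.192 Gbps and 2000 cycles per copy.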
>
> Signed-off-by: Cheng Jiang
> Signed-off-by: Jiayu Hu
> Signed-off-by: Yuan Wang
> Acked-by: Morten Brørup
> Acked-by: Chenbo Xia

Acked-by: Yuying Zhang

> ---
> v10:
>   rebased code from 23.07-rc2;
> v9:
>   improved error handling;
>   improved lcore_params structure;
>   improved mbuf api calling;
>   improved exit process;
>   fixed some typos;
>   added scenario summary data display;
>   removed unnecessary include;
> v8:
>   fixed string copy issue in parse_lcore();
>   improved some data display format;
>   added doc in doc/guides/tools;
>   updated release notes;
> v7:
>   fixed some strcpy issues;
>   removed cache setup in calling rte_pktmbuf_pool_create();
>   fixed some typos;
>   added some memory free and null set operations;
>   improved result calculation;
> v6:
>   improved code based on Anoob's comments;
>   fixed some code structure issues;
> v5:
>   fixed some LONG_LINE warnings;
> v4:
>   fixed inaccuracy of the memory footprint display;
> v3:
>   fixed some typos;
> v2:
>   added lcore/dmadev designation;
>   added error case process;
>   removed worker_threads parameter from config.ini;
>   improved the logs;
>   improved config file;
>
>  app/meson.build                        |   1 +
>  app/test-dma-perf/benchmark.c          | 508 ++++++++++++++++++++
>  app/test-dma-perf/config.ini           |  61 +++
>  app/test-dma-perf/main.c               | 616 +++++++++++++++++++++++++
>  app/test-dma-perf/main.h               |  64 +++
>  app/test-dma-perf/meson.build          |  17 +
>  doc/guides/rel_notes/release_23_07.rst |   6 +
>  doc/guides/tools/dmaperf.rst           | 103 +++++
>  doc/guides/tools/index.rst             |   1 +
>  9 files changed, 1377 insertions(+)
>  create mode 100644 app/test-dma-perf/benchmark.c
>  create mode 100644 app/test-dma-perf/config.ini
>  create mode 100644 app/test-dma-perf/main.c
>  create mode 100644 app/test-dma-perf/main.h
>  create mode 100644 app/test-dma-perf/meson.build
>  create mode 100644 doc/guides/tools/dmaperf.rst
>
> diff --git a/app/meson.build b/app/meson.build
> index 74d2420f67..4fc1a83eba 100644
> --- a/app/meson.build
> +++ b/app/meson.build
> @@ -19,6
+19,7 @@ apps =3D [ > 'test-cmdline', > 'test-compress-perf', > 'test-crypto-perf', > + 'test-dma-perf', > 'test-eventdev', > 'test-fib', > 'test-flow-perf', > diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma- > perf/benchmark.c new file mode 100644 index 0000000000..0601e0d171 > --- /dev/null > +++ b/app/test-dma-perf/benchmark.c > @@ -0,0 +1,508 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2023 Intel Corporation */ > + > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > + > +#include "main.h" > + > +#define MAX_DMA_CPL_NB 255 > + > +#define TEST_WAIT_U_SECOND 10000 > +#define POLL_MAX 1000 > + > +#define CSV_LINE_DMA_FMT "Scenario %u,%u,%s,%u,%u,%u,%u,%.2lf,%" > PRIu64 ",%.3lf,%.3lf\n" > +#define CSV_LINE_CPU_FMT "Scenario %u,%u,NA,NA,NA,%u,%u,%.2lf,%" > PRIu64 ",%.3lf,%.3lf\n" > + > +#define CSV_TOTAL_LINE_FMT "Scenario %u > Summary, , , , , ,%u,%.2lf,%u,%.3lf,%.3lf\n" > + > +struct worker_info { > + bool ready_flag; > + bool start_flag; > + bool stop_flag; > + uint32_t total_cpl; > + uint32_t test_cpl; > +}; > + > +struct lcore_params { > + uint8_t scenario_id; > + unsigned int lcore_id; > + char *dma_name; > + uint16_t worker_id; > + uint16_t dev_id; > + uint32_t nr_buf; > + uint16_t kick_batch; > + uint32_t buf_size; > + uint16_t test_secs; > + struct rte_mbuf **srcs; > + struct rte_mbuf **dsts; > + volatile struct worker_info worker_info; }; > + > +static struct rte_mempool *src_pool; > +static struct rte_mempool *dst_pool; > + > +static struct lcore_params *lcores[MAX_WORKER_NB]; > + > +#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__) > + > +static inline int > +__rte_format_printf(3, 4) > +print_err(const char *func, int lineno, const char *format, ...) 
{ > + va_list ap; > + int ret; > + > + ret =3D fprintf(stderr, "In %s:%d - ", func, lineno); > + va_start(ap, format); > + ret +=3D vfprintf(stderr, format, ap); > + va_end(ap); > + > + return ret; > +} > + > +static inline void > +calc_result(uint32_t buf_size, uint32_t nr_buf, uint16_t nb_workers, > uint16_t test_secs, > + uint32_t total_cnt, = float *memory, uint32_t > *ave_cycle, > + float *bandwidth, fl= oat *mops) > +{ > + float ops; > + > + *memory =3D (float)(buf_size * (nr_buf / nb_workers) * 2) / (10= 24 * > 1024); > + *ave_cycle =3D test_secs * rte_get_timer_hz() / total_cnt; > + ops =3D (float)total_cnt / test_secs; > + *mops =3D ops / (1000 * 1000); > + *bandwidth =3D (ops * buf_size * 8) / (1000 * 1000 * 1000); } > + > +static void > +output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name, > uint16_t ring_size, > + uint16_t kick_batch, uint64_t ave_= cycle, uint32_t > buf_size, uint32_t nr_buf, > + float memory, float bandwidth, flo= at mops, bool > is_dma) { > + if (is_dma) > + printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick= Batch > Size: %u.\n", > + lcore_id, dma_name, = ring_size, kick_batch); > + else > + printf("lcore %u\n", lcore_id); > + > + printf("Average Cycles/op: %" PRIu64 ", Buffer Size: %u B, Buff= er > Number: %u, Memory: %.2lf MB, Frequency: %.3lf Ghz.\n", > + ave_cycle, buf_size, nr_buf, memor= y, > rte_get_timer_hz()/1000000000.0); > + printf("Average Bandwidth: %.3lf Gbps, MOps: %.3lf\n", bandwidt= h, > +mops); > + > + if (is_dma) > + snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN= , > CSV_LINE_DMA_FMT, > + scenario_id, lcore_id, dma_name, r= ing_size, > kick_batch, buf_size, > + nr_buf, memory, ave_cycle, bandwid= th, mops); > + else > + snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN= , > CSV_LINE_CPU_FMT, > + scenario_id, lcore_id, buf_size, > + nr_buf, memory, ave_cycle, bandwid= th, mops); } > + > +static inline void > +cache_flush_buf(__rte_unused struct rte_mbuf **array, > + __rte_unused uint32_t 
buf_size, > + __rte_unused uint32_t nr_buf) > +{ > +#ifdef RTE_ARCH_X86_64 > + char *data; > + struct rte_mbuf **srcs =3D array; > + uint32_t i, offset; > + > + for (i =3D 0; i < nr_buf; i++) { > + data =3D rte_pktmbuf_mtod(srcs[i], char *); > + for (offset =3D 0; offset < buf_size; offset +=3D= 64) > + __builtin_ia32_clflush(data + offs= et); > + } > +#endif > +} > + > +/* Configuration of device. */ > +static void > +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size) { > + uint16_t vchan =3D 0; > + struct rte_dma_info info; > + struct rte_dma_conf dev_config =3D { .nb_vchans =3D 1 }; > + struct rte_dma_vchan_conf qconf =3D { > + .direction =3D RTE_DMA_DIR_MEM_TO_MEM, > + .nb_desc =3D ring_size > + }; > + > + if (rte_dma_configure(dev_id, &dev_config) !=3D 0) > + rte_exit(EXIT_FAILURE, "Error with dma configure.= \n"); > + > + if (rte_dma_vchan_setup(dev_id, vchan, &qconf) !=3D 0) > + rte_exit(EXIT_FAILURE, "Error with queue configur= ation.\n"); > + > + if (rte_dma_info_get(dev_id, &info) !=3D 0) > + rte_exit(EXIT_FAILURE, "Error with getting device= info.\n"); > + > + if (info.nb_vchans !=3D 1) > + rte_exit(EXIT_FAILURE, "Error, no configured queu= es > reported on device id. 
%u\n", > + dev_id); > + > + if (rte_dma_start(dev_id) !=3D 0) > + rte_exit(EXIT_FAILURE, "Error with dma start.\n")= ; } > + > +static int > +config_dmadevs(struct test_configure *cfg) { > + uint32_t ring_size =3D cfg->ring_size.cur; > + struct lcore_dma_map_t *ldm =3D &cfg->lcore_dma_map; > + uint32_t nb_workers =3D ldm->cnt; > + uint32_t i; > + int dev_id; > + uint16_t nb_dmadevs =3D 0; > + char *dma_name; > + > + for (i =3D 0; i < ldm->cnt; i++) { > + dma_name =3D ldm->dma_names[i]; > + dev_id =3D rte_dma_get_dev_id_by_name(dma_name); > + if (dev_id < 0) { > + fprintf(stderr, "Error: Fail to fi= nd DMA %s.\n", > dma_name); > + goto end; > + } > + > + ldm->dma_ids[i] =3D dev_id; > + configure_dmadev_queue(dev_id, ring_size); > + ++nb_dmadevs; > + } > + > +end: > + if (nb_dmadevs < nb_workers) { > + printf("Not enough dmadevs (%u) for all workers (= %u).\n", > nb_dmadevs, nb_workers); > + return -1; > + } > + > + printf("Number of used dmadevs: %u.\n", nb_dmadevs); > + > + return 0; > +} > + > +static void > +error_exit(int dev_id) > +{ > + rte_dma_stop(dev_id); > + rte_dma_close(dev_id); > + rte_exit(EXIT_FAILURE, "DMA error\n"); } > + > +static inline void > +do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt, > + volatile struct worker_info *worke= r_info) { > + int ret; > + uint16_t nr_cpl; > + > + ret =3D rte_dma_submit(dev_id, 0); > + if (ret < 0) > + error_exit(dev_id); > + > + nr_cpl =3D rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, > NULL); > + *async_cnt -=3D nr_cpl; > + worker_info->total_cpl +=3D nr_cpl; > +} > + > +static inline int > +do_dma_mem_copy(void *p) > +{ > + struct lcore_params *para =3D (struct lcore_params *)p; > + volatile struct worker_info *worker_info =3D &(para->worker_inf= o); > + const uint16_t dev_id =3D para->dev_id; > + const uint32_t nr_buf =3D para->nr_buf; > + const uint16_t kick_batch =3D para->kick_batch; > + const uint32_t buf_size =3D para->buf_size; > + struct rte_mbuf **srcs =3D para->srcs; > + struct 
rte_mbuf **dsts =3D para->dsts; > + uint16_t nr_cpl; > + uint64_t async_cnt =3D 0; > + uint32_t i; > + uint32_t poll_cnt =3D 0; > + int ret; > + > + worker_info->stop_flag =3D false; > + worker_info->ready_flag =3D true; > + > + while (!worker_info->start_flag) > + ; > + > + while (1) { > + for (i =3D 0; i < nr_buf; i++) { > +dma_copy: > + ret =3D rte_dma_copy(dev_id, 0, > rte_mbuf_data_iova(srcs[i]), > + rte_mbuf_data_iova(d= sts[i]), buf_size, 0); > + if (unlikely(ret < 0)) { > + if (ret =3D=3D -ENOS= PC) { > + do_dm= a_submit_and_poll(dev_id, > &async_cnt, worker_info); > + goto = dma_copy; > + } else > + error= _exit(dev_id); > + } > + async_cnt++; > + > + if ((async_cnt % kick_batch) =3D= =3D 0) > + do_dma_submit_and_po= ll(dev_id, > &async_cnt, worker_info); > + } > + > + if (worker_info->stop_flag) > + break; > + } > + > + rte_dma_submit(dev_id, 0); > + while ((async_cnt > 0) && (poll_cnt++ < POLL_MAX)) { > + nr_cpl =3D rte_dma_completed(dev_id, 0, MAX_DMA_C= PL_NB, > NULL, NULL); > + async_cnt -=3D nr_cpl; > + } > + > + return 0; > +} > + > +static inline int > +do_cpu_mem_copy(void *p) > +{ > + struct lcore_params *para =3D (struct lcore_params *)p; > + volatile struct worker_info *worker_info =3D &(para->worker_inf= o); > + const uint32_t nr_buf =3D para->nr_buf; > + const uint32_t buf_size =3D para->buf_size; > + struct rte_mbuf **srcs =3D para->srcs; > + struct rte_mbuf **dsts =3D para->dsts; > + uint32_t i; > + > + worker_info->stop_flag =3D false; > + worker_info->ready_flag =3D true; > + > + while (!worker_info->start_flag) > + ; > + > + while (1) { > + for (i =3D 0; i < nr_buf; i++) { > + /* copy buffer form src to dst */ > + rte_memcpy((void > *)(uintptr_t)rte_mbuf_data_iova(dsts[i]), > + (void > *)(uintptr_t)rte_mbuf_data_iova(srcs[i]), > + (size_t)buf_size); > + worker_info->total_cpl++; > + } > + if (worker_info->stop_flag) > + break; > + } > + > + return 0; > +} > + > +static int > +setup_memory_env(struct test_configure *cfg, struct rte_mbuf 
***srcs, > + struct rte_mbuf ***dsts) > +{ > + unsigned int buf_size =3D cfg->buf_size.cur; > + unsigned int nr_sockets; > + uint32_t nr_buf =3D cfg->nr_buf; > + > + nr_sockets =3D rte_socket_count(); > + if (cfg->src_numa_node >=3D nr_sockets || > + cfg->dst_numa_node >=3D nr_sockets) { > + printf("Error: Source or destination numa exceeds= the acture > numa nodes.\n"); > + return -1; > + } > + > + src_pool =3D rte_pktmbuf_pool_create("Benchmark_DMA_SRC", > + nr_buf, > + 0, > + 0, > + buf_size + RTE_PKTMBUF_HEADROOM, > + cfg->src_numa_node); > + if (src_pool =3D=3D NULL) { > + PRINT_ERR("Error with source mempool creation.\n"= ); > + return -1; > + } > + > + dst_pool =3D rte_pktmbuf_pool_create("Benchmark_DMA_DST", > + nr_buf, > + 0, > + 0, > + buf_size + RTE_PKTMBUF_HEADROOM, > + cfg->dst_numa_node); > + if (dst_pool =3D=3D NULL) { > + PRINT_ERR("Error with destination mempool creatio= n.\n"); > + return -1; > + } > + > + *srcs =3D rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), = 0); > + if (*srcs =3D=3D NULL) { > + printf("Error: srcs malloc failed.\n"); > + return -1; > + } > + > + *dsts =3D rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), = 0); > + if (*dsts =3D=3D NULL) { > + printf("Error: dsts malloc failed.\n"); > + return -1; > + } > + > + if (rte_pktmbuf_alloc_bulk(src_pool, *srcs, nr_buf) !=3D 0) { > + printf("alloc src mbufs failed.\n"); > + return -1; > + } > + > + if (rte_pktmbuf_alloc_bulk(dst_pool, *dsts, nr_buf) !=3D 0) { > + printf("alloc dst mbufs failed.\n"); > + return -1; > + } > + > + return 0; > +} > + > +void > +mem_copy_benchmark(struct test_configure *cfg, bool is_dma) { > + uint16_t i; > + uint32_t offset; > + unsigned int lcore_id =3D 0; > + struct rte_mbuf **srcs =3D NULL, **dsts =3D NULL; > + struct lcore_dma_map_t *ldm =3D &cfg->lcore_dma_map; > + unsigned int buf_size =3D cfg->buf_size.cur; > + uint16_t kick_batch =3D cfg->kick_batch.cur; > + uint32_t nr_buf =3D cfg->nr_buf =3D (cfg->mem_size.cur * 1024 *= 1024) / > 
(cfg->buf_size.cur * 2); > + uint16_t nb_workers =3D ldm->cnt; > + uint16_t test_secs =3D cfg->test_secs; > + float memory =3D 0; > + uint32_t avg_cycles =3D 0; > + uint32_t avg_cycles_total; > + float mops, mops_total; > + float bandwidth, bandwidth_total; > + > + if (setup_memory_env(cfg, &srcs, &dsts) < 0) > + goto out; > + > + if (is_dma) > + if (config_dmadevs(cfg) < 0) > + goto out; > + > + if (cfg->cache_flush =3D=3D 1) { > + cache_flush_buf(srcs, buf_size, nr_buf); > + cache_flush_buf(dsts, buf_size, nr_buf); > + rte_mb(); > + } > + > + printf("Start testing....\n"); > + > + for (i =3D 0; i < nb_workers; i++) { > + lcore_id =3D ldm->lcores[i]; > + offset =3D nr_buf / nb_workers * i; > + lcores[i] =3D rte_malloc(NULL, sizeof(struct lcor= e_params), 0); > + if (lcores[i] =3D=3D NULL) { > + printf("lcore parameters malloc fa= ilure for > lcore %d\n", lcore_id); > + break; > + } > + if (is_dma) { > + lcores[i]->dma_name =3D ldm->dma_na= mes[i]; > + lcores[i]->dev_id =3D ldm->dma_ids= [i]; > + lcores[i]->kick_batch =3D kick_bat= ch; > + } > + lcores[i]->worker_id =3D i; > + lcores[i]->nr_buf =3D (uint32_t)(nr_buf / nb_work= ers); > + lcores[i]->buf_size =3D buf_size; > + lcores[i]->test_secs =3D test_secs; > + lcores[i]->srcs =3D srcs + offset; > + lcores[i]->dsts =3D dsts + offset; > + lcores[i]->scenario_id =3D cfg->scenario_id; > + lcores[i]->lcore_id =3D lcore_id; > + > + if (is_dma) > + rte_eal_remote_launch(do_dma_mem_c= opy, (void > *)(lcores[i]), lcore_id); > + else > + rte_eal_remote_launch(do_cpu_mem_c= opy, (void > *)(lcores[i]), lcore_id); > + } > + > + while (1) { > + bool ready =3D true; > + for (i =3D 0; i < nb_workers; i++) { > + if (lcores[i]->worker_info.ready_f= lag =3D=3D false) { > + ready =3D 0; > + break; > + } > + } > + if (ready) > + break; > + } > + > + for (i =3D 0; i < nb_workers; i++) > + lcores[i]->worker_info.start_flag =3D true; > + > + usleep(TEST_WAIT_U_SECOND); > + for (i =3D 0; i < nb_workers; i++) > + 
lcores[i]->worker_info.test_cpl =3D lcores[i]- > >worker_info.total_cpl; > + > + usleep(test_secs * 1000 * 1000); > + for (i =3D 0; i < nb_workers; i++) > + lcores[i]->worker_info.test_cpl =3D lcores[i]- > >worker_info.total_cpl - > + = lcores[i]- > >worker_info.test_cpl; > + > + for (i =3D 0; i < nb_workers; i++) > + lcores[i]->worker_info.stop_flag =3D true; > + > + rte_eal_mp_wait_lcore(); > + > + mops_total =3D 0; > + bandwidth_total =3D 0; > + avg_cycles_total =3D 0; > + for (i =3D 0; i < nb_workers; i++) { > + calc_result(buf_size, nr_buf, nb_workers, test_se= cs, > + lcores[i]->worker_info.test_cpl, > + &memory, &avg_cycles, &bandwidth, = &mops); > + output_result(cfg->scenario_id, lcores[i]->lcore_= id, > + lcore= s[i]->dma_name, cfg- > >ring_size.cur, kick_batch, > + avg_c= ycles, buf_size, nr_buf / > nb_workers, memory, > + bandw= idth, mops, is_dma); > + mops_total +=3D mops; > + bandwidth_total +=3D bandwidth; > + avg_cycles_total +=3D avg_cycles; > + } > + printf("\nTotal Bandwidth: %.3lf Gbps, Total MOps: %.3lf\n", > bandwidth_total, mops_total); > + snprintf(output_str[MAX_WORKER_NB], MAX_OUTPUT_STR_LEN, > CSV_TOTAL_LINE_FMT, > + cfg->scenario_id, nr_buf, memory *= nb_workers, > + avg_cycles_total / nb_workers, ban= dwidth_total, > mops_total); > + > +out: > + /* free mbufs used in the test */ > + if (srcs !=3D NULL) > + rte_pktmbuf_free_bulk(srcs, nr_buf); > + if (dsts !=3D NULL) > + rte_pktmbuf_free_bulk(dsts, nr_buf); > + > + /* free the points for the mbufs */ > + rte_free(srcs); > + srcs =3D NULL; > + rte_free(dsts); > + dsts =3D NULL; > + > + rte_mempool_free(src_pool); > + src_pool =3D NULL; > + > + rte_mempool_free(dst_pool); > + dst_pool =3D NULL; > + > + /* free the worker parameters */ > + for (i =3D 0; i < nb_workers; i++) { > + rte_free(lcores[i]); > + lcores[i] =3D NULL; > + } > + > + if (is_dma) { > + for (i =3D 0; i < nb_workers; i++) { > + printf("Stopping dmadev %d\n", ldm= ->dma_ids[i]); > + rte_dma_stop(ldm->dma_ids[i]); > + } > + 
}
> +}
> diff --git a/app/test-dma-perf/config.ini b/app/test-dma-perf/config.ini
> new file mode 100644
> index 0000000000..b550f4b23f
> --- /dev/null
> +++ b/app/test-dma-perf/config.ini
> @@ -0,0 +1,61 @@
> +
> +; This is an example configuration file for dma-perf, which details the meanings of each parameter
> +; and instructions on how to use dma-perf.
> +
> +; Supported test types are DMA_MEM_COPY and CPU_MEM_COPY.
> +
> +; Parameters:
> +; "mem_size" denotes the size of the memory footprint.
> +; "buf_size" denotes the memory size of a single operation.
> +; "dma_ring_size" denotes the dma ring buffer size. It must be a power of two, and between
> +; 64 and 4096.
> +; "kick_batch" denotes the dma operation batch size, and should be greater than 1 normally.
> +
> +; The format for variables is variable=first,last,increment,ADD|MUL.
> +
> +; src_numa_node is used to control the numa node where the source memory is allocated.
> +; dst_numa_node is used to control the numa node where the destination memory is allocated.
> +
> +; cache_flush is used to determine whether or not the cache should be flushed, with 1 indicating to
> +; flush and 0 indicating to not flush.
> +
> +; test_seconds controls the test time of the whole case.
> +
> +; To use DMA for a test, please specify the "lcore_dma" parameter.
> +; If you have already set the "-l" and "-a" parameters using EAL,
> +; make sure that the value of "lcore_dma" falls within their range of values.
> +; We have to ensure a 1:1 mapping between the core and DMA device.
> +
> +; To use CPU for a test, please specify the "lcore" parameter.
> +; If you have already set the "-l" and "-a" parameters using EAL,
> +; make sure that the value of "lcore" falls within their range of values.
> +
> +; To specify a configuration file, use the "--config" flag followed by the path to the file.
> +
> +; To specify a result file, use the "--result" flag followed by the path to the file.
> +; If you do not specify a result file, one will be generated with the same name as the configuration
> +; file, with the addition of "_result.csv" at the end.
> +
> +[case1]
> +type=DMA_MEM_COPY
> +mem_size=10
> +buf_size=64,8192,2,MUL
> +dma_ring_size=1024
> +kick_batch=32
> +src_numa_node=0
> +dst_numa_node=0
> +cache_flush=0
> +test_seconds=2
> +lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
> +eal_args=--in-memory --file-prefix=test
> +
> +[case2]
> +type=CPU_MEM_COPY
> +mem_size=10
> +buf_size=64,8192,2,MUL
> +src_numa_node=0
> +dst_numa_node=1
> +cache_flush=0
> +test_seconds=2
> +lcore = 3, 4
> +eal_args=--in-memory --no-pci
> diff --git a/app/test-dma-perf/main.c b/app/test-dma-perf/main.c
> new file mode 100644
> index 0000000000..de37120df6
> --- /dev/null
> +++ b/app/test-dma-perf/main.c
> @@ -0,0 +1,616 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +#include
> +
> +#include "main.h"
> +
> +#define CSV_HDR_FMT "Case %u : %s,lcore,DMA,DMA ring size,kick batch size,buffer size(B),number of buffers,memory(MB),average cycle,bandwidth(Gbps),MOps\n"
> +
> +#define MAX_EAL_PARAM_NB 100
> +#define MAX_EAL_PARAM_LEN 1024
> +
> +#define DMA_MEM_COPY "DMA_MEM_COPY"
> +#define CPU_MEM_COPY "CPU_MEM_COPY"
> +
> +#define CMDLINE_CONFIG_ARG "--config"
> +#define CMDLINE_RESULT_ARG "--result"
> +
> +#define MAX_PARAMS_PER_ENTRY 4
> +
> +#define MAX_LONG_OPT_SZ 64
> +
> +enum {
> +	TEST_TYPE_NONE = 0,
> +	TEST_TYPE_DMA_MEM_COPY,
> +	TEST_TYPE_CPU_MEM_COPY
> +};
> +
> +#define MAX_TEST_CASES 16
> +static struct test_configure test_cases[MAX_TEST_CASES];
> +
> +char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];
> +
> +static FILE *fd;
> +
> +static void
> +output_csv(bool need_blankline)
> +{
> +	uint32_t i;
> +
> +	if (need_blankline) {
> +		fprintf(fd, ",,,,,,,,\n");
> +		fprintf(fd, ",,,,,,,,\n");
> +	}
> +
> +	for (i = 0; i < RTE_DIM(output_str); i++) {
> +		if (output_str[i][0]) {
> +			fprintf(fd, "%s", output_str[i]);
> +			output_str[i][0] = '\0';
> +		}
> +	}
> +
> +	fflush(fd);
> +}
> +
> +static void
> +output_env_info(void)
> +{
> +	snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Test Environment:\n");
> +	snprintf(output_str[1], MAX_OUTPUT_STR_LEN, "CPU frequency,%.3lf Ghz",
> +			rte_get_timer_hz() / 1000000000.0);
> +
> +	output_csv(true);
> +}
> +
> +static void
> +output_header(uint32_t case_id, struct test_configure *case_cfg) {
> +	snprintf(output_str[0], MAX_OUTPUT_STR_LEN,
> +			CSV_HDR_FMT, case_id, case_cfg->test_type_str);
> +
> +	output_csv(true);
> +}
> +
> +static void
> +run_test_case(struct test_configure *case_cfg) {
> +	switch (case_cfg->test_type) {
> +	case TEST_TYPE_DMA_MEM_COPY:
> +		mem_copy_benchmark(case_cfg, true);
> +		break;
> +	case TEST_TYPE_CPU_MEM_COPY:
> +		mem_copy_benchmark(case_cfg, false);
> +		break;
> +	default:
> +		printf("Unknown test type. %s\n", case_cfg->test_type_str);
> +		break;
> +	}
> +}
> +
> +static void
> +run_test(uint32_t case_id, struct test_configure *case_cfg) {
> +	uint32_t i;
> +	uint32_t nb_lcores = rte_lcore_count();
> +	struct test_configure_entry *mem_size = &case_cfg->mem_size;
> +	struct test_configure_entry *buf_size = &case_cfg->buf_size;
> +	struct test_configure_entry *ring_size = &case_cfg->ring_size;
> +	struct test_configure_entry *kick_batch = &case_cfg->kick_batch;
> +	struct test_configure_entry dummy = { 0 };
> +	struct test_configure_entry *var_entry = &dummy;
> +
> +	for (i = 0; i < RTE_DIM(output_str); i++)
> +		memset(output_str[i], 0, MAX_OUTPUT_STR_LEN);
> +
> +	if (nb_lcores <= case_cfg->lcore_dma_map.cnt) {
> +		printf("Case %u: Not enough lcores.\n", case_id);
> +		return;
> +	}
> +
> +	printf("Number of used lcores: %u.\n", nb_lcores);
> +
> +	if (mem_size->incr != 0)
> +		var_entry = mem_size;
> +
> +	if (buf_size->incr != 0)
> +		var_entry = buf_size;
> +
> +	if (ring_size->incr != 0)
> +		var_entry = ring_size;
> +
> +	if (kick_batch->incr != 0)
> +		var_entry = kick_batch;
> +
> +	case_cfg->scenario_id = 0;
> +
> +	output_header(case_id, case_cfg);
> +
> +	for (var_entry->cur = var_entry->first; var_entry->cur <= var_entry->last;) {
> +		case_cfg->scenario_id++;
> +		printf("\nRunning scenario %d\n", case_cfg->scenario_id);
> +
> +		run_test_case(case_cfg);
> +		output_csv(false);
> +
> +		if (var_entry->op == OP_ADD)
> +			var_entry->cur += var_entry->incr;
> +		else if (var_entry->op == OP_MUL)
> +			var_entry->cur *= var_entry->incr;
> +		else {
> +			printf("No proper operation for variable entry.\n");
> +			break;
> +		}
> +	}
> +}
> +
> +static int
> +parse_lcore(struct test_configure *test_case, const char *value) {
> +	uint16_t len;
> +	char *input;
> +	struct lcore_dma_map_t *lcore_dma_map;
> +
> +	if (test_case == NULL || value == NULL)
> +		return -1;
> +
> +	len = strlen(value);
> +	input = (char *)malloc((len + 1) * sizeof(char));
> +	strlcpy(input, value, len + 1);
> +	lcore_dma_map = &(test_case->lcore_dma_map);
> +
> +	memset(lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));
> +
> +	char *token = strtok(input, ", ");
> +	while (token != NULL) {
> +		if (lcore_dma_map->cnt >= MAX_LCORE_NB) {
> +			free(input);
> +			return -1;
> +		}
> +
> +		uint16_t lcore_id = atoi(token);
> +		lcore_dma_map->lcores[lcore_dma_map->cnt++] = lcore_id;
> +
> +		token = strtok(NULL, ", ");
> +	}
> +
> +	free(input);
> +	return 0;
> +}
> +
> +static int
> +parse_lcore_dma(struct test_configure *test_case, const char *value) {
> +	struct lcore_dma_map_t *lcore_dma_map;
> +	char *input, *addrs;
> +	char *ptrs[2];
> +	char *start, *end, *substr;
> +	uint16_t lcore_id;
> +	int ret = 0;
> +
> +	if (test_case == NULL || value == NULL)
> +		return -1;
> +
> +	input = strndup(value, strlen(value) + 1);
> +	addrs = input;
> +
> +	while (*addrs == '\0')
> +		addrs++;
> +	if (*addrs == '\0') {
> +		fprintf(stderr, "No input DMA addresses\n");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	substr = strtok(addrs, ",");
> +	if (substr == NULL) {
> +		fprintf(stderr, "No input DMA address\n");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	memset(&test_case->lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));
> +
> +	do {
> +		if (rte_strsplit(substr, strlen(substr), ptrs, 2, '@') < 0) {
> +			fprintf(stderr, "Illegal DMA address\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		start = strstr(ptrs[0], "lcore");
> +		if (start == NULL) {
> +			fprintf(stderr, "Illegal lcore\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		start += 5;
> +		lcore_id = strtol(start, &end, 0);
> +		if (end == start) {
> +			fprintf(stderr, "No input lcore ID or ID %d is wrong\n", lcore_id);
> +			ret = -1;
> +			break;
> +		}
> +
> +		lcore_dma_map = &test_case->lcore_dma_map;
> +		if (lcore_dma_map->cnt >= MAX_LCORE_NB) {
> +			fprintf(stderr, "lcores count error\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		lcore_dma_map->lcores[lcore_dma_map->cnt] = lcore_id;
> +		strlcpy(lcore_dma_map->dma_names[lcore_dma_map->cnt], ptrs[1],
> +				RTE_DEV_NAME_MAX_LEN);
> +		lcore_dma_map->cnt++;
> +		substr = strtok(NULL, ",");
> +	} while (substr != NULL);
> +
> +out:
> +	free(input);
> +	return ret;
> +}
> +
> +static int
> +parse_entry(const char *value, struct test_configure_entry *entry) {
> +	char input[255] = {0};
> +	char *args[MAX_PARAMS_PER_ENTRY];
> +	int args_nr = -1;
> +	int ret;
> +
> +	if (value == NULL || entry == NULL)
> +		goto out;
> +
> +	strncpy(input, value, 254);
> +	if (*input == '\0')
> +		goto out;
> +
> +	ret = rte_strsplit(input, strlen(input), args, MAX_PARAMS_PER_ENTRY, ',');
> +	if (ret != 1 && ret != 4)
> +		goto out;
> +
> +	entry->cur = entry->first = (uint32_t)atoi(args[0]);
> +
> +	if (ret == 4) {
> +		args_nr = 4;
> +		entry->last = (uint32_t)atoi(args[1]);
> +		entry->incr = (uint32_t)atoi(args[2]);
> +		if (!strcmp(args[3], "MUL"))
> +			entry->op = OP_MUL;
> +		else if (!strcmp(args[3], "ADD"))
> +			entry->op = OP_ADD;
> +		else {
> +			args_nr = -1;
> +			printf("Invalid op %s.\n", args[3]);
> +		}
> +
> +	} else {
> +		args_nr = 1;
> +		entry->op = OP_NONE;
> +		entry->last = 0;
> +		entry->incr = 0;
> +	}
> +out:
> +	return args_nr;
> +}
> +
> +static uint16_t
> +load_configs(const char *path)
> +{
> +	struct rte_cfgfile *cfgfile;
> +	int nb_sections, i;
> +	struct test_configure *test_case;
> +	char section_name[CFG_NAME_LEN];
> +	const char *case_type;
> +	const char *lcore_dma;
> +	const char *mem_size_str, *buf_size_str, *ring_size_str, *kick_batch_str;
> +	int args_nr, nb_vp;
> +	bool is_dma;
> +
> +	printf("config file parsing...\n");
> +	cfgfile = rte_cfgfile_load(path, 0);
> +	if (!cfgfile) {
> +		printf("Open configure file error.\n");
> +		exit(1);
> +	}
> +
> +	nb_sections = rte_cfgfile_num_sections(cfgfile, NULL, 0);
> +	if (nb_sections > MAX_TEST_CASES) {
> +		printf("Error: The maximum number of cases is %d.\n", MAX_TEST_CASES);
> +		exit(1);
> +	}
> +
> +	for (i = 0; i < nb_sections; i++) {
> +		snprintf(section_name, CFG_NAME_LEN, "case%d", i + 1);
> +		test_case = &test_cases[i];
> +		case_type = rte_cfgfile_get_entry(cfgfile, section_name, "type");
> +		if (case_type == NULL) {
> +			printf("Error: No case type in case %d, the test will be finished here.\n",
> +				i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		if (strcmp(case_type, DMA_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_DMA_MEM_COPY;
> +			test_case->test_type_str = DMA_MEM_COPY;
> +			is_dma = true;
> +		} else if (strcmp(case_type, CPU_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_CPU_MEM_COPY;
> +			test_case->test_type_str = CPU_MEM_COPY;
> +			is_dma = false;
> +		} else {
> +			printf("Error: Wrong test case type %s in case%d.\n", case_type, i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		test_case->src_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +								section_name, "src_numa_node"));
> +		test_case->dst_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +								section_name, "dst_numa_node"));
> +		nb_vp = 0;
> +		mem_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "mem_size");
> +		args_nr = parse_entry(mem_size_str, &test_case->mem_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		buf_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "buf_size");
> +		args_nr = parse_entry(buf_size_str, &test_case->buf_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		if (is_dma) {
> +			ring_size_str = rte_cfgfile_get_entry(cfgfile, section_name,
> +								"dma_ring_size");
> +			args_nr = parse_entry(ring_size_str, &test_case->ring_size);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			kick_batch_str = rte_cfgfile_get_entry(cfgfile, section_name, "kick_batch");
> +			args_nr = parse_entry(kick_batch_str, &test_case->kick_batch);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore_dma");
> +			int lcore_ret = parse_lcore_dma(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore dma error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		} else {
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore");
> +			int lcore_ret = parse_lcore(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		}
> +
> +		if (nb_vp > 1) {
> +			printf("Case %d error, each section can only have a single variable parameter.\n",
> +					i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		test_case->cache_flush =
> +			(uint8_t)atoi(rte_cfgfile_get_entry(cfgfile, section_name, "cache_flush"));
> +		test_case->test_secs = (uint16_t)atoi(rte_cfgfile_get_entry(cfgfile,
> +					section_name, "test_seconds"));
> +
> +		test_case->eal_args = rte_cfgfile_get_entry(cfgfile, section_name, "eal_args");
> +		test_case->is_valid = true;
> +	}
> +
> +	rte_cfgfile_close(cfgfile);
> +	printf("config file parsing complete.\n\n");
> +	return i;
> +}
> +
> +/* Parse the argument given in the command line of the application */
> +static int
> +append_eal_args(int argc, char **argv, const char *eal_args, char **new_argv) {
> +	int i;
> +	char *tokens[MAX_EAL_PARAM_NB];
> +	char args[MAX_EAL_PARAM_LEN] = {0};
> +	int token_nb, new_argc = 0;
> +
> +	for (i = 0; i < argc; i++) {
> +		if ((strcmp(argv[i], CMDLINE_CONFIG_ARG) == 0) ||
> +				(strcmp(argv[i], CMDLINE_RESULT_ARG) == 0)) {
> +			i++;
> +			continue;
> +		}
> +		strlcpy(new_argv[new_argc], argv[i], MAX_EAL_PARAM_LEN);
> +		new_argc++;
> +	}
> +
> +	if (eal_args) {
> +		strlcpy(args, eal_args, MAX_EAL_PARAM_LEN);
> +		token_nb = rte_strsplit(args, strlen(args),
> +					tokens, MAX_EAL_PARAM_NB, ' ');
> +		for (i = 0; i < token_nb; i++)
> +			strlcpy(new_argv[new_argc++], tokens[i], MAX_EAL_PARAM_LEN);
> +	}
> +
> +	return new_argc;
> +}
> +
> +int
> +main(int argc, char *argv[])
> +{
> +	int ret;
> +	uint16_t case_nb;
> +	uint32_t i, nb_lcores;
> +	pid_t cpid, wpid;
> +	int wstatus;
> +	char args[MAX_EAL_PARAM_NB][MAX_EAL_PARAM_LEN];
> +	char *pargs[MAX_EAL_PARAM_NB];
> +	char *cfg_path_ptr = NULL;
> +	char *rst_path_ptr = NULL;
> +	char rst_path[PATH_MAX];
> +	int new_argc;
> +
> +	memset(args, 0, sizeof(args));
> +
> +	for (i = 0; i < RTE_DIM(pargs); i++)
> +		pargs[i] = args[i];
> +
> +	for (i = 0; i < (uint32_t)argc; i++) {
> +		if (strncmp(argv[i], CMDLINE_CONFIG_ARG, MAX_LONG_OPT_SZ) == 0)
> +			cfg_path_ptr = argv[i + 1];
> +		if (strncmp(argv[i], CMDLINE_RESULT_ARG, MAX_LONG_OPT_SZ) == 0)
> +			rst_path_ptr = argv[i + 1];
> +	}
> +	if (cfg_path_ptr == NULL) {
> +		printf("Config file not assigned.\n");
> +		return -1;
> +	}
> +	if (rst_path_ptr == NULL) {
> +		strlcpy(rst_path, cfg_path_ptr, PATH_MAX);
> +		char *token = strtok(basename(rst_path), ".");
> +		if (token == NULL) {
> +			printf("Config file error.\n");
> +			return -1;
> +		}
> +		strcat(token, "_result.csv");
> +		rst_path_ptr = rst_path;
> +	}
> +
> +	case_nb = load_configs(cfg_path_ptr);
> +	fd = fopen(rst_path_ptr, "w");
> +	if (fd == NULL) {
> +		printf("Open output CSV file error.\n");
> +		return -1;
> +	}
> +	fclose(fd);
> +
> +	printf("Running cases...\n");
> +	for (i = 0; i < case_nb; i++) {
> +		if (!test_cases[i].is_valid) {
> +			printf("Invalid test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}
> +
> +		if (test_cases[i].test_type == TEST_TYPE_NONE) {
> +			printf("No valid test type in test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}
> +
> +		cpid = fork();
> +		if (cpid < 0) {
> +			printf("Fork case %d failed.\n", i + 1);
> +			exit(EXIT_FAILURE);
> +		} else if (cpid == 0) {
> +			printf("\nRunning case %u\n\n", i + 1);
> +
> +			new_argc = append_eal_args(argc, argv, test_cases[i].eal_args, pargs);
> +			ret = rte_eal_init(new_argc, pargs);
> +			if (ret < 0)
> +				rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
> +
> +			/* Check lcores. */
> +			nb_lcores = rte_lcore_count();
> +			if (nb_lcores < 2)
> +				rte_exit(EXIT_FAILURE,
> +					"There should be at least 2 worker lcores.\n");
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +
> +			output_env_info();
> +
> +			run_test(i + 1, &test_cases[i]);
> +
> +			/* clean up the EAL */
> +			rte_eal_cleanup();
> +
> +			fclose(fd);
> +
> +			printf("\nCase %u completed.\n\n", i + 1);
> +
> +			exit(EXIT_SUCCESS);
> +		} else {
> +			wpid = waitpid(cpid, &wstatus, 0);
> +			if (wpid == -1) {
> +				printf("waitpid error.\n");
> +				exit(EXIT_FAILURE);
> +			}
> +
> +			if (WIFEXITED(wstatus))
> +				printf("Case process exited. status %d\n\n",
> +					WEXITSTATUS(wstatus));
> +			else if (WIFSIGNALED(wstatus))
> +				printf("Case process killed by signal %d\n\n",
> +					WTERMSIG(wstatus));
> +			else if (WIFSTOPPED(wstatus))
> +				printf("Case process stopped by signal %d\n\n",
> +					WSTOPSIG(wstatus));
> +			else if (WIFCONTINUED(wstatus))
> +				printf("Case process continued.\n\n");
> +			else
> +				printf("Case process unknown terminated.\n\n");
> +		}
> +	}
> +
> +	printf("Bye...\n");
> +	return 0;
> +}
> +
> diff --git a/app/test-dma-perf/main.h b/app/test-dma-perf/main.h
> new file mode 100644
> index 0000000000..12bc3f4e3f
> --- /dev/null
> +++ b/app/test-dma-perf/main.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation */
> +
> +#ifndef _MAIN_H_
> +#define _MAIN_H_
> +
> +
> +#include
> +#include
> +#include
> +
> +#define MAX_WORKER_NB 128
> +#define MAX_OUTPUT_STR_LEN 512
> +
> +#define MAX_DMA_NB 128
> +#define MAX_LCORE_NB 256
> +
> +extern char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];
> +
> +typedef enum {
> +	OP_NONE = 0,
> +	OP_ADD,
> +	OP_MUL
> +} alg_op_type;
> +
> +struct test_configure_entry {
> +	uint32_t first;
> +	uint32_t last;
> +	uint32_t incr;
> +	alg_op_type op;
> +	uint32_t cur;
> +};
> +
> +struct lcore_dma_map_t {
> +	uint32_t lcores[MAX_WORKER_NB];
> +	char dma_names[MAX_WORKER_NB][RTE_DEV_NAME_MAX_LEN];
> +	int16_t dma_ids[MAX_WORKER_NB];
> +	uint16_t cnt;
> +};
> +
> +struct test_configure {
> +	bool is_valid;
> +	uint8_t test_type;
> +	const char *test_type_str;
> +	uint16_t src_numa_node;
> +	uint16_t dst_numa_node;
> +	uint16_t opcode;
> +	bool is_dma;
> +	struct lcore_dma_map_t lcore_dma_map;
> +	struct test_configure_entry mem_size;
> +	struct test_configure_entry buf_size;
> +	struct test_configure_entry ring_size;
> +	struct test_configure_entry kick_batch;
> +	uint8_t cache_flush;
> +	uint32_t nr_buf;
> +	uint16_t test_secs;
> +	const char *eal_args;
> +	uint8_t scenario_id;
> +};
> +
> +void mem_copy_benchmark(struct test_configure *cfg, bool is_dma);
> +
> +#endif /* _MAIN_H_ */
> diff --git a/app/test-dma-perf/meson.build b/app/test-dma-perf/meson.build
> new file mode 100644
> index 0000000000..bd6c264002
> --- /dev/null
> +++ b/app/test-dma-perf/meson.build
> @@ -0,0 +1,17 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019-2023 Intel Corporation
> +
> +# meson file, for building this app as part of a main DPDK build.
> +
> +if is_windows
> +    build = false
> +    reason = 'not supported on Windows'
> +    subdir_done()
> +endif
> +
> +deps += ['dmadev', 'mbuf', 'cfgfile']
> +
> +sources = files(
> +        'main.c',
> +        'benchmark.c',
> +)
> diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
> index 4459144140..796cc5517d 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -200,6 +200,12 @@ New Features
>
>    Enhanced the GRO library to support TCP packets over IPv6 network.
>
> +* **Added DMA device performance test application.**
> +
> +  Added an new application to test the performance of DMA device and CPU.
> +
> +  See the :doc:`../tools/dmaperf` for more details.
> +
>
>  Removed Items
>  -------------
> diff --git a/doc/guides/tools/dmaperf.rst b/doc/guides/tools/dmaperf.rst
> new file mode 100644
> index 0000000000..c5f8a9406f
> --- /dev/null
> +++ b/doc/guides/tools/dmaperf.rst
> @@ -0,0 +1,103 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2023 Intel Corporation.
> +
> +dpdk-test-dma-perf Application
> +==============================
> +
> +The ``dpdk-test-dma-perf`` tool is a Data Plane Development Kit (DPDK)
> +application that enables testing the performance of DMA (Direct Memory Access)
> +devices available within DPDK. It provides a test framework to assess the
> +performance of CPU and DMA devices under various scenarios, such as varying
> +buffer lengths. Doing so provides insight into the potential performance when
> +using these DMA devices for acceleration in DPDK applications. It supports
> +memory copy performance tests for now, comparing the performance of CPU and DMA
> +automatically in various conditions with the help of a pre-set configuration file.
> +
> +
> +Configuration
> +-------------
> +This application uses inherent DPDK EAL command-line options as well as custom
> +command-line options in the application. An example configuration file for the
> +application is provided and gives the meanings for each parameter.
> +
> +Here is an extracted sample from the configuration file
> +(the complete sample can be found in the application source directory):
> +
> +.. code-block:: ini
> +
> +   [case1]
> +   type=DMA_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   dma_ring_size=1024
> +   kick_batch=32
> +   src_numa_node=0
> +   dst_numa_node=0
> +   cache_flush=0
> +   test_seconds=2
> +   lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
> +   eal_args=--in-memory --file-prefix=test
> +
> +   [case2]
> +   type=CPU_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   src_numa_node=0
> +   dst_numa_node=1
> +   cache_flush=0
> +   test_seconds=2
> +   lcore = 3, 4
> +   eal_args=--in-memory --no-pci
> +
> +The configuration file is divided into multiple sections, each section represents a test case.
> +The four variables mem_size, buf_size, dma_ring_size, and kick_batch can vary in each test case.
> +The format for this is ``variable=first,last,increment,ADD\|MUL``.
> +This means that the first value of the variable is 'first', the last value is 'last',
> +'increment' is the step size, and ADD|MUL indicates whether the change is by
> +addition or multiplication. Each case can only have one variable change,
> +and each change will generate a scenario, so each case can have multiple scenarios.
> +
> +Parameter Definitions
> +---------------------
> +
> +- **type**: The type of the test. Currently supported types are `DMA_MEM_COPY` and `CPU_MEM_COPY`.
> +- **mem_size**: The size of the memory footprint.
> +- **buf_size**: The memory size of a single operation.
> +- **dma_ring_size**: The DMA ring buffer size. Must be a power of two, and between 64 and 4096.
> +- **kick_batch**: The DMA operation batch size, should be greater than 1 normally.
> +- **src_numa_node**: Controls the NUMA node where the source memory is allocated.
> +- **dst_numa_node**: Controls the NUMA node where the destination memory is allocated.
> +- **cache_flush**: Determines whether the cache should be flushed. `1` indicates to flush and `0` to not flush.
> +- **test_seconds**: Controls the test time for each scenario.
> +- **lcore_dma**: Specifies the lcore/DMA mapping.
> +- **lcore**: Specifies the lcore for CPU testing.
> +- **eal_args**: Specifies the EAL arguments.
> +
> +.. Note::
> +
> +   The mapping of lcore to DMA must be one-to-one and cannot be duplicated.
> +
> +To specify a configuration file, use the "\-\-config" flag followed by the path to the file.
> +
> +To specify a result file, use the "\-\-result" flag followed by the path to the file.
> +If you do not specify a result file, one will be generated with the same name
> +as the configuration file, with the addition of "_result.csv" at the end.
> +
> +
> +Running the Application
> +-----------------------
> +
> +Typical command-line invocation to execute the application:
> +
> +.. code-block:: console
> +
> +   dpdk-test-dma-perf --config=./config_dma.ini --result=./res_dma.csv
> +
> +Where `config_dma.ini` is the configuration file, and `res_dma.csv` will be the generated result file.
> +
> +After the tests, you can find the results in the `res_dma.csv` file.
> +
> +Limitations
> +-----------
> +
> +Currently, this tool only supports memory copy performance tests.
> +Additional enhancements are possible in the future to support more types of
> +tests for DMA devices and CPUs.
> diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
> index 6f84fc31ff..857572da96 100644
> --- a/doc/guides/tools/index.rst
> +++ b/doc/guides/tools/index.rst
> @@ -23,3 +23,4 @@ DPDK Tools User Guides
>      testregex
>      testmldev
>      dts
> +    dmaperf
> --
> 2.40.1
>
>
>
> End of dev Digest, Vol 462, Issue 27
> ************************************

--_000_DM6PR11MB3516915C3BB0B8E4B288B4DD8E24ADM6PR11MB3516namp_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi Cheng,

 

LGTM.

 

> -----Original Message-----

> Date: Wed, 28 Jun 2023 01:20:34 +0000

> From: Cheng Jiang <cheng1.jiang@intel.com>

> To: thomas@monjalon.net, bruce.richardson@intel.com,

>             mb@smartsharesystems.com, chenbo.xia@intel.com,

>             amitprakashs@marvell.com, anoobj@marvell.com,

>             huangdengdui@huawei.com,

>             kevin.laatz@intel.com, fengchengwen@huawei.com, jerinj@marvell.com

> Cc: dev@dpdk.org, jiayu.hu@intel.com, xuan.ding@intel.com,

>             wenwux.ma@intel.com, yuanx.wang@intel.com, xingguang.he@intel.com,

>             weix.ling@intel.com, Cheng Jiang <cheng1.jiang@intel.com>

> Subject: [PATCH v10] app/dma-perf: introduce dma-perf application

> Message-ID: <20230628012034.49016-1-cheng1.jiang@intel.com>

> Content-Type: text/plain; charset=3DUTF-8

>

> There are many high-performance DMA devices supported in DPDK now, and

> these DMA devices can also be integrated into other modules of DPDK as

> accelerators, such as Vhost. Before integrating DMA into applications,

> developers need to know the performance of these DMA devices in

> various scenarios and the performance of CPUs in the same scenario,

> such as different buffer lengths. Only in this way can we know the

> target performance of the application accelerated by using them. This

> patch introduces a high-performance testing tool, which supports

> comparing the performance of CPU and DMA in different scenarios

> automatically with a pre-set config file. Memory Copy performance test are supported for now.

>

> Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>

> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>

> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>

> Acked-by: Morten Brørup <mb@smartsharesystems.com>

> Acked-by: Chenbo Xia <chenbo.xia@intel.com>

 

Acked-by: Yuying Zhang <yuying.zhang@intel.com>

 

> ---

> v10:

>   rebased code from 23.07-rc2;

> v9:

>   improved error handling;

>   improved lcore_params structure;

>   improved mbuf api calling;

>   improved exit process;

>   fixed some typos;

>   added scenario summary data display;

>   removed unnecessary include;

> v8:

>   fixed string copy issue in parse_lcore();

>   improved some data display format;

>   added doc in doc/guides/tools;

>   updated release notes;

> v7:

>   fixed some strcpy issues;

>   removed cache setup in calling rte_pktmbuf_pool_create();

>   fixed some typos;

>   added some memory free and null set operations;

>   improved result calculation;

> v6:

>   improved code based on Anoob's comments;

>   fixed some code structure issues;

> v5:

>   fixed some LONG_LINE warnings;

> v4:

>   fixed inaccuracy of the memory footprint display;

> v3:

>   fixed some typos;

> v2:

>   added lcore/dmadev designation;

>   added error case process;

>   removed worker_threads parameter from config.ini;

>   improved the logs;

>   improved config file;

>

>  app/meson.build                        |   1 +

>  app/test-dma-perf/benchmark.c          | 508 ++++++++++++++++++++

>  app/test-dma-perf/config.ini           |  61 +++

>  app/test-dma-perf/main.c               | 616 +++++++++++++++++++++++++

>  app/test-dma-perf/main.h               |  64 +++

>  app/test-dma-perf/meson.build          |  17 +

>  doc/guides/rel_notes/release_23_07.rst |   6 +

>  doc/guides/tools/dmaperf.rst           | 103 +++++

>  doc/guides/tools/index.rst             |   1 +

>  9 files changed, 1377 insertions(+)

>  create mode 100644 app/test-dma-perf/benchmark.c

>  create mode 100644 app/test-dma-perf/config.ini

>  create mode 100644 app/test-dma-perf/main.c

>  create mode 100644 app/test-dma-perf/main.h

>  create mode 100644 app/test-dma-perf/meson.build

>  create mode 100644 doc/guides/tools/dmaperf.rst

>

> diff --git a/app/meson.build b/app/meson.build

> index 74d2420f67..4fc1a83eba 100644

> --- a/app/meson.build

> +++ b/app/meson.build

> @@ -19,6 +19,7 @@ apps = [

>          'test-cmdline',

>          'test-compress-perf',

>          'test-crypto-perf',

> +        'test-dma-perf',

>          'test-eventdev',

>          'test-fib',

>          'test-flow-perf',

> diff --git a/app/test-dma-perf/benchmark.c b/app/test-dma-perf/benchmark.c

> new file mode 100644

> index 0000000000..0601e0d171

> --- /dev/null

> +++ b/app/test-dma-perf/benchmark.c

> @@ -0,0 +1,508 @@

> +/* SPDX-License-Identifier: BSD-3-Clause

> + * Copyright(c) 2023 Intel Corporation */

> +

> +#include <inttypes.h>

> +#include <stdio.h>

> +#include <stdlib.h>

> +#include <unistd.h>

> +

> +#include <rte_time.h>

> +#include <rte_mbuf.h>

> +#include <rte_dmadev.h>

> +#include <rte_malloc.h>

> +#include <rte_lcore.h>

> +

> +#include "main.h"

> +

> +#define MAX_DMA_CPL_NB 255

> +

> +#define TEST_WAIT_U_SECOND 10000

> +#define POLL_MAX 1000

> +

> +#define CSV_LINE_DMA_FMT "Scenario %u,%u,%s,%u,%u,%u,%u,%.2lf,%" PRIu64 ",%.3lf,%.3lf\n"

> +#define CSV_LINE_CPU_FMT "Scenario %u,%u,NA,NA,NA,%u,%u,%.2lf,%" PRIu64 ",%.3lf,%.3lf\n"

> +

> +#define CSV_TOTAL_LINE_FMT "Scenario %u Summary, , , , , ,%u,%.2lf,%u,%.3lf,%.3lf\n"

> +

> +struct worker_info {

> +	bool ready_flag;

> +	bool start_flag;

> +	bool stop_flag;

> +	uint32_t total_cpl;

> +	uint32_t test_cpl;

> +};

> +

> +struct lcore_params {

> +	uint8_t scenario_id;

> +	unsigned int lcore_id;

> +	char *dma_name;

> +	uint16_t worker_id;

> +	uint16_t dev_id;

> +	uint32_t nr_buf;

> +	uint16_t kick_batch;

> +	uint32_t buf_size;

> +	uint16_t test_secs;

> +	struct rte_mbuf **srcs;

> +	struct rte_mbuf **dsts;

> +	volatile struct worker_info worker_info; };

> +

> +static struct rte_mempool *src_pool;

> +static struct rte_mempool *dst_pool;

> +

> +static struct lcore_params *lcores[MAX_WORKER_NB];

> +

> +#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__)

> +

> +static inline int

> +__rte_format_printf(3, 4)

> +print_err(const char *func, int lineno, const char *format, ...) {

> +	va_list ap;

> +	int ret;

> +

> +	ret = fprintf(stderr, "In %s:%d - ", func, lineno);

> +	va_start(ap, format);

> +	ret += vfprintf(stderr, format, ap);

> +	va_end(ap);

> +

> +	return ret;

> +}

> +

> +static inline void

> +calc_result(uint32_t buf_size, uint32_t nr_buf, uint16_t nb_workers, uint16_t test_secs,

> +			uint32_t total_cnt, float *memory, uint32_t *ave_cycle,

> +			float *bandwidth, float *mops)

> +{

> +	float ops;

> +

> +	*memory = (float)(buf_size * (nr_buf / nb_workers) * 2) / (1024 * 1024);

> +	*ave_cycle = test_secs * rte_get_timer_hz() / total_cnt;

> +	ops = (float)total_cnt / test_secs;

> +	*mops = ops / (1000 * 1000);

> +	*bandwidth = (ops * buf_size * 8) / (1000 * 1000 * 1000); }

> +

> +static void

> +output_result(uint8_t scenario_id, uint32_t lcore_id, char *dma_name, uint16_t ring_size,

> +			uint16_t kick_batch, uint64_t ave_cycle, uint32_t buf_size, uint32_t nr_buf,

> +			float memory, float bandwidth, float mops, bool is_dma) {

> +	if (is_dma)

> +		printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size: %u.\n",

> +				lcore_id, dma_name, ring_size, kick_batch);

> +	else

> +		printf("lcore %u\n", lcore_id);

> +

> +	printf("Average Cycles/op: %" PRIu64 ", Buffer Size: %u B, Buffer Number: %u, Memory: %.2lf MB, Frequency: %.3lf Ghz.\n",

> +			ave_cycle, buf_size, nr_buf, memory, rte_get_timer_hz()/1000000000.0);

> +	printf("Average Bandwidth: %.3lf Gbps, MOps: %.3lf\n", bandwidth, mops);

> +

> +	if (is_dma)

> +		snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, CSV_LINE_DMA_FMT,

> +			scenario_id, lcore_id, dma_name, ring_size, kick_batch, buf_size,

> +			nr_buf, memory, ave_cycle, bandwidth, mops);

> +	else

> +		snprintf(output_str[lcore_id], MAX_OUTPUT_STR_LEN, CSV_LINE_CPU_FMT,

> +			scenario_id, lcore_id, buf_size,

> +			nr_buf, memory, ave_cycle, bandwidth, mops); }

> +

> +static inline void
> +cache_flush_buf(__rte_unused struct rte_mbuf **array,
> +		__rte_unused uint32_t buf_size,
> +		__rte_unused uint32_t nr_buf)
> +{
> +#ifdef RTE_ARCH_X86_64
> +	char *data;
> +	struct rte_mbuf **srcs = array;
> +	uint32_t i, offset;
> +
> +	for (i = 0; i < nr_buf; i++) {
> +		data = rte_pktmbuf_mtod(srcs[i], char *);
> +		for (offset = 0; offset < buf_size; offset += 64)
> +			__builtin_ia32_clflush(data + offset);
> +	}
> +#endif
> +}
> +
> +/* Configuration of device. */
> +static void
> +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size)
> +{
> +	uint16_t vchan = 0;
> +	struct rte_dma_info info;
> +	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
> +	struct rte_dma_vchan_conf qconf = {
> +		.direction = RTE_DMA_DIR_MEM_TO_MEM,
> +		.nb_desc = ring_size
> +	};
> +
> +	if (rte_dma_configure(dev_id, &dev_config) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with dma configure.\n");
> +
> +	if (rte_dma_vchan_setup(dev_id, vchan, &qconf) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with queue configuration.\n");
> +
> +	if (rte_dma_info_get(dev_id, &info) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with getting device info.\n");
> +
> +	if (info.nb_vchans != 1)
> +		rte_exit(EXIT_FAILURE, "Error, no configured queues reported on device id. %u\n",
> +				dev_id);
> +
> +	if (rte_dma_start(dev_id) != 0)
> +		rte_exit(EXIT_FAILURE, "Error with dma start.\n");
> +}
> +
> +static int
> +config_dmadevs(struct test_configure *cfg)
> +{
> +	uint32_t ring_size = cfg->ring_size.cur;
> +	struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map;
> +	uint32_t nb_workers = ldm->cnt;
> +	uint32_t i;
> +	int dev_id;
> +	uint16_t nb_dmadevs = 0;
> +	char *dma_name;
> +
> +	for (i = 0; i < ldm->cnt; i++) {
> +		dma_name = ldm->dma_names[i];
> +		dev_id = rte_dma_get_dev_id_by_name(dma_name);
> +		if (dev_id < 0) {
> +			fprintf(stderr, "Error: Fail to find DMA %s.\n", dma_name);
> +			goto end;
> +		}
> +
> +		ldm->dma_ids[i] = dev_id;
> +		configure_dmadev_queue(dev_id, ring_size);
> +		++nb_dmadevs;
> +	}
> +
> +end:
> +	if (nb_dmadevs < nb_workers) {
> +		printf("Not enough dmadevs (%u) for all workers (%u).\n", nb_dmadevs, nb_workers);
> +		return -1;
> +	}
> +
> +	printf("Number of used dmadevs: %u.\n", nb_dmadevs);
> +
> +	return 0;
> +}
> +
> +static void
> +error_exit(int dev_id)
> +{
> +	rte_dma_stop(dev_id);
> +	rte_dma_close(dev_id);
> +	rte_exit(EXIT_FAILURE, "DMA error\n");
> +}
> +
> +static inline void
> +do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt,
> +			volatile struct worker_info *worker_info)
> +{
> +	int ret;
> +	uint16_t nr_cpl;
> +
> +	ret = rte_dma_submit(dev_id, 0);
> +	if (ret < 0)
> +		error_exit(dev_id);
> +
> +	nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> +	*async_cnt -= nr_cpl;
> +	worker_info->total_cpl += nr_cpl;
> +}
> +
> +static inline int
> +do_dma_mem_copy(void *p)
> +{
> +	struct lcore_params *para = (struct lcore_params *)p;
> +	volatile struct worker_info *worker_info = &(para->worker_info);
> +	const uint16_t dev_id = para->dev_id;
> +	const uint32_t nr_buf = para->nr_buf;
> +	const uint16_t kick_batch = para->kick_batch;
> +	const uint32_t buf_size = para->buf_size;
> +	struct rte_mbuf **srcs = para->srcs;
> +	struct rte_mbuf **dsts = para->dsts;
> +	uint16_t nr_cpl;
> +	uint64_t async_cnt = 0;
> +	uint32_t i;
> +	uint32_t poll_cnt = 0;
> +	int ret;
> +
> +	worker_info->stop_flag = false;
> +	worker_info->ready_flag = true;
> +
> +	while (!worker_info->start_flag)
> +		;
> +
> +	while (1) {
> +		for (i = 0; i < nr_buf; i++) {
> +dma_copy:
> +			ret = rte_dma_copy(dev_id, 0, rte_mbuf_data_iova(srcs[i]),
> +				rte_mbuf_data_iova(dsts[i]), buf_size, 0);
> +			if (unlikely(ret < 0)) {
> +				if (ret == -ENOSPC) {
> +					do_dma_submit_and_poll(dev_id, &async_cnt, worker_info);
> +					goto dma_copy;
> +				} else
> +					error_exit(dev_id);
> +			}
> +			async_cnt++;
> +
> +			if ((async_cnt % kick_batch) == 0)
> +				do_dma_submit_and_poll(dev_id, &async_cnt, worker_info);
> +		}
> +
> +		if (worker_info->stop_flag)
> +			break;
> +	}
> +
> +	rte_dma_submit(dev_id, 0);
> +	while ((async_cnt > 0) && (poll_cnt++ < POLL_MAX)) {
> +		nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> +		async_cnt -= nr_cpl;
> +	}
> +
> +	return 0;
> +}
> +
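The enqueue loop above combines two ideas: kick the device every kick_batch enqueues, and on -ENOSPC kick and poll before retrying the same buffer. The sketch below reproduces only that control flow against a mock ring so it can be compiled without DPDK. Everything in it (struct mock_ring, mock_copy, mock_submit_and_poll, run_batched) is hypothetical and is not the dmadev API; in particular the mock completes all in-flight work on every poll, which real hardware need not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mock device, purely illustrative: NOT the DPDK dmadev API. */
struct mock_ring {
	uint32_t capacity;	/* descriptors the ring can hold */
	uint32_t in_flight;	/* enqueued but not yet completed */
	uint64_t completed;	/* total completions reported so far */
};

/* Enqueue one copy; fails with -ENOSPC when the ring is full. */
static int
mock_copy(struct mock_ring *r)
{
	if (r->in_flight == r->capacity)
		return -ENOSPC;
	r->in_flight++;
	return 0;
}

/* Kick and harvest completions; the mock finishes everything at once. */
static uint32_t
mock_submit_and_poll(struct mock_ring *r)
{
	uint32_t done = r->in_flight;

	r->completed += done;
	r->in_flight = 0;
	return done;
}

/* Enqueue nr_ops copies, kicking every kick_batch enqueues and retrying
 * the current copy whenever the ring reports that it is full.
 */
static uint64_t
run_batched(struct mock_ring *r, uint32_t nr_ops, uint32_t kick_batch)
{
	uint64_t async_cnt = 0;
	uint32_t i;

	assert(r->capacity > 0 && kick_batch > 0);

	for (i = 0; i < nr_ops; i++) {
		while (mock_copy(r) == -ENOSPC)
			async_cnt -= mock_submit_and_poll(r);
		async_cnt++;

		if ((async_cnt % kick_batch) == 0)
			async_cnt -= mock_submit_and_poll(r);
	}

	mock_submit_and_poll(r);	/* drain whatever is still in flight */
	return r->completed;
}
```

With a ring of 4 descriptors and kick_batch set to 32, the batch threshold is never reached, so every kick in the run comes from the -ENOSPC path; all 10 requested copies still complete.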

> +static inline int
> +do_cpu_mem_copy(void *p)
> +{
> +	struct lcore_params *para = (struct lcore_params *)p;
> +	volatile struct worker_info *worker_info = &(para->worker_info);
> +	const uint32_t nr_buf = para->nr_buf;
> +	const uint32_t buf_size = para->buf_size;
> +	struct rte_mbuf **srcs = para->srcs;
> +	struct rte_mbuf **dsts = para->dsts;
> +	uint32_t i;
> +
> +	worker_info->stop_flag = false;
> +	worker_info->ready_flag = true;
> +
> +	while (!worker_info->start_flag)
> +		;
> +
> +	while (1) {
> +		for (i = 0; i < nr_buf; i++) {
> +			/* copy buffer form src to dst */
> +			rte_memcpy((void *)(uintptr_t)rte_mbuf_data_iova(dsts[i]),
> +				(void *)(uintptr_t)rte_mbuf_data_iova(srcs[i]),
> +				(size_t)buf_size);
> +			worker_info->total_cpl++;
> +		}
> +		if (worker_info->stop_flag)
> +			break;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +setup_memory_env(struct test_configure *cfg, struct rte_mbuf ***srcs,
> +			struct rte_mbuf ***dsts)
> +{
> +	unsigned int buf_size = cfg->buf_size.cur;
> +	unsigned int nr_sockets;
> +	uint32_t nr_buf = cfg->nr_buf;
> +
> +	nr_sockets = rte_socket_count();
> +	if (cfg->src_numa_node >= nr_sockets ||
> +		cfg->dst_numa_node >= nr_sockets) {
> +		printf("Error: Source or destination numa exceeds the acture numa nodes.\n");
> +		return -1;
> +	}
> +
> +	src_pool = rte_pktmbuf_pool_create("Benchmark_DMA_SRC",
> +			nr_buf,
> +			0,
> +			0,
> +			buf_size + RTE_PKTMBUF_HEADROOM,
> +			cfg->src_numa_node);
> +	if (src_pool == NULL) {
> +		PRINT_ERR("Error with source mempool creation.\n");
> +		return -1;
> +	}
> +
> +	dst_pool = rte_pktmbuf_pool_create("Benchmark_DMA_DST",
> +			nr_buf,
> +			0,
> +			0,
> +			buf_size + RTE_PKTMBUF_HEADROOM,
> +			cfg->dst_numa_node);
> +	if (dst_pool == NULL) {
> +		PRINT_ERR("Error with destination mempool creation.\n");
> +		return -1;
> +	}
> +
> +	*srcs = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0);
> +	if (*srcs == NULL) {
> +		printf("Error: srcs malloc failed.\n");
> +		return -1;
> +	}
> +
> +	*dsts = rte_malloc(NULL, nr_buf * sizeof(struct rte_mbuf *), 0);
> +	if (*dsts == NULL) {
> +		printf("Error: dsts malloc failed.\n");
> +		return -1;
> +	}
> +
> +	if (rte_pktmbuf_alloc_bulk(src_pool, *srcs, nr_buf) != 0) {
> +		printf("alloc src mbufs failed.\n");
> +		return -1;
> +	}
> +
> +	if (rte_pktmbuf_alloc_bulk(dst_pool, *dsts, nr_buf) != 0) {
> +		printf("alloc dst mbufs failed.\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +void
> +mem_copy_benchmark(struct test_configure *cfg, bool is_dma)
> +{
> +	uint16_t i;
> +	uint32_t offset;
> +	unsigned int lcore_id = 0;
> +	struct rte_mbuf **srcs = NULL, **dsts = NULL;
> +	struct lcore_dma_map_t *ldm = &cfg->lcore_dma_map;
> +	unsigned int buf_size = cfg->buf_size.cur;
> +	uint16_t kick_batch = cfg->kick_batch.cur;
> +	uint32_t nr_buf = cfg->nr_buf = (cfg->mem_size.cur * 1024 * 1024) / (cfg->buf_size.cur * 2);
> +	uint16_t nb_workers = ldm->cnt;
> +	uint16_t test_secs = cfg->test_secs;
> +	float memory = 0;
> +	uint32_t avg_cycles = 0;
> +	uint32_t avg_cycles_total;
> +	float mops, mops_total;
> +	float bandwidth, bandwidth_total;
> +
> +	if (setup_memory_env(cfg, &srcs, &dsts) < 0)
> +		goto out;
> +
> +	if (is_dma)
> +		if (config_dmadevs(cfg) < 0)
> +			goto out;
> +
> +	if (cfg->cache_flush == 1) {
> +		cache_flush_buf(srcs, buf_size, nr_buf);
> +		cache_flush_buf(dsts, buf_size, nr_buf);
> +		rte_mb();
> +	}
> +
> +	printf("Start testing....\n");
> +
> +	for (i = 0; i < nb_workers; i++) {
> +		lcore_id = ldm->lcores[i];
> +		offset = nr_buf / nb_workers * i;
> +		lcores[i] = rte_malloc(NULL, sizeof(struct lcore_params), 0);
> +		if (lcores[i] == NULL) {
> +			printf("lcore parameters malloc failure for lcore %d\n", lcore_id);
> +			break;
> +		}
> +		if (is_dma) {
> +			lcores[i]->dma_name = ldm->dma_names[i];
> +			lcores[i]->dev_id = ldm->dma_ids[i];
> +			lcores[i]->kick_batch = kick_batch;
> +		}
> +		lcores[i]->worker_id = i;
> +		lcores[i]->nr_buf = (uint32_t)(nr_buf / nb_workers);
> +		lcores[i]->buf_size = buf_size;
> +		lcores[i]->test_secs = test_secs;
> +		lcores[i]->srcs = srcs + offset;
> +		lcores[i]->dsts = dsts + offset;
> +		lcores[i]->scenario_id = cfg->scenario_id;
> +		lcores[i]->lcore_id = lcore_id;
> +
> +		if (is_dma)
> +			rte_eal_remote_launch(do_dma_mem_copy, (void *)(lcores[i]), lcore_id);
> +		else
> +			rte_eal_remote_launch(do_cpu_mem_copy, (void *)(lcores[i]), lcore_id);
> +	}
> +
> +	while (1) {
> +		bool ready = true;
> +		for (i = 0; i < nb_workers; i++) {
> +			if (lcores[i]->worker_info.ready_flag == false) {
> +				ready = 0;
> +				break;
> +			}
> +		}
> +		if (ready)
> +			break;
> +	}
> +
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.start_flag = true;
> +
> +	usleep(TEST_WAIT_U_SECOND);
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.test_cpl = lcores[i]->worker_info.total_cpl;
> +
> +	usleep(test_secs * 1000 * 1000);
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.test_cpl = lcores[i]->worker_info.total_cpl -
> +						lcores[i]->worker_info.test_cpl;
> +
> +	for (i = 0; i < nb_workers; i++)
> +		lcores[i]->worker_info.stop_flag = true;
> +
> +	rte_eal_mp_wait_lcore();
> +
> +	mops_total = 0;
> +	bandwidth_total = 0;
> +	avg_cycles_total = 0;
> +	for (i = 0; i < nb_workers; i++) {
> +		calc_result(buf_size, nr_buf, nb_workers, test_secs,
> +			lcores[i]->worker_info.test_cpl,
> +			&memory, &avg_cycles, &bandwidth, &mops);
> +		output_result(cfg->scenario_id, lcores[i]->lcore_id,
> +					lcores[i]->dma_name, cfg->ring_size.cur, kick_batch,
> +					avg_cycles, buf_size, nr_buf / nb_workers, memory,
> +					bandwidth, mops, is_dma);
> +		mops_total += mops;
> +		bandwidth_total += bandwidth;
> +		avg_cycles_total += avg_cycles;
> +	}
> +	printf("\nTotal Bandwidth: %.3lf Gbps, Total MOps: %.3lf\n", bandwidth_total, mops_total);
> +	snprintf(output_str[MAX_WORKER_NB], MAX_OUTPUT_STR_LEN, CSV_TOTAL_LINE_FMT,
> +			cfg->scenario_id, nr_buf, memory * nb_workers,
> +			avg_cycles_total / nb_workers, bandwidth_total, mops_total);
> +
> +out:
> +	/* free mbufs used in the test */
> +	if (srcs != NULL)
> +		rte_pktmbuf_free_bulk(srcs, nr_buf);
> +	if (dsts != NULL)
> +		rte_pktmbuf_free_bulk(dsts, nr_buf);
> +
> +	/* free the points for the mbufs */
> +	rte_free(srcs);
> +	srcs = NULL;
> +	rte_free(dsts);
> +	dsts = NULL;
> +
> +	rte_mempool_free(src_pool);
> +	src_pool = NULL;
> +
> +	rte_mempool_free(dst_pool);
> +	dst_pool = NULL;
> +
> +	/* free the worker parameters */
> +	for (i = 0; i < nb_workers; i++) {
> +		rte_free(lcores[i]);
> +		lcores[i] = NULL;
> +	}
> +
> +	if (is_dma) {
> +		for (i = 0; i < nb_workers; i++) {
> +			printf("Stopping dmadev %d\n", ldm->dma_ids[i]);
> +			rte_dma_stop(ldm->dma_ids[i]);
> +		}
> +	}
> +}
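The buffer accounting in mem_copy_benchmark() is easy to misread, so here is the arithmetic as a standalone sketch. It assumes mem_size is given in MB and that dividing by buf_size * 2 is there because the footprint is shared between the source and destination pools; that is my reading of the code, not something the patch states. The helper names are hypothetical, and the 64-bit widening in total_bufs() is an addition of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Buffers per pool: the footprint covers one source and one destination pool. */
static uint32_t
total_bufs(uint32_t mem_size_mb, uint32_t buf_size)
{
	assert(buf_size > 0);
	return (uint32_t)(((uint64_t)mem_size_mb * 1024 * 1024) /
			((uint64_t)buf_size * 2));
}

/* Buffers handed to each worker; the division truncates. */
static uint32_t
worker_bufs(uint32_t nr_buf, uint32_t nb_workers)
{
	assert(nb_workers > 0);
	return nr_buf / nb_workers;
}

/* Index of the first buffer owned by a given worker. */
static uint32_t
worker_offset(uint32_t nr_buf, uint32_t nb_workers, uint32_t worker)
{
	return worker_bufs(nr_buf, nb_workers) * worker;
}
```

With mem_size=10 and buf_size=64 this gives 81920 buffers per pool. Two workers get 40960 each. Three workers get 27306 each, which leaves 2 buffers allocated but never copied because the remainder of the division is dropped.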

> diff --git a/app/test-dma-perf/config.ini b/app/test-dma-perf/config.ini
> new file mode 100644
> index 0000000000..b550f4b23f
> --- /dev/null
> +++ b/app/test-dma-perf/config.ini
> @@ -0,0 +1,61 @@
> +
> +; This is an example configuration file for dma-perf, which details the meanings of each parameter
> +; and instructions on how to use dma-perf.
> +
> +; Supported test types are DMA_MEM_COPY and CPU_MEM_COPY.
> +
> +; Parameters:
> +; "mem_size" denotes the size of the memory footprint.
> +; "buf_size" denotes the memory size of a single operation.
> +; "dma_ring_size" denotes the dma ring buffer size. It should be must be a power of two, and between
> +;  64 and 4096.
> +; "kick_batch" denotes the dma operation batch size, and should be greater than 1 normally.
> +
> +; The format for variables is variable=first,last,increment,ADD|MUL.
> +
> +; src_numa_node is used to control the numa node where the source memory is allocated.
> +; dst_numa_node is used to control the numa node where the destination memory is allocated.
> +
> +; cache_flush is used to determine whether or not the cache should be flushed, with 1 indicating to
> +; flush and 0 indicating to not flush.
> +
> +; test_seconds controls the test time of the whole case.
> +
> +; To use DMA for a test, please specify the "lcore_dma" parameter.
> +; If you have already set the "-l" and "-a" parameters using EAL,
> +; make sure that the value of "lcore_dma" falls within their range of the values.
> +; We have to ensure a 1:1 mapping between the core and DMA device.
> +
> +; To use CPU for a test, please specify the "lcore" parameter.
> +; If you have already set the "-l" and "-a" parameters using EAL,
> +; make sure that the value of "lcore" falls within their range of values.
> +
> +; To specify a configuration file, use the "--config" flag followed by the path to the file.
> +
> +; To specify a result file, use the "--result" flag followed by the path to the file.
> +; If you do not specify a result file, one will be generated with the same name as the configuration
> +; file, with the addition of "_result.csv" at the end.
> +
> +[case1]
> +type=DMA_MEM_COPY
> +mem_size=10
> +buf_size=64,8192,2,MUL
> +dma_ring_size=1024
> +kick_batch=32
> +src_numa_node=0
> +dst_numa_node=0
> +cache_flush=0
> +test_seconds=2
> +lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
> +eal_args=--in-memory --file-prefix=test
> +
> +[case2]
> +type=CPU_MEM_COPY
> +mem_size=10
> +buf_size=64,8192,2,MUL
> +src_numa_node=0
> +dst_numa_node=1
> +cache_flush=0
> +test_seconds=2
> +lcore = 3, 4
> +eal_args=--in-memory --no-pci
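For readers unfamiliar with the lcore_dma syntax in [case1], each comma-separated entry has the form lcoreN@ADDR. The sketch below shows one way to split a single, already trimmed entry using only the C standard library. split_lcore_dma() is a hypothetical helper written for this note; it is not the parser in the patch, which uses rte_strsplit() and strstr(). The uint16_t range check mirrors the lcore_id type used in main.c.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "lcoreN@ADDR" into N and a pointer to ADDR.
 * The entry must not carry leading blanks. Returns 0 on success, -1 if the
 * entry is malformed.
 */
static int
split_lcore_dma(const char *entry, uint16_t *lcore_id, const char **dma_addr)
{
	static const char prefix[] = "lcore";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *digits;
	char *end;
	long id;

	if (strncmp(entry, prefix, prefix_len) != 0)
		return -1;		/* does not start with "lcore" */

	digits = entry + prefix_len;
	errno = 0;
	id = strtol(digits, &end, 10);
	if (end == digits || errno != 0 || id < 0 || id > UINT16_MAX)
		return -1;		/* no digits, overflow, or out of range */
	if (*end != '@' || end[1] == '\0')
		return -1;		/* missing separator or empty address */

	*lcore_id = (uint16_t)id;
	*dma_addr = end + 1;
	return 0;
}
```

Given "lcore10@0000:00:04.2" this yields lcore 10 and the address string "0000:00:04.2". The second entry in the example config is preceded by a blank after the comma, so a caller would need to skip that before calling the helper.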

> diff --git a/app/test-dma-perf/main.c b/app/test-dma-perf/main.c
> new file mode 100644
> index 0000000000..de37120df6
> --- /dev/null
> +++ b/app/test-dma-perf/main.c
> @@ -0,0 +1,616 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <getopt.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +#include <sys/wait.h>
> +#include <inttypes.h>
> +#include <libgen.h>
> +
> +#include <rte_eal.h>
> +#include <rte_cfgfile.h>
> +#include <rte_string_fns.h>
> +#include <rte_lcore.h>
> +
> +#include "main.h"
> +
> +#define CSV_HDR_FMT "Case %u : %s,lcore,DMA,DMA ring size,kick batch size,buffer size(B),number of buffers,memory(MB),average cycle,bandwidth(Gbps),MOps\n"
> +
> +#define MAX_EAL_PARAM_NB 100
> +#define MAX_EAL_PARAM_LEN 1024
> +
> +#define DMA_MEM_COPY "DMA_MEM_COPY"
> +#define CPU_MEM_COPY "CPU_MEM_COPY"
> +
> +#define CMDLINE_CONFIG_ARG "--config"
> +#define CMDLINE_RESULT_ARG "--result"
> +
> +#define MAX_PARAMS_PER_ENTRY 4
> +
> +#define MAX_LONG_OPT_SZ 64
> +
> +enum {
> +	TEST_TYPE_NONE = 0,
> +	TEST_TYPE_DMA_MEM_COPY,
> +	TEST_TYPE_CPU_MEM_COPY
> +};
> +
> +#define MAX_TEST_CASES 16
> +static struct test_configure test_cases[MAX_TEST_CASES];
> +
> +char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];
> +
> +static FILE *fd;
> +
> +static void
> +output_csv(bool need_blankline)
> +{
> +	uint32_t i;
> +
> +	if (need_blankline) {
> +		fprintf(fd, ",,,,,,,,\n");
> +		fprintf(fd, ",,,,,,,,\n");
> +	}
> +
> +	for (i = 0; i < RTE_DIM(output_str); i++) {
> +		if (output_str[i][0]) {
> +			fprintf(fd, "%s", output_str[i]);
> +			output_str[i][0] = '\0';
> +		}
> +	}
> +
> +	fflush(fd);
> +}
> +
> +static void
> +output_env_info(void)
> +{
> +	snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Test Environment:\n");
> +	snprintf(output_str[1], MAX_OUTPUT_STR_LEN, "CPU frequency,%.3lf Ghz",
> +			rte_get_timer_hz() / 1000000000.0);
> +
> +	output_csv(true);
> +}
> +
> +static void
> +output_header(uint32_t case_id, struct test_configure *case_cfg)
> +{
> +	snprintf(output_str[0], MAX_OUTPUT_STR_LEN,
> +			CSV_HDR_FMT, case_id, case_cfg->test_type_str);
> +
> +	output_csv(true);
> +}
> +
> +static void
> +run_test_case(struct test_configure *case_cfg)
> +{
> +	switch (case_cfg->test_type) {
> +	case TEST_TYPE_DMA_MEM_COPY:
> +		mem_copy_benchmark(case_cfg, true);
> +		break;
> +	case TEST_TYPE_CPU_MEM_COPY:
> +		mem_copy_benchmark(case_cfg, false);
> +		break;
> +	default:
> +		printf("Unknown test type. %s\n", case_cfg->test_type_str);
> +		break;
> +	}
> +}
> +
> +static void
> +run_test(uint32_t case_id, struct test_configure *case_cfg)
> +{
> +	uint32_t i;
> +	uint32_t nb_lcores = rte_lcore_count();
> +	struct test_configure_entry *mem_size = &case_cfg->mem_size;
> +	struct test_configure_entry *buf_size = &case_cfg->buf_size;
> +	struct test_configure_entry *ring_size = &case_cfg->ring_size;
> +	struct test_configure_entry *kick_batch = &case_cfg->kick_batch;
> +	struct test_configure_entry dummy = { 0 };
> +	struct test_configure_entry *var_entry = &dummy;
> +
> +	for (i = 0; i < RTE_DIM(output_str); i++)
> +		memset(output_str[i], 0, MAX_OUTPUT_STR_LEN);
> +
> +	if (nb_lcores <= case_cfg->lcore_dma_map.cnt) {
> +		printf("Case %u: Not enough lcores.\n", case_id);
> +		return;
> +	}
> +
> +	printf("Number of used lcores: %u.\n", nb_lcores);
> +
> +	if (mem_size->incr != 0)
> +		var_entry = mem_size;
> +
> +	if (buf_size->incr != 0)
> +		var_entry = buf_size;
> +
> +	if (ring_size->incr != 0)
> +		var_entry = ring_size;
> +
> +	if (kick_batch->incr != 0)
> +		var_entry = kick_batch;
> +
> +	case_cfg->scenario_id = 0;
> +
> +	output_header(case_id, case_cfg);
> +
> +	for (var_entry->cur = var_entry->first; var_entry->cur <= var_entry->last;) {
> +		case_cfg->scenario_id++;
> +		printf("\nRunning scenario %d\n", case_cfg->scenario_id);
> +
> +		run_test_case(case_cfg);
> +		output_csv(false);
> +
> +		if (var_entry->op == OP_ADD)
> +			var_entry->cur += var_entry->incr;
> +		else if (var_entry->op == OP_MUL)
> +			var_entry->cur *= var_entry->incr;
> +		else {
> +			printf("No proper operation for variable entry.\n");
> +			break;
> +		}
> +	}
> +}
> +
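The scenario loop above is what gives the "first,last,increment,ADD|MUL" syntax in config.ini its meaning: one scenario runs per value of the swept parameter. A standalone sketch of that expansion follows. The enum and function names are hypothetical stand-ins for OP_ADD and OP_MUL in the patch. The up-front rejection of steps that cannot advance (ADD with 0, MUL with 0 or 1, MUL starting from 0) is an addition of the sketch and is not present in the quoted loop, where a MUL step of 1 would never move cur forward.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for OP_ADD / OP_MUL in the patch. */
enum sweep_op { SWEEP_ADD, SWEEP_MUL };

/* Store the values visited by "first,last,incr,op" in out[] (at most max of
 * them) and return how many were stored. Returns 0 if the step can never
 * advance the sweep.
 */
static size_t
expand_sweep(uint32_t first, uint32_t last, uint32_t incr, enum sweep_op op,
		uint32_t *out, size_t max)
{
	size_t n = 0;
	uint64_t cur;	/* 64-bit so the step cannot wrap around before the bound check */

	if (op == SWEEP_ADD && incr == 0)
		return 0;
	if (op == SWEEP_MUL && (incr < 2 || first == 0))
		return 0;

	for (cur = first; cur <= last && n < max;
			cur = (op == SWEEP_ADD) ? cur + incr : cur * incr)
		out[n++] = (uint32_t)cur;

	return n;
}
```

For buf_size=64,8192,2,MUL this yields eight scenarios: 64, 128, 256, 512, 1024, 2048, 4096 and 8192.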

> +static int
> +parse_lcore(struct test_configure *test_case, const char *value)
> +{
> +	uint16_t len;
> +	char *input;
> +	struct lcore_dma_map_t *lcore_dma_map;
> +
> +	if (test_case == NULL || value == NULL)
> +		return -1;
> +
> +	len = strlen(value);
> +	input = (char *)malloc((len + 1) * sizeof(char));
> +	strlcpy(input, value, len + 1);
> +	lcore_dma_map = &(test_case->lcore_dma_map);
> +
> +	memset(lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));
> +
> +	char *token = strtok(input, ", ");
> +	while (token != NULL) {
> +		if (lcore_dma_map->cnt >= MAX_LCORE_NB) {
> +			free(input);
> +			return -1;
> +		}
> +
> +		uint16_t lcore_id = atoi(token);
> +		lcore_dma_map->lcores[lcore_dma_map->cnt++] = lcore_id;
> +
> +		token = strtok(NULL, ", ");
> +	}
> +
> +	free(input);
> +	return 0;
> +}
> +
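parse_lcore() tokenizes on ", " and converts each token with atoi(), which yields 0 for a non-numeric token, so a typo in the "lcore" list is indistinguishable from lcore 0. As a possible alternative, here is a sketch of the same parsing done with strtol() so that malformed tokens are rejected. parse_lcore_list() is a hypothetical helper, not code from the patch, and it needs no temporary copy because it never modifies the input.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a list such as "3, 4" into ids[].
 * Returns the number of ids stored, or -1 on a malformed token or overflow
 * of the ids[] array.
 */
static int
parse_lcore_list(const char *value, uint16_t *ids, int max_ids)
{
	const char *p = value;
	int cnt = 0;

	for (;;) {
		char *end;
		long id;

		p += strspn(p, ", ");	/* skip separators */
		if (*p == '\0')
			break;
		if (cnt == max_ids)
			return -1;	/* more entries than ids[] can hold */

		errno = 0;
		id = strtol(p, &end, 10);
		if (end == p || errno != 0 || id < 0 || id > UINT16_MAX)
			return -1;	/* not a number, or out of range */
		if (*end != '\0' && *end != ',' && *end != ' ')
			return -1;	/* trailing garbage such as "3x" */

		ids[cnt++] = (uint16_t)id;
		p = end;
	}

	return cnt;
}
```

With the [case2] value "3, 4" the helper stores 3 and 4 and returns 2, while "3, x" is rejected instead of being read as lcores 3 and 0.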

> +static int
> +parse_lcore_dma(struct test_configure *test_case, const char *value)
> +{
> +	struct lcore_dma_map_t *lcore_dma_map;
> +	char *input, *addrs;
> +	char *ptrs[2];
> +	char *start, *end, *substr;
> +	uint16_t lcore_id;
> +	int ret = 0;
> +
> +	if (test_case == NULL || value == NULL)
> +		return -1;
> +
> +	input = strndup(value, strlen(value) + 1);
> +	addrs = input;
> +
> +	while (*addrs == '\0')
> +		addrs++;
> +	if (*addrs == '\0') {
> +		fprintf(stderr, "No input DMA addresses\n");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	substr = strtok(addrs, ",");
> +	if (substr == NULL) {
> +		fprintf(stderr, "No input DMA address\n");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	memset(&test_case->lcore_dma_map, 0, sizeof(struct lcore_dma_map_t));
> +
> +	do {
> +		if (rte_strsplit(substr, strlen(substr), ptrs, 2, '@') < 0) {
> +			fprintf(stderr, "Illegal DMA address\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		start = strstr(ptrs[0], "lcore");
> +		if (start == NULL) {
> +			fprintf(stderr, "Illegal lcore\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		start += 5;
> +		lcore_id = strtol(start, &end, 0);
> +		if (end == start) {
> +			fprintf(stderr, "No input lcore ID or ID %d is wrong\n", lcore_id);
> +			ret = -1;
> +			break;
> +		}
> +
> +		lcore_dma_map = &test_case->lcore_dma_map;
> +		if (lcore_dma_map->cnt >= MAX_LCORE_NB) {
> +			fprintf(stderr, "lcores count error\n");
> +			ret = -1;
> +			break;
> +		}
> +
> +		lcore_dma_map->lcores[lcore_dma_map->cnt] = lcore_id;
> +		strlcpy(lcore_dma_map->dma_names[lcore_dma_map->cnt], ptrs[1],
> +				RTE_DEV_NAME_MAX_LEN);
> +		lcore_dma_map->cnt++;
> +		substr = strtok(NULL, ",");
> +	} while (substr != NULL);
> +
> +out:
> +	free(input);
> +	return ret;
> +}
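
For readers following the ``lcore<N>@<device>`` entry format that the function above handles, here is a minimal standalone sketch of parsing a single entry without any DPDK dependencies. It is not the patch's implementation: the helper name parse_lcore_dma_entry, its signature, and its stricter validation (decimal digits only, non-empty device name) are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): parse one entry of the
 * form "lcore<N>@<device>", e.g. "lcore10@0000:00:04.2".
 * Returns 0 on success and -1 on malformed input.
 */
static int
parse_lcore_dma_entry(const char *entry, unsigned int *lcore_id,
		char *dma_name, size_t dma_name_len)
{
	const char *num, *at;
	char *end;
	unsigned long id;
	size_t name_len;

	assert(entry != NULL && lcore_id != NULL && dma_name != NULL);

	/* Skip blanks, such as the space that follows a comma in the list. */
	while (*entry == ' ' || *entry == '\t')
		entry++;

	/* The entry must start with the literal prefix "lcore". */
	if (strncmp(entry, "lcore", 5) != 0)
		return -1;
	num = entry + 5;

	/* Require a digit so that signs and whitespace are rejected. */
	if (!isdigit((unsigned char)*num))
		return -1;

	at = strchr(num, '@');
	if (at == NULL)
		return -1;

	/* The number must end exactly at the '@' separator. */
	errno = 0;
	id = strtoul(num, &end, 10);
	if (errno != 0 || end != at || id > UINT_MAX)
		return -1;

	/* The device name must be non-empty and fit in the caller's buffer. */
	name_len = strlen(at + 1);
	if (name_len == 0 || name_len >= dma_name_len)
		return -1;

	memcpy(dma_name, at + 1, name_len + 1);
	*lcore_id = (unsigned int)id;
	return 0;
}
```

A caller would split the comma-separated value first and pass each piece to this helper, stopping at the first entry that returns -1.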

> +

> +static int
> +parse_entry(const char *value, struct test_configure_entry *entry)
> +{
> +	char input[255] = {0};
> +	char *args[MAX_PARAMS_PER_ENTRY];
> +	int args_nr = -1;
> +	int ret;
> +
> +	if (value == NULL || entry == NULL)
> +		goto out;
> +
> +	strncpy(input, value, 254);
> +	if (*input == '\0')
> +		goto out;
> +
> +	ret = rte_strsplit(input, strlen(input), args, MAX_PARAMS_PER_ENTRY, ',');
> +	if (ret != 1 && ret != 4)
> +		goto out;
> +
> +	entry->cur = entry->first = (uint32_t)atoi(args[0]);
> +
> +	if (ret == 4) {
> +		args_nr = 4;
> +		entry->last = (uint32_t)atoi(args[1]);
> +		entry->incr = (uint32_t)atoi(args[2]);
> +		if (!strcmp(args[3], "MUL"))
> +			entry->op = OP_MUL;
> +		else if (!strcmp(args[3], "ADD"))
> +			entry->op = OP_ADD;
> +		else {
> +			args_nr = -1;
> +			printf("Invalid op %s.\n", args[3]);
> +		}
> +
> +	} else {
> +		args_nr = 1;
> +		entry->op = OP_NONE;
> +		entry->last = 0;
> +		entry->incr = 0;
> +	}
> +out:
> +	return args_nr;
> +}

> +

> +static uint16_t
> +load_configs(const char *path)
> +{
> +	struct rte_cfgfile *cfgfile;
> +	int nb_sections, i;
> +	struct test_configure *test_case;
> +	char section_name[CFG_NAME_LEN];
> +	const char *case_type;
> +	const char *lcore_dma;
> +	const char *mem_size_str, *buf_size_str, *ring_size_str, *kick_batch_str;
> +	int args_nr, nb_vp;
> +	bool is_dma;
> +
> +	printf("config file parsing...\n");
> +	cfgfile = rte_cfgfile_load(path, 0);
> +	if (!cfgfile) {
> +		printf("Open configure file error.\n");
> +		exit(1);
> +	}
> +
> +	nb_sections = rte_cfgfile_num_sections(cfgfile, NULL, 0);
> +	if (nb_sections > MAX_TEST_CASES) {
> +		printf("Error: The maximum number of cases is %d.\n", MAX_TEST_CASES);
> +		exit(1);
> +	}
> +
> +	for (i = 0; i < nb_sections; i++) {
> +		snprintf(section_name, CFG_NAME_LEN, "case%d", i + 1);
> +		test_case = &test_cases[i];
> +		case_type = rte_cfgfile_get_entry(cfgfile, section_name, "type");
> +		if (case_type == NULL) {
> +			printf("Error: No case type in case %d, the test will be finished here.\n",
> +				i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		if (strcmp(case_type, DMA_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_DMA_MEM_COPY;
> +			test_case->test_type_str = DMA_MEM_COPY;
> +			is_dma = true;
> +		} else if (strcmp(case_type, CPU_MEM_COPY) == 0) {
> +			test_case->test_type = TEST_TYPE_CPU_MEM_COPY;
> +			test_case->test_type_str = CPU_MEM_COPY;
> +			is_dma = false;
> +		} else {
> +			printf("Error: Wrong test case type %s in case%d.\n", case_type, i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}

> +

> +		test_case->src_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +					section_name, "src_numa_node"));
> +		test_case->dst_numa_node = (int)atoi(rte_cfgfile_get_entry(cfgfile,
> +					section_name, "dst_numa_node"));
> +		nb_vp = 0;
> +		mem_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "mem_size");
> +		args_nr = parse_entry(mem_size_str, &test_case->mem_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		buf_size_str = rte_cfgfile_get_entry(cfgfile, section_name, "buf_size");
> +		args_nr = parse_entry(buf_size_str, &test_case->buf_size);
> +		if (args_nr < 0) {
> +			printf("parse error in case %d.\n", i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		} else if (args_nr == 4)
> +			nb_vp++;
> +
> +		if (is_dma) {
> +			ring_size_str = rte_cfgfile_get_entry(cfgfile, section_name,
> +					"dma_ring_size");
> +			args_nr = parse_entry(ring_size_str, &test_case->ring_size);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			kick_batch_str = rte_cfgfile_get_entry(cfgfile, section_name, "kick_batch");
> +			args_nr = parse_entry(kick_batch_str, &test_case->kick_batch);
> +			if (args_nr < 0) {
> +				printf("parse error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			} else if (args_nr == 4)
> +				nb_vp++;
> +
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore_dma");
> +			int lcore_ret = parse_lcore_dma(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore dma error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		} else {
> +			lcore_dma = rte_cfgfile_get_entry(cfgfile, section_name, "lcore");
> +			int lcore_ret = parse_lcore(test_case, lcore_dma);
> +			if (lcore_ret < 0) {
> +				printf("parse lcore error in case %d.\n", i + 1);
> +				test_case->is_valid = false;
> +				continue;
> +			}
> +		}
> +
> +		if (nb_vp > 1) {
> +			printf("Case %d error, each section can only have a single variable parameter.\n",
> +					i + 1);
> +			test_case->is_valid = false;
> +			continue;
> +		}
> +
> +		test_case->cache_flush =
> +			(uint8_t)atoi(rte_cfgfile_get_entry(cfgfile, section_name, "cache_flush"));
> +		test_case->test_secs = (uint16_t)atoi(rte_cfgfile_get_entry(cfgfile,
> +					section_name, "test_seconds"));
> +
> +		test_case->eal_args = rte_cfgfile_get_entry(cfgfile, section_name, "eal_args");
> +		test_case->is_valid = true;
> +	}
> +
> +	rte_cfgfile_close(cfgfile);
> +	printf("config file parsing complete.\n\n");
> +	return i;
> +}

> +

> +/* Parse the argument given in the command line of the application */
> +static int
> +append_eal_args(int argc, char **argv, const char *eal_args, char **new_argv)
> +{
> +	int i;
> +	char *tokens[MAX_EAL_PARAM_NB];
> +	char args[MAX_EAL_PARAM_LEN] = {0};
> +	int token_nb, new_argc = 0;
> +
> +	for (i = 0; i < argc; i++) {
> +		if ((strcmp(argv[i], CMDLINE_CONFIG_ARG) == 0) ||
> +				(strcmp(argv[i], CMDLINE_RESULT_ARG) == 0)) {
> +			i++;
> +			continue;
> +		}
> +		strlcpy(new_argv[new_argc], argv[i], MAX_EAL_PARAM_LEN);
> +		new_argc++;
> +	}
> +
> +	if (eal_args) {
> +		strlcpy(args, eal_args, MAX_EAL_PARAM_LEN);
> +		token_nb = rte_strsplit(args, strlen(args),
> +					tokens, MAX_EAL_PARAM_NB, ' ');
> +		for (i = 0; i < token_nb; i++)
> +			strlcpy(new_argv[new_argc++], tokens[i], MAX_EAL_PARAM_LEN);
> +	}
> +
> +	return new_argc;
> +}

> +

> +int
> +main(int argc, char *argv[])
> +{
> +	int ret;
> +	uint16_t case_nb;
> +	uint32_t i, nb_lcores;
> +	pid_t cpid, wpid;
> +	int wstatus;
> +	char args[MAX_EAL_PARAM_NB][MAX_EAL_PARAM_LEN];
> +	char *pargs[MAX_EAL_PARAM_NB];
> +	char *cfg_path_ptr = NULL;
> +	char *rst_path_ptr = NULL;
> +	char rst_path[PATH_MAX];
> +	int new_argc;
> +
> +	memset(args, 0, sizeof(args));
> +
> +	for (i = 0; i < RTE_DIM(pargs); i++)
> +		pargs[i] = args[i];
> +
> +	for (i = 0; i < (uint32_t)argc; i++) {
> +		if (strncmp(argv[i], CMDLINE_CONFIG_ARG, MAX_LONG_OPT_SZ) == 0)
> +			cfg_path_ptr = argv[i + 1];
> +		if (strncmp(argv[i], CMDLINE_RESULT_ARG, MAX_LONG_OPT_SZ) == 0)
> +			rst_path_ptr = argv[i + 1];
> +	}
> +	if (cfg_path_ptr == NULL) {
> +		printf("Config file not assigned.\n");
> +		return -1;
> +	}
> +	if (rst_path_ptr == NULL) {
> +		strlcpy(rst_path, cfg_path_ptr, PATH_MAX);
> +		char *token = strtok(basename(rst_path), ".");
> +		if (token == NULL) {
> +			printf("Config file error.\n");
> +			return -1;
> +		}
> +		strcat(token, "_result.csv");
> +		rst_path_ptr = rst_path;
> +	}
> +
> +	case_nb = load_configs(cfg_path_ptr);
> +	fd = fopen(rst_path_ptr, "w");
> +	if (fd == NULL) {
> +		printf("Open output CSV file error.\n");
> +		return -1;
> +	}
> +	fclose(fd);

> +

> +	printf("Running cases...\n");
> +	for (i = 0; i < case_nb; i++) {
> +		if (!test_cases[i].is_valid) {
> +			printf("Invalid test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}
> +
> +		if (test_cases[i].test_type == TEST_TYPE_NONE) {
> +			printf("No valid test type in test case %d.\n\n", i + 1);
> +			snprintf(output_str[0], MAX_OUTPUT_STR_LEN, "Invalid case %d\n", i + 1);
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +			output_csv(true);
> +			fclose(fd);
> +			continue;
> +		}

> +

> +		cpid = fork();
> +		if (cpid < 0) {
> +			printf("Fork case %d failed.\n", i + 1);
> +			exit(EXIT_FAILURE);
> +		} else if (cpid == 0) {
> +			printf("\nRunning case %u\n\n", i + 1);
> +
> +			new_argc = append_eal_args(argc, argv, test_cases[i].eal_args, pargs);
> +			ret = rte_eal_init(new_argc, pargs);
> +			if (ret < 0)
> +				rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
> +
> +			/* Check lcores. */
> +			nb_lcores = rte_lcore_count();
> +			if (nb_lcores < 2)
> +				rte_exit(EXIT_FAILURE,
> +					"There should be at least 2 worker lcores.\n");
> +
> +			fd = fopen(rst_path_ptr, "a");
> +			if (!fd) {
> +				printf("Open output CSV file error.\n");
> +				return 0;
> +			}
> +
> +			output_env_info();
> +
> +			run_test(i + 1, &test_cases[i]);
> +
> +			/* clean up the EAL */
> +			rte_eal_cleanup();
> +
> +			fclose(fd);
> +
> +			printf("\nCase %u completed.\n\n", i + 1);
> +
> +			exit(EXIT_SUCCESS);
> +		} else {
> +			wpid = waitpid(cpid, &wstatus, 0);
> +			if (wpid == -1) {
> +				printf("waitpid error.\n");
> +				exit(EXIT_FAILURE);
> +			}
> +
> +			if (WIFEXITED(wstatus))
> +				printf("Case process exited. status %d\n\n",
> +					WEXITSTATUS(wstatus));
> +			else if (WIFSIGNALED(wstatus))
> +				printf("Case process killed by signal %d\n\n",
> +					WTERMSIG(wstatus));
> +			else if (WIFSTOPPED(wstatus))
> +				printf("Case process stopped by signal %d\n\n",
> +					WSTOPSIG(wstatus));
> +			else if (WIFCONTINUED(wstatus))
> +				printf("Case process continued.\n\n");
> +			else
> +				printf("Case process unknown terminated.\n\n");
> +		}
> +	}
> +
> +	printf("Bye...\n");
> +	return 0;
> +}

> +

> diff --git a/app/test-dma-perf/main.h b/app/test-dma-perf/main.h
> new file mode 100644
> index 0000000000..12bc3f4e3f
> --- /dev/null
> +++ b/app/test-dma-perf/main.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#ifndef _MAIN_H_
> +#define _MAIN_H_
> +
> +
> +#include <rte_common.h>
> +#include <rte_cycles.h>
> +#include <rte_dev.h>
> +
> +#define MAX_WORKER_NB 128
> +#define MAX_OUTPUT_STR_LEN 512
> +
> +#define MAX_DMA_NB 128
> +#define MAX_LCORE_NB 256
> +
> +extern char output_str[MAX_WORKER_NB + 1][MAX_OUTPUT_STR_LEN];
> +
> +typedef enum {
> +	OP_NONE = 0,
> +	OP_ADD,
> +	OP_MUL
> +} alg_op_type;
> +
> +struct test_configure_entry {
> +	uint32_t first;
> +	uint32_t last;
> +	uint32_t incr;
> +	alg_op_type op;
> +	uint32_t cur;
> +};
> +
> +struct lcore_dma_map_t {
> +	uint32_t lcores[MAX_WORKER_NB];
> +	char dma_names[MAX_WORKER_NB][RTE_DEV_NAME_MAX_LEN];
> +	int16_t dma_ids[MAX_WORKER_NB];
> +	uint16_t cnt;
> +};
> +
> +struct test_configure {
> +	bool is_valid;
> +	uint8_t test_type;
> +	const char *test_type_str;
> +	uint16_t src_numa_node;
> +	uint16_t dst_numa_node;
> +	uint16_t opcode;
> +	bool is_dma;
> +	struct lcore_dma_map_t lcore_dma_map;
> +	struct test_configure_entry mem_size;
> +	struct test_configure_entry buf_size;
> +	struct test_configure_entry ring_size;
> +	struct test_configure_entry kick_batch;
> +	uint8_t cache_flush;
> +	uint32_t nr_buf;
> +	uint16_t test_secs;
> +	const char *eal_args;
> +	uint8_t scenario_id;
> +};
> +
> +void mem_copy_benchmark(struct test_configure *cfg, bool is_dma);
> +
> +#endif /* _MAIN_H_ */

> diff --git a/app/test-dma-perf/meson.build b/app/test-dma-perf/meson.build
> new file mode 100644
> index 0000000000..bd6c264002
> --- /dev/null
> +++ b/app/test-dma-perf/meson.build
> @@ -0,0 +1,17 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019-2023 Intel Corporation
> +
> +# meson file, for building this app as part of a main DPDK build.
> +
> +if is_windows
> +    build = false
> +    reason = 'not supported on Windows'
> +    subdir_done()
> +endif
> +
> +deps += ['dmadev', 'mbuf', 'cfgfile']
> +
> +sources = files(
> +        'main.c',
> +        'benchmark.c',
> +)

> diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
> index 4459144140..796cc5517d 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -200,6 +200,12 @@ New Features
>
>    Enhanced the GRO library to support TCP packets over IPv6 network.
>
> +* **Added DMA device performance test application.**
> +
> +  Added a new application to test the performance of DMA device and CPU.
> +
> +  See the :doc:`../tools/dmaperf` for more details.
> +
>
>  Removed Items
>  -------------

> diff --git a/doc/guides/tools/dmaperf.rst b/doc/guides/tools/dmaperf.rst
> new file mode 100644
> index 0000000000..c5f8a9406f
> --- /dev/null
> +++ b/doc/guides/tools/dmaperf.rst
> @@ -0,0 +1,103 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2023 Intel Corporation.
> +
> +dpdk-test-dma-perf Application
> +==============================
> +
> +The ``dpdk-test-dma-perf`` tool is a Data Plane Development Kit (DPDK)
> +application that enables testing the performance of DMA (Direct Memory Access)
> +devices available within DPDK. It provides a test framework to assess the
> +performance of CPU and DMA devices under various scenarios, such as varying
> +buffer lengths. Doing so provides insight into the potential performance when
> +using these DMA devices for acceleration in DPDK applications. It supports
> +memory copy performance tests for now, comparing the performance of CPU and DMA
> +automatically in various conditions with the help of a pre-set configuration
> +file.
> +

> +

> +Configuration

> +-------------

> +This application uses inherent DPDK EAL com= mand-line options as well

> +as custom command-line options in the appli= cation. An example

> +configuration file for the application is p= rovided and gives the

> +meanings for

> each parameter.

> +

> +Here is an extracted sample from the config= uration file (the complete

> +sample can be found in the application sour= ce directory):

> +

> +.. code-block:: ini
> +
> +   [case1]
> +   type=DMA_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   dma_ring_size=1024
> +   kick_batch=32
> +   src_numa_node=0
> +   dst_numa_node=0
> +   cache_flush=0
> +   test_seconds=2
> +   lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
> +   eal_args=--in-memory --file-prefix=test
> +
> +   [case2]
> +   type=CPU_MEM_COPY
> +   mem_size=10
> +   buf_size=64,8192,2,MUL
> +   src_numa_node=0
> +   dst_numa_node=1
> +   cache_flush=0
> +   test_seconds=2
> +   lcore = 3, 4
> +   eal_args=--in-memory --no-pci

> +
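The sample above is plain INI with one bracketed section per test case, so it can be read with any INI parser when preparing or checking configuration files outside the tool. The following is only an illustrative Python sketch, not how the application itself loads the file; `load_cases` and the shortened `SAMPLE` text are hypothetical names introduced here.

```python
import configparser

# Shortened copy of the sample configuration, for illustration only.
SAMPLE = """\
[case1]
type=DMA_MEM_COPY
mem_size=10
buf_size=64,8192,2,MUL
dma_ring_size=1024
kick_batch=32
lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3
eal_args=--in-memory --file-prefix=test

[case2]
type=CPU_MEM_COPY
mem_size=10
buf_size=64,8192,2,MUL
lcore = 3, 4
eal_args=--in-memory --no-pci
"""


def load_cases(text):
    """Return {section name: {key: value}}, one entry per test case."""
    # Only '=' is treated as a delimiter so that ':' inside PCI addresses
    # stays in the value; interpolation is disabled to keep values raw.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


cases = load_cases(SAMPLE)
for name, params in cases.items():
    print(name, params["type"], params["buf_size"])
```

Values come back as raw strings, so list-like entries such as `lcore` or `lcore_dma` still need to be split by the caller.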

> +The configuration file is divided into multiple sections, and each
> +section represents a test case.
> +The four variables mem_size, buf_size, dma_ring_size, and kick_batch can
> +vary in each test case.
> +The format for this is ``variable=first,last,increment,ADD\|MUL``.
> +This means that the first value of the variable is 'first', the last
> +value is 'last', 'increment' is the step size, and ADD|MUL indicates
> +whether the change is by addition or multiplication. Each case can only
> +have one variable change, and each change will generate a scenario, so
> +each case can have multiple scenarios.

> +
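To make the `first,last,increment,ADD|MUL` format concrete, here is a small Python sketch of how one such entry could expand into per-scenario values. It assumes that 'last' is inclusive and that a plain single number means a fixed value; neither detail is stated in the text above, and `expand_sweep` is a hypothetical helper, not part of the application.

```python
def expand_sweep(spec):
    """Expand 'first,last,increment,ADD|MUL' into the list of scenario values.

    Assumptions (not confirmed by the documentation): 'last' is inclusive,
    and a single number such as '1024' is a fixed, non-varying value.
    """
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) == 1:
        return [int(parts[0])]
    if len(parts) != 4:
        raise ValueError(f"expected 1 or 4 comma-separated fields: {spec!r}")

    first, last, step = (int(p) for p in parts[:3])
    op = parts[3].upper()
    if op not in ("ADD", "MUL"):
        raise ValueError(f"unknown operation {parts[3]!r}")
    # Guard against sweeps that would never reach 'last'.
    if step <= 0 or (op == "MUL" and (step < 2 or first <= 0)):
        raise ValueError(f"sweep would not terminate: {spec!r}")

    values = []
    value = first
    while value <= last:
        values.append(value)
        value = value + step if op == "ADD" else value * step
    return values


print(expand_sweep("64,8192,2,MUL"))
print(expand_sweep("32,128,32,ADD"))
```

Under these assumptions, `buf_size=64,8192,2,MUL` from the sample yields eight scenarios, doubling from 64 up to 8192.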

> +Parameter Definitions
> +---------------------
> +
> +- **type**: The type of the test. Currently supported types are
> +  `DMA_MEM_COPY` and `CPU_MEM_COPY`.
> +- **mem_size**: The size of the memory footprint.
> +- **buf_size**: The memory size of a single operation.
> +- **dma_ring_size**: The DMA ring buffer size. Must be a power of two,
> +  and between 64 and 4096.
> +- **kick_batch**: The DMA operation batch size. It should normally be
> +  greater than 1.
> +- **src_numa_node**: Controls the NUMA node where the source memory is
> +  allocated.
> +- **dst_numa_node**: Controls the NUMA node where the destination memory
> +  is allocated.
> +- **cache_flush**: Determines whether the cache should be flushed. `1`
> +  indicates to flush and `0` to not flush.
> +- **test_seconds**: Controls the test time for each scenario.
> +- **lcore_dma**: Specifies the lcore/DMA mapping.
> +- **lcore**: Specifies the lcore for CPU testing.
> +- **eal_args**: Specifies the EAL arguments.

> +
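The `dma_ring_size` constraint in the list above (a power of two between 64 and 4096) is easy to check before running a case. A minimal Python sketch, assuming both bounds are inclusive; `is_valid_ring_size` is a hypothetical helper name.

```python
def is_valid_ring_size(size):
    """True if size is a power of two in [64, 4096] (bounds assumed inclusive)."""
    # A positive power of two has exactly one bit set, so size & (size - 1) is 0.
    return 64 <= size <= 4096 and (size & (size - 1)) == 0


for candidate in (32, 64, 1000, 1024, 4096, 8192):
    print(candidate, is_valid_ring_size(candidate))
```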

> +.. Note::
> +
> +   The mapping of lcore to DMA must be one-to-one and cannot be duplicated.

> +
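The one-to-one rule in the note can be checked mechanically on an `lcore_dma` value. The Python sketch below assumes the `lcoreN@<device>` comma-separated syntax shown in the sample configuration; `parse_lcore_dma` is a hypothetical helper, not the parser used by the application.

```python
def parse_lcore_dma(value):
    """Parse 'lcore10@0000:00:04.2, lcore11@0000:00:04.3' into {lcore: device}.

    Raises ValueError if an entry is malformed or if any lcore or DMA
    device appears more than once (the mapping must be one-to-one).
    """
    mapping = {}
    seen_devices = set()
    for item in value.split(","):
        lcore_part, sep, device = item.strip().partition("@")
        if not sep or not device or not lcore_part.startswith("lcore"):
            raise ValueError(f"malformed entry: {item.strip()!r}")
        lcore = int(lcore_part[len("lcore"):])
        if lcore in mapping:
            raise ValueError(f"lcore {lcore} is mapped more than once")
        if device in seen_devices:
            raise ValueError(f"DMA device {device} is mapped more than once")
        mapping[lcore] = device
        seen_devices.add(device)
    return mapping


print(parse_lcore_dma("lcore10@0000:00:04.2, lcore11@0000:00:04.3"))
```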

> +To specify a configuration file, use the "\-\-config" flag followed by
> +the path to the file.
> +
> +To specify a result file, use the "\-\-result" flag followed by the path
> +to the file. If you do not specify a result file, one will be generated
> +with the same name as the configuration file, with the addition of
> +"_result.csv" at the end.

> +

> +

> +Running the Application

> +-----------------------

> +

> +Typical command-line invocation to execute the application:
> +
> +.. code-block:: console
> +
> +   dpdk-test-dma-perf --config=./config_dma.ini --result=./res_dma.csv

> +

> +Where `config_dma.ini` is the configuration file, and `res_dma.csv` will
> +be the generated result file.
> +
> +After the tests, you can find the results in the `res_dma.csv` file.

> +

> +Limitations

> +-----------

> +

> +Currently, this tool only supports memory copy performance tests.
> +Additional enhancements are possible in the future to support more types
> +of tests for DMA devices and CPUs.

> diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
> index 6f84fc31ff..857572da96 100644
> --- a/doc/guides/tools/index.rst
> +++ b/doc/guides/tools/index.rst
> @@ -23,3 +23,4 @@ DPDK Tools User Guides
>      testregex
>      testmldev
>      dts
> +    dmaperf

> --

> 2.40.1

>

>

>

> End of dev Digest, Vol 462, Issue 27

> ************************************

--_000_DM6PR11MB3516915C3BB0B8E4B288B4DD8E24ADM6PR11MB3516namp_--