From: "Jiang, Cheng1"
To: "Richardson, Bruce"
Cc: "thomas@monjalon.net", "mb@smartsharesystems.com", "dev@dpdk.org", "Hu, Jiayu", "Ding, Xuan", "Ma, WenwuX", "Wang, YuanX", "He, Xingguang"
Subject: RE: [PATCH v3] app/dma-perf: introduce dma-perf application
Date: Mon, 6 Feb 2023 14:20:13 +0000
References: <20221220010619.31829-1-cheng1.jiang@intel.com> <20230117120526.39375-1-cheng1.jiang@intel.com>
Hi Bruce,

Replies are inline.

Thanks,
Cheng

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, January 30, 2023 5:20 PM
> To: Jiang, Cheng1
> Cc: thomas@monjalon.net; mb@smartsharesystems.com; dev@dpdk.org;
> Hu, Jiayu; Ding, Xuan; Ma, WenwuX; Wang, YuanX; He, Xingguang
> Subject: Re: [PATCH v3] app/dma-perf: introduce dma-perf application
>
> On Sat, Jan 28, 2023 at 01:32:05PM +0000, Jiang, Cheng1 wrote:
> > Hi Bruce,
> >
> > Sorry for the late reply. We were on the Spring Festival holiday last week.
> > Thanks for your comments.
> > Replies are inline.
> >
> > Thanks,
> > Cheng
> >
> > > -----Original Message-----
> > > From: Richardson, Bruce
> > > Sent: Wednesday, January 18, 2023 12:52 AM
> > > To: Jiang, Cheng1
> > > Cc: thomas@monjalon.net; mb@smartsharesystems.com; dev@dpdk.org;
> > > Hu, Jiayu; Ding, Xuan; Ma, WenwuX; Wang, YuanX; He, Xingguang
> > > Subject: Re: [PATCH v3] app/dma-perf: introduce dma-perf application
> > >
> > > On Tue, Jan 17, 2023 at 12:05:26PM +0000, Cheng Jiang wrote:
> > > > There are many high-performance DMA devices supported in DPDK now,
> > > > and these DMA devices can also be integrated into other modules of
> > > > DPDK as accelerators, such as Vhost. Before integrating DMA into
> > > > applications, developers need to know the performance of these DMA
> > > > devices in various scenarios and the performance of CPUs in the
> > > > same scenarios, such as with different buffer lengths. Only in this
> > > > way can we know the target performance of an application that is
> > > > accelerated by using them. This patch introduces a high-performance
> > > > testing tool, which supports comparing the performance of CPU and
> > > > DMA in different scenarios automatically with a pre-set config file.
> > > > Memory copy performance tests are supported for now.
> > > >
> > > > Signed-off-by: Cheng Jiang
> > > > Signed-off-by: Jiayu Hu
> > > > Signed-off-by: Yuan Wang
> > > > Acked-by: Morten Brørup
> > > > ---
> > >
> > > More input based on trying out the application, including some
> > > thoughts on the testing methodology below.
> > >
> > > > +static void
> > > > +output_result(uint8_t scenario_id, uint32_t lcore_id, uint16_t dev_id, uint64_t ave_cycle,
> > > > +		uint32_t buf_size, uint32_t nr_buf, uint32_t memory,
> > > > +		float bandwidth, uint64_t ops, bool is_dma)
> > > > +{
> > > > +	if (is_dma)
> > > > +		printf("lcore %u, DMA %u:\n"
> > > > +			"average cycles: %" PRIu64 ","
> > > > +			" buffer size: %u, nr_buf: %u,"
> > > > +			" memory: %uMB, frequency: %" PRIu64 ".\n",
> > > > +			lcore_id,
> > > > +			dev_id,
> > > > +			ave_cycle,
> > > > +			buf_size,
> > > > +			nr_buf,
> > > > +			memory,
> > > > +			rte_get_timer_hz());
> > > > +	else
> > > > +		printf("lcore %u\n"
> > > > +			"average cycles: %" PRIu64 ","
> > > > +			" buffer size: %u, nr_buf: %u,"
> > > > +			" memory: %uMB, frequency: %" PRIu64 ".\n",
> > > > +			lcore_id,
> > > > +			ave_cycle,
> > > > +			buf_size,
> > > > +			nr_buf,
> > > > +			memory,
> > > > +			rte_get_timer_hz());
> > > > +
> > >
> > > The term "average cycles" is unclear here - is it average cycles per
> > > test iteration, or average cycles per buffer copy?
> >
> > The average cycles value stands for the average cycles per buffer copy;
> > I'll clarify it in the next version.
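To make the metric concrete, the intended meaning is roughly the following
(an illustrative sketch only; the function and variable names below are
placeholders, not the exact code in the patch):

	#include <stdint.h>

	/*
	 * Sketch: "average cycles" means cycles per buffer copy, i.e. the
	 * TSC delta measured around the whole copy loop divided by the
	 * number of buffers copied, not cycles per test iteration.
	 * start_tsc/end_tsc/nr_buf are placeholder names.
	 */
	static uint64_t
	ave_cycles_per_copy(uint64_t start_tsc, uint64_t end_tsc, uint32_t nr_buf)
	{
		if (nr_buf == 0)
			return 0;
		return (end_tsc - start_tsc) / nr_buf;
	}

Here start_tsc would be taken with rte_rdtsc() just before the copy loop and
end_tsc just after the loop finishes.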
> >
> > > > +	printf("Average bandwidth: %.3lfGbps, OPS: %" PRIu64 "\n",
> > > > +			bandwidth, ops);
> > > > +
> > >
> > > > +
> > > > +static inline void
> > > > +do_dma_mem_copy(uint16_t dev_id, uint32_t nr_buf, uint16_t kick_batch,
> > > > +			uint32_t buf_size, uint16_t mpool_iter_step,
> > > > +			struct rte_mbuf **srcs, struct rte_mbuf **dsts)
> > > > +{
> > > > +	int64_t async_cnt = 0;
> > > > +	int nr_cpl = 0;
> > > > +	uint32_t index;
> > > > +	uint16_t offset;
> > > > +	uint32_t i;
> > > > +
> > > > +	for (offset = 0; offset < mpool_iter_step; offset++) {
> > > > +		for (i = 0; index = i * mpool_iter_step + offset, index < nr_buf; i++) {
> > > > +			if (unlikely(rte_dma_copy(dev_id,
> > > > +					0,
> > > > +					srcs[index]->buf_iova + srcs[index]->data_off,
> > > > +					dsts[index]->buf_iova + dsts[index]->data_off,
> > > > +					buf_size,
> > > > +					0) < 0)) {
> > > > +				rte_dma_submit(dev_id, 0);
> > > > +				while (rte_dma_burst_capacity(dev_id, 0) == 0) {
> > > > +					nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB,
> > > > +								NULL, NULL);
> > > > +					async_cnt -= nr_cpl;
> > > > +				}
> > > > +				if (rte_dma_copy(dev_id,
> > > > +						0,
> > > > +						srcs[index]->buf_iova + srcs[index]->data_off,
> > > > +						dsts[index]->buf_iova + dsts[index]->data_off,
> > > > +						buf_size,
> > > > +						0) < 0) {
> > > > +					printf("enqueue fail again at %u\n", index);
> > > > +					printf("space:%d\n", rte_dma_burst_capacity(dev_id, 0));
> > > > +					rte_exit(EXIT_FAILURE, "DMA enqueue failed\n");
> > > > +				}
> > > > +			}
> > > > +			async_cnt++;
> > > > +
> > > > +			/**
> > > > +			 * When '&' is used to wrap an index, mask must be a power of 2.
> > > > +			 * That is, kick_batch must be 2^n.
> > > > +			 */
> > > > +			if (unlikely((async_cnt % kick_batch) == 0)) {
> > > > +				rte_dma_submit(dev_id, 0);
> > > > +				/* add a poll to avoid ring full */
> > > > +				nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> > > > +				async_cnt -= nr_cpl;
> > > > +			}
> > > > +		}
> > > > +
> > > > +		rte_dma_submit(dev_id, 0);
> > > > +		while (async_cnt > 0) {
> > > > +			nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> > > > +			async_cnt -= nr_cpl;
> > > > +		}
> > >
> > > I have a couple of concerns about the methodology for testing the HW
> > > DMA performance. For example, the inclusion of that final block
> > > means that we are including the latency of the copy operation in the result.
> > >
> > > If the objective of the test application is to determine if it is
> > > cheaper for software to offload a copy operation to HW or do it in
> > > SW, then the primary concern is the HW offload cost. That offload
> > > cost should remain constant irrespective of the size of the copy -
> > > since all you are doing is writing a descriptor and reading a
> > > completion result. However, seeing the results of running the app, I
> > > notice that the reported average cycles increases as the packet size
> > > increases, which would tend to indicate that we are not giving a
> > > realistic measurement of offload cost.
> >
> > We are trying to compare the time required to complete a certain
> > amount of work using DMA with the time required to complete it using
> > the CPU. I think in addition to the offload cost, the capability of the
> > DMA itself is also an important factor to be considered.
> > The offload cost should be constant, but when DMA copies memory of
> > different lengths, the time costs are different. So the reported average
> > cycles increases as the packet size increases.
> > Therefore, this test result includes both offload cost and DMA
> > operation cost. To some extent, it should be a relatively realistic
> > measurement result.
> >
> > Does that make sense to you?
> >
>
> Hi,
>
> Yes, I get your point about the job latency being different when the
> packet/copy sizes increase, but on the other hand, as I state above, the
> actual cycle cost to the application should not increase. If any
> application is doing what this test app is doing, just sitting around
> waiting for job completion (in the fast path), then it is likely that the
> programmer should look at improving the offload into the app.
>
> The main issue here is that by outputting a single number, you are mixing
> two separate values - both offload cost and job latency. If you want to
> show the effects of larger/smaller packets on both, then you should output
> both values separately. For most applications where you will offload
> copies and do other work while the copy is being done, the offload cost is
> of primary concern. For some applications the latency figure may also be
> important, but in those cases the user will want to see the latency called
> out explicitly, not just mixed up in a single figure with offload cost.

Sure, makes sense to me, thanks. (An illustrative sketch of how the two
measurements could be split is included at the end of this mail.)

>
> > >
> > > The trouble then becomes how to do so in a more realistic manner.
> > > The most accurate way I can think of in a unit test like this is to
> > > offload entries to the device and measure the cycles taken there.
> > > Then wait until such time as all copies are completed (to eliminate
> > > the latency time, which in a real-world case would be spent by a core
> > > doing something else), and then do a second measurement of the time
> > > taken to process all the completions. In the same way as for a SW
> > > copy, any time not spent in memcpy is not copy time, for HW copies
> > > any time spent not writing descriptors or reading completions is not
> > > part of the offload cost.
> >
> > Agreed, we are thinking about adding offload cost as one of the test
> > results in the future.
> >
> > >
> > > That said, doing the above is still not fully realistic, as a
> > > real-world app will likely still have some amount of other overhead,
> > > for example, polling occasionally for completions in between doing
> > > other work (though one would expect this to be relatively cheap).
> > > Similarly, if the submission queue fills, the app may have to delay
> > > waiting for space to submit jobs, and therefore see some of the HW
> > > copy latency.
> > >
> > > Therefore, I think the most realistic way to measure this is to look
> > > at the rate of operations while processing is being done in the
> > > middle of the test. For example, if we have a simple packet
> > > processing application, running the application just doing RX and TX
> > > and measuring the rate allows us to determine the basic packet I/O
> > > cost. Adding in an offload to HW for each packet and again measuring
> > > the rate, will then allow us to compute the true offload copy cost
> > > of the operation, and should give us a number that remains flat even
> > > as packet size increases.
> > > For previous work done on vhost with DMA acceleration, I believe we
> > > saw exactly that - while SW PPS reduced as packet size increased, with
> > > HW copies the PPS remained constant even as packet size increased.
> > >
> > > The challenge, to my mind, is therefore how to implement this in a
> > > suitable unit-test style way, to fit into the framework you have
> > > given here. I would suggest that the actual performance measurement
> > > needs to be done - not on a total time - but on a fixed time basis
> > > within each test. For example, when doing HW copies, 1ms into each
> > > test run, we need to snapshot the completed entries, and then say
> > > 1ms later measure the number that have been completed since. In this
> > > way, we avoid the initial startup latency while we wait for jobs to
> > > start completing, and we avoid the final latency as we await the last
> > > job to complete. We would also include time for some potentially
> > > empty polls, and if a queue size is too small, see that reflected in
> > > the performance too.
> >
> > I understand your concerns, but I think maybe we are not discussing the
> > same performance number here.
> > We are trying to test the maximum bandwidth of DMA, and what you said
> > is how to measure the offload cost more accurately, if I understand it
> > correctly.
> > I think both of these performance numbers are important. Maybe we can
> > add your test methodology as one of the performance aspects for DMA in
> > the future. I need to reconsider it and get back to you later.
> >
>
> Max bandwidth of HW is a third and separate number from that of offload
> cost and latency. Again, it should be measured and reported separately if
> you want the app to provide it.

OK, got it. We will try to implement such a test method in the next version.

Thanks,
Cheng

>
> Regards,
>
> /Bruce
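Sketch referenced above (illustrative only, not the final patch code): one
possible way to report offload cost separately from job latency, using the
public dmadev API. The dev_id/vchan/src/dst/nr_buf parameters and the
MEASURE_CPL_BURST constant are placeholder names chosen for this sketch.

	#include <stdint.h>
	#include <rte_cycles.h>
	#include <rte_dmadev.h>

	#define MEASURE_CPL_BURST 64

	/*
	 * Time only the descriptor-writing phase and the completion-reading
	 * phase; the wait for the hardware to finish the copies is left
	 * untimed, so job latency does not leak into the offload-cost
	 * numbers (assuming the untimed wait is long enough for the device
	 * to drain all outstanding jobs).
	 */
	static void
	measure_offload_cost(int16_t dev_id, uint16_t vchan,
			const rte_iova_t *src, const rte_iova_t *dst,
			uint32_t nr_buf, uint32_t buf_size,
			uint64_t *enq_cycles, uint64_t *cpl_cycles)
	{
		uint32_t enq = 0, cpl = 0;
		uint64_t start;

		/* Phase 1: enqueue (offload) cost only. */
		start = rte_rdtsc();
		while (enq < nr_buf) {
			if (rte_dma_copy(dev_id, vchan, src[enq], dst[enq],
					buf_size, 0) < 0)
				break;	/* ring full: stop rather than wait */
			enq++;
		}
		rte_dma_submit(dev_id, vchan);
		*enq_cycles = rte_rdtsc() - start;

		/* Untimed wait so that HW copy latency is excluded. */
		rte_delay_us_sleep(1000);

		/* Phase 2: completion-processing cost only. */
		start = rte_rdtsc();
		while (cpl < enq)
			cpl += rte_dma_completed(dev_id, vchan,
					MEASURE_CPL_BURST, NULL, NULL);
		*cpl_cycles = rte_rdtsc() - start;
	}

The per-copy offload cost would then be (*enq_cycles + *cpl_cycles) / enq,
which should stay roughly flat as buf_size grows, while job latency and
maximum bandwidth would be measured and reported as separate numbers.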