From: "Jiang, Cheng1"
To: "Richardson, Bruce"
Cc: "thomas@monjalon.net", "mb@smartsharesystems.com", "dev@dpdk.org", "Hu, Jiayu", "Ding, Xuan", "Ma, WenwuX", "Wang, YuanX", "He, Xingguang"
Subject: RE: [PATCH v3] app/dma-perf: introduce dma-perf application
Date: Sat, 28 Jan 2023 13:32:05 +0000
References: <20221220010619.31829-1-cheng1.jiang@intel.com> <20230117120526.39375-1-cheng1.jiang@intel.com>
Hi Bruce,

Sorry for the late reply; we were on the Spring Festival holiday last week.
Thanks for your comments. Replies are inline.

Thanks,
Cheng

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, January 18, 2023 12:52 AM
> To: Jiang, Cheng1
> Cc: thomas@monjalon.net; mb@smartsharesystems.com; dev@dpdk.org;
> Hu, Jiayu; Ding, Xuan; Ma, WenwuX; Wang, YuanX; He, Xingguang
> Subject: Re: [PATCH v3] app/dma-perf: introduce dma-perf application
>
> On Tue, Jan 17, 2023 at 12:05:26PM +0000, Cheng Jiang wrote:
> > There are many high-performance DMA devices supported in DPDK now, and
> > these DMA devices can also be integrated into other modules of DPDK as
> > accelerators, such as Vhost. Before integrating DMA into applications,
> > developers need to know the performance of these DMA devices in
> > various scenarios and the performance of CPUs in the same scenario,
> > such as different buffer lengths. Only in this way can we know the
> > target performance of the application accelerated by using them. This
> > patch introduces a high-performance testing tool, which supports
> > comparing the performance of CPU and DMA in different scenarios
> > automatically with a pre-set config file. Memory copy performance tests
> > are supported for now.
> >
> > Signed-off-by: Cheng Jiang
> > Signed-off-by: Jiayu Hu
> > Signed-off-by: Yuan Wang
> > Acked-by: Morten Brørup
> > ---
>
> More input based on trying to run the application, including some thoughts
> on the testing methodology below.
>
> > +static void
> > +output_result(uint8_t scenario_id, uint32_t lcore_id, uint16_t dev_id, uint64_t ave_cycle,
> > +			uint32_t buf_size, uint32_t nr_buf, uint32_t memory,
> > +			float bandwidth, uint64_t ops, bool is_dma)
> > +{
> > +	if (is_dma)
> > +		printf("lcore %u, DMA %u:\n"
> > +			"average cycles: %" PRIu64 ","
> > +			" buffer size: %u, nr_buf: %u,"
> > +			" memory: %uMB, frequency: %" PRIu64 ".\n",
> > +			lcore_id,
> > +			dev_id,
> > +			ave_cycle,
> > +			buf_size,
> > +			nr_buf,
> > +			memory,
> > +			rte_get_timer_hz());
> > +	else
> > +		printf("lcore %u\n"
> > +			"average cycles: %" PRIu64 ","
> > +			" buffer size: %u, nr_buf: %u,"
> > +			" memory: %uMB, frequency: %" PRIu64 ".\n",
> > +			lcore_id,
> > +			ave_cycle,
> > +			buf_size,
> > +			nr_buf,
> > +			memory,
> > +			rte_get_timer_hz());
> > +
>
> The term "average cycles" is unclear here - is it average cycles per test
> iteration, or average cycles per buffer copy?

The "average cycles" stands for the average cycles per buffer copy; I'll clarify it in the next version.
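To illustrate what I mean, the per-copy figure is derived roughly as below. This is a simplified sketch rather than the exact code in the patch, and the helper name is only for illustration:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include <rte_cycles.h>

/* Sketch only: how "average cycles per buffer copy" and the bandwidth
 * figure relate to one timed run over nr_buf copies of buf_size bytes. */
static void
report_copy_perf(uint64_t total_cycles, uint32_t nr_buf, uint32_t buf_size)
{
	uint64_t ave_cycle = total_cycles / nr_buf;
	double secs = (double)total_cycles / rte_get_timer_hz();
	double gbps = (double)nr_buf * buf_size * 8 / secs / 1000000000.0;

	printf("average cycles/copy: %" PRIu64 ", bandwidth: %.3lf Gbps\n",
			ave_cycle, gbps);
}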
>
> > +	printf("Average bandwidth: %.3lfGbps, OPS: %" PRIu64 "\n", bandwidth, ops);
> > +
>
> > +
> > +static inline void
> > +do_dma_mem_copy(uint16_t dev_id, uint32_t nr_buf, uint16_t kick_batch, uint32_t buf_size,
> > +			uint16_t mpool_iter_step, struct rte_mbuf **srcs, struct rte_mbuf **dsts)
> > +{
> > +	int64_t async_cnt = 0;
> > +	int nr_cpl = 0;
> > +	uint32_t index;
> > +	uint16_t offset;
> > +	uint32_t i;
> > +
> > +	for (offset = 0; offset < mpool_iter_step; offset++) {
> > +		for (i = 0; index = i * mpool_iter_step + offset, index < nr_buf; i++) {
> > +			if (unlikely(rte_dma_copy(dev_id,
> > +						0,
> > +						srcs[index]->buf_iova + srcs[index]->data_off,
> > +						dsts[index]->buf_iova + dsts[index]->data_off,
> > +						buf_size,
> > +						0) < 0)) {
> > +				rte_dma_submit(dev_id, 0);
> > +				while (rte_dma_burst_capacity(dev_id, 0) == 0) {
> > +					nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB,
> > +								NULL, NULL);
> > +					async_cnt -= nr_cpl;
> > +				}
> > +				if (rte_dma_copy(dev_id,
> > +						0,
> > +						srcs[index]->buf_iova + srcs[index]->data_off,
> > +						dsts[index]->buf_iova + dsts[index]->data_off,
> > +						buf_size,
> > +						0) < 0) {
> > +					printf("enqueue fail again at %u\n", index);
> > +					printf("space:%d\n", rte_dma_burst_capacity(dev_id, 0));
> > +					rte_exit(EXIT_FAILURE, "DMA enqueue failed\n");
> > +				}
> > +			}
> > +			async_cnt++;
> > +
> > +			/**
> > +			 * When '&' is used to wrap an index, mask must be a power of 2.
> > +			 * That is, kick_batch must be 2^n.
> > +			 */
> > +			if (unlikely((async_cnt % kick_batch) == 0)) {
> > +				rte_dma_submit(dev_id, 0);
> > +				/* add a poll to avoid ring full */
> > +				nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> > +				async_cnt -= nr_cpl;
> > +			}
> > +		}
> > +
> > +		rte_dma_submit(dev_id, 0);
> > +		while (async_cnt > 0) {
> > +			nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> > +			async_cnt -= nr_cpl;
> > +		}
>
> I have a couple of concerns about the methodology for testing the HW DMA
> performance. For example, the inclusion of that final block means that we are
> including the latency of the copy operation in the result.
>
> If the objective of the test application is to determine if it is cheaper for
> software to offload a copy operation to HW or do it in SW, then the primary
> concern is the HW offload cost. That offload cost should remain constant
> irrespective of the size of the copy - since all you are doing is writing a
> descriptor and reading a completion result. However, seeing the results of
> running the app, I notice that the reported average cycles increases as the
> packet size increases, which would tend to indicate that we are not giving a
> realistic measurement of offload cost.

We are trying to compare the time required to complete a certain amount of work
using DMA with the time required to complete it using the CPU. In addition to the
offload cost, I think the capability of the DMA engine itself is also an important
factor to consider. The offload cost should be constant, but when the DMA engine
copies buffers of different lengths, the time taken differs, so the reported average
cycles increase as the packet size increases.
Therefore, this test result includes both the offload cost and the DMA operation
cost; to some extent it should be a relatively realistic measurement. Does that
make sense to you?
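Just to make the comparison concrete: the test takes one end-to-end measurement of the same amount of work on each path. The CPU reference path looks roughly like the sketch below (simplified, with illustrative names, not the exact code in the patch):

#include <stdint.h>
#include <string.h>

#include <rte_cycles.h>
#include <rte_mbuf.h>

/* Sketch of the CPU reference path: one end-to-end measurement over
 * nr_buf copies of buf_size bytes, mirroring how the DMA path is timed
 * around its enqueue/submit/completion loop. */
static uint64_t
time_cpu_copies(struct rte_mbuf **srcs, struct rte_mbuf **dsts,
		uint32_t nr_buf, uint32_t buf_size)
{
	uint64_t start = rte_rdtsc();
	uint32_t i;

	for (i = 0; i < nr_buf; i++)
		memcpy(rte_pktmbuf_mtod(dsts[i], void *),
				rte_pktmbuf_mtod(srcs[i], void *),
				buf_size);

	return rte_rdtsc() - start;	/* divide by nr_buf for cycles/copy */
}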
>
> The trouble then becomes how to do so in a more realistic manner. The most
> accurate way I can think of in a unit test like this is to offload entries to the
> device and measure the cycles taken there. Then wait until such time as all
> copies are completed (to eliminate the latency time, which in a real-world case
> would be spent by a core doing something else), and then do a second
> measurement of the time taken to process all the completions. In the same way
> that for a SW copy any time not spent in memcpy is not copy time, for HW
> copies any time spent not writing descriptors or reading completions is not part
> of the offload cost.

Agreed. We are thinking about adding the offload cost as one of the test results
in the future.

>
> That said, doing the above is still not fully realistic, as a real-world app will
> likely still have some amount of other overhead, for example, polling
> occasionally for completions in between doing other work (though one would
> expect this to be relatively cheap). Similarly, if the submission queue fills, the
> app may have to delay waiting for space to submit jobs, and therefore see
> some of the HW copy latency.
>
> Therefore, I think the most realistic way to measure this is to look at the rate
> of operations while processing is being done in the middle of the test. For
> example, if we have a simple packet processing application, running the
> application just doing RX and TX and measuring the rate allows us to determine
> the basic packet I/O cost. Adding in an offload to HW for each packet and again
> measuring the rate will then allow us to compute the true offload copy cost of
> the operation, and should give us a number that remains flat even as packet
> size increases. For previous work done on vhost with DMA acceleration, I
> believe we saw exactly that - while SW PPS reduced as packet size increased,
> with HW copies the PPS remained constant even as packet size increased.
>
> The challenge, to my mind, is therefore how to implement this in a suitable
> unit-test style way, to fit into the framework you have given here. I would
> suggest that the actual performance measurement needs to be done - not on a
> total time - but on a fixed time basis within each test. For example, when
> doing HW copies, 1ms into each test run we need to snapshot the completed
> entries, and then say 1ms later measure the number that have been completed
> since. In this way, we avoid the initial startup latency while we wait for jobs to
> start completing, and we avoid the final latency as we await the last job to
> complete. We would also include time for some potentially empty polls, and if
> a queue size is too small, see that reflected in the performance too.

I understand your concerns, but I think maybe we are not discussing the same
performance number here. We are trying to test the maximum bandwidth of the
DMA engine, while what you describe, if I understand it correctly, is how to
measure the offload cost more accurately. I think both performance numbers are
important. Maybe we can add your test methodology as another performance
aspect for DMA in the future; I need to reconsider it and will get back to you
later.

Thanks a lot,
Cheng

>
> Thoughts, input from others?
>
> /Bruce
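PS: if we do add your fixed-window methodology later, I imagine the sampling could be as simple as the sketch below. This is only an illustration, not a proposal for the actual code: it leans on the generic dmadev stats counters, assumes the driver keeps them up to date, and the warm-up delay and window length are arbitrary.

#include <stdint.h>

#include <rte_cycles.h>
#include <rte_dmadev.h>

/* Sketch: sample the device's completed-op counter over a fixed window
 * while another lcore keeps submitting, so start-up and drain latency
 * stay out of the measured rate. */
static double
sample_completion_rate(int16_t dev_id, uint16_t vchan, double window_sec)
{
	struct rte_dma_stats before, after;

	rte_delay_us_sleep(1000);	/* skip the initial fill latency */
	rte_dma_stats_get(dev_id, vchan, &before);
	rte_delay_us_sleep((unsigned int)(window_sec * 1e6));
	rte_dma_stats_get(dev_id, vchan, &after);

	return (double)(after.completed - before.completed) / window_sec;
}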